English Pages [1341] Year 2016
C programming THE TUTORIAL
Thomas Gabriel
Copyright © 2002, 2016 All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the written permission of the author. For information regarding permissions, write to [email protected] or [email protected].
ISBN: 978-2-9551114-2-0
Library of Congress Cataloging-in-Publication Data
Thomas Gabriel
C Programming: The Tutorial
Cover Design: Najat Younsi/Thomas Gabriel.
Disclaimer: Even though the author and the publisher have taken care in the preparation of this book, they assume no responsibility for errors or omissions that might have crept into it, and make no express or implied warranty of any kind. No liability is assumed for damages or negative consequences arising from the use of the information or programs contained within the book. The examples contained within the book are intended for learning purposes and are not to be used as-is in professional environments.
Contact: [email protected] or [email protected]
Trademarks:
BSD is a trademark of University of California, Berkeley, USA
Solaris and NFS are registered trademarks of Oracle Corporation
AIX is a registered trademark of International Business Machines Corporation
POSIX is a registered trademark of The Institute of Electrical and Electronic Engineers, Inc.
UNIX is a registered trademark of The Open Group
Linux is a registered trademark of Linus Torvalds
X Window is a trademark of the Massachusetts Institute of Technology
Microsoft Windows and MS-DOS are trademarks of Microsoft Corporation
HP-UX is a registered trademark of Hewlett-Packard Company
Release 1.1
To Catherine for whom my love goes beyond the words for expressing it
CONTENTS PART I C PROGRAMMING CHAPTER I OVERVIEW I.1 Introduction I.2 The very first step I.3 Variables I.4 Comments I.5 Operations I.6 Control flow I.7 Functions I.8 Macros I.9 Line continuation I.10 Portability CHAPTER II BASIC TYPES AND VARIABLES II.1 Introduction II.2 Numeral systems II.3 Data representation II.4 Literals II.5 Variables II.6 Basic types II.7 Types of constants II.8 Type qualifiers II.9 Aliasing types II.10 Compatible types II.11 Conversions II.12 Exercises CHAPTER III ARRAYS, POINTERS AND STRINGS III.1 Introduction III.2 Arrays III.3 Pointers III.4 Strings III.5 Arrays are not pointers III.6 malloc(), realloc() and calloc() III.7 Emulating multidimensional arrays with pointers III.8 Array of pointers, pointer to array and pointer to pointer III.9 Variable-length arrays and variably modified types III.10 Creating types from array and pointer types III.11 Qualified pointer types
III.12 Compatible types III.13 Data alignment III.14 Conversions III.15 Exercises CHAPTER IV OPERATORS IV.1 Introduction IV.2 Arithmetic operators IV.3 Relational operators IV.4 Equality operators IV.5 Logical operators IV.6 Bitwise operators IV.7 Address and dereferencing operators IV.8 Increment and decrement operators IV.9 lvalue IV.10 Assignment operators IV.11 Ternary conditional operator IV.12 Comma operator IV.13 Operator precedence IV.14 Type conversion IV.15 Constant expressions IV.16 Exercises CHAPTER V CONTROL FLOW V.1 Introduction V.2 Statements V.3 if statement V.4 continue V.5 break V.6 goto V.7 Nested loops V.8 Exercises CHAPTER VI USER-DEFINED TYPES VI.1 Introduction VI.2 Enumerations VI.3 Structures VI.4 unions VI.5 Alignments VI.6 Compatible types VI.7 Conversions VI.8 Exercises CHAPTER VII FUNCTIONS VII.1 Introduction VII.2 Definition
VII.3 Function calls VII.4 Return statement, part1 VII.5 Function declarations VII.6 Scope of identifiers VII.7 Storage duration VII.8 Compound literals VII.9 Object initializations VII.10 Return statement, part2 VII.11 Default argument promotions VII.12 Function type compatibility VII.13 Conversions VII.14 Call-by-value VII.15 Call-by-reference VII.16 Passing arrays VII.17 Variable-length arrays and variably modified types VII.18 Type qualifiers VII.19 Recursive functions VII.20 Pointer to function VII.21 Understanding C declarations VII.22 Pointers to functions as structure members VII.23 functions and void * VII.24 Parameters declared as void * VII.25 Side effects VII.26 Compound statements VII.27 Inline functions and macros VII.28 Variable number of parameters VII.29 Some useful macros VII.30 main() function VII.31 exit() function VII.32 Exercises CHAPTER VIII C MODULES VIII.1 Introduction VIII.2 Overview VIII.3 Writing Source Files VIII.4 Header Files VIII.5 Separate Compilation VIII.6 Declaration, definition, initialization and prototype VIII.7 Scope of user-defined types VIII.8 Default argument promotions VIII.9 Compatible structure, union and enumerated types VIII.10 An example VIII.11 Encapsulation
VIII.12 Exercise CHAPTER IX INTERNATIONALIZATION IX.1 Locales IX.2 Categories IX.3 setlocale() IX.4 localeconv() IX.5 Character encodings IX.6 Terminal settings IX.7 strcoll() and strxfrm() IX.8 Conversion functions IX.9 Functions manipulating wide characters CHAPTER X INPUT/OUTPUT X.1 Introduction X.2 Files X.3 Closing a file X.4 Reading a file X.5 Writing to a file X.6 Position indicator X.7 Managing errors X.8 Buffers X.9 freopen() X.10 Standard input, standard output, standard error X.11 Removing a file X.12 Renaming a file X.13 Temporary files X.14 Wide and Multibyte I/O functions X.15 Exercises CHAPTER XI STANDARD C LIBRARY XI.1 Introduction XI.2 XI.3 : character handling functions XI.4 XI.5 XI.6 XI.7 XI.8 XI.9 XI.10 XI.11 XI.12 XI.13 XI.14
XI.15 XI.16 : wide character handling functions XI.17 CHAPTER XII C11 XII.1 Introduction XII.2 Generic selection XII.3 Exclusive open mode XII.4 Anonymous unions and structures XII.5 Static assertion XII.6 No-return functions XII.7 Complex XII.8 Alignment XII.9 Bounds-checking functions PART II TOOLS CHAPTER XIII COMPILING C PROGRAMS XIII.1 Introduction XIII.2 Compilation Phases XIII.3 Preprocessing XIII.4 Lexical analysis XIII.5 Syntax analysis XIII.6 Semantic analysis XIII.7 Assembly code XIII.8 Assembly XIII.9 Linking XIII.10 Compilers and Interpreters XIII.11 Compiler Driver XIII.12 Compiling C Programs XIII.13 GNU gcc XIII.14 Writing Source Files XIII.15 Header Files XIII.16 Separate compilation XIII.17 Warning Messages XIII.18 Libraries CHAPTER XIV MAKEFILE XIV.1 Introduction XIV.2 Invocation XIV.3 Makefile XIV.4 Rules XIV.5 Dependency graph XIV.6 Macros XIV.7 Implicit rules XIV.8 Controlling make behavior
XIV.9 Recursive make XIV.10 Using multiple rules for one target XIV.11 Multiple targets in the same rule XIV.12 Continuation line XIV.13 Compiling C programs with make XIV.14 Dependency graph CHAPTER XV PROGRAMMING TOOLS XV.1 Introduction XV.2 Lint and splint XV.3 Time XV.4 Prof and gprof XV.5 GDB XV.6 Maintaining file versions
LIST OF FIGURES Figure II‑1 Byte ordering: Big-endian and Little-endian Figure II‑2 Piece of data in main memory Figure II‑3 Symbolic representation of a variable Figure II‑4 One’s complement Figure II‑5 Two’s complement Figure II‑6 Padding bits Figure II‑7 Ranges of normalized and denormalized floating-point numbers Figure II‑8 Binary floating-point representation Figure III‑1 Memory layout of the array age[5] Figure III‑2 Representation of the array age after initialization Figure III‑3 Two-dimension array arr[2][3] viewed as a table Figure III‑4 Memory layout of a two-dimension array arr[2][3] Figure III‑5 Three-Dimensional array arr[2][2][3] in a matrix representation Figure III‑6 Memory layout of the three-Dimensional array arr[2][2][3] Figure III‑7 Representation of a pointer Figure III‑8 Relationship between a pointer and the object it references Figure III‑9 Memory allocation with malloc() Figure III‑10 Representation of a pointer to int Figure III‑11 Pointers p and q referencing the same object Figure III‑12 Initialization of an array with a string literal Figure III‑13 Initialization of a pointer with a string literal Figure III‑14 Representation of an array and a pointer Figure III‑15 Pointer to pointer to int: int **p Figure III‑16 Pointer to pointer to strings Figure III‑17 Representation of char arr[2][3] Figure III‑18 Representation of char **arr Figure III‑19 Representation of char (*arr)[3] Figure III‑20 Representation of char *arr[2] Figure III‑21 Pointer to array and pointer to int Figure IV‑1 Bitwise NOT
Figure IV‑2 Bitwise left shift Figure IV‑3 Bitwise right shift Figure IV‑4 Bitwise AND Figure IV‑5 Bitwise OR Figure IV‑6 Bitwise XOR Figure IV‑7 Integer conversion rank Figure V‑1 continue statement Figure V‑2 break statement Figure V‑3 goto statement Figure VI‑1 Linked list Figure VI‑2 Tree data structure Figure VI‑3 Example of padding bytes inside structures Figure VI‑4 Example of padding bytes in unions Figure VII‑1 Function call Figure VII‑2 Scope overlaps Figure VII‑3 Call-by-value Figure VII‑4 Call-by-reference Figure VIII‑1 Simplified view of compilation steps Figure VIII‑2 Objects Figure VIII‑3 External linkage Figure VIII‑4 Structure student_node Figure IX‑1 UTF-8 encoding for € Figure IX‑2 Setting character encoding for Gnome Figure IX‑3 Setting character encoding for KDE: steps 1 and 2 Figure IX‑4 Setting character encoding for KDE: steps 3 and 4 Figure X‑1 Data transfer between stream and file Figure XI‑1 ISO 8601 Week Figure XI‑2 E and O modifiers used by strftime() Figure XIII‑1 Compilation Phases Figure XIII‑2 Interpreter Figure XIII‑3 Compiler Figure XIII‑4 Virtual Machine
Figure XIII‑5 Gcc steps Figure XIII‑6 Linking Object Files Figure XIII‑7 Building an executable Figure XIII‑8 Using a Static Library Figure XIII‑9 Three Processes Using the Same Functions Figure XIII‑10 Example of Project Organization Figure XIII‑11 Processes Sharing the Same Library Figure XIII‑12 Mapping Shared Libraries into process address spaces Figure XIV‑1 Dependency graph showing relationship between files Figure XIV‑2 Dependency graph showing target f depending on targets f1 and f2 Figure XIV‑3 Recursive make processing from the top target up to the leaves Figure XIV‑4 Dependency tree showing relationship between targets and prerequisites Figure XIV‑5 Compilation steps of C source files Figure XIV‑6 Tree showing dependencies between the executable and the source files Figure XIV‑7 Dependency tree of our project Figure XIV‑8 Directory hierarchy of our project Figure XV‑1 GDB launched within GNU emacs Figure XV‑2 SCCS directory hierarchy Figure XV‑3 Adding two branches from delta 1.2 Figure XV‑4 Derivation Graph of SCCS Versions Figure XV‑5 Derivation Graph of RCS Versions Figure XV‑6 Introducing two branches from revision 2.4
LIST OF TABLES Table II‑1 Meaning of the number 2512 in base 10 Table II‑2 Meaning of the number 7EFF in base 16 Table II‑3 Meaning of the number 7761 in base 8 Table II‑4 Meaning of the number 1101 in base 2 Table II‑5 Printing literals with printf() Table II‑6 Escape Sequences Table II‑7 Integer types Table II‑8 Range of unsigned integers Table II‑9 Range of integers using the signed magnitude representation Table II‑10 Range of integers using the one’s complementation representation Table II‑11 Range of integers using the two’s complementation representation Table II‑12 ASCII coded character set (ANSI X3.4-1986) Table II‑13 Basic character set Table II‑14 Trigraphs Table II‑15 Digraphs Table II‑16 Character types Table II‑17 Short types Table II‑18 Int types Table II‑19 Long types Table II‑20 Long long types Table II‑21 Boundaries of Integer types Table II‑22 Example of values for floating-point numbers Table II‑23 Some minimum limits defined in float.h Table II‑24 Some maximum limits defined in float.h Table II‑25 Examples of compatible types Table II‑26 Conversion to signed integer types Table II‑27 Conversion to unsigned integer types Table II‑28 Conversion to real floating-point types Table III‑1 Declarations mixing arrays and pointers Table III‑2 Examples of implementation of a dynamic three-dimensional array
Table III‑3 Explicit conversions on pointer and arithmetic types Table III‑4 Assignment conversions on pointer and arithmetic types Table IV‑1 Arithmetic operators Table IV‑2 Relational Operators Table IV‑3 Equality Operators Table IV‑4 Logical operators Table IV‑5 Logical AND Table IV‑6 Logical OR Table IV‑7 Bitwise operators Table IV‑8 Bitwise AND Table IV‑9 Bitwise OR Table IV‑10 Bitwise XOR Table IV‑11 Compound assignments Table IV‑12 Operator precedence in decreasing order Table VII‑1 Explicit conversions Table VII‑2 Implicit conversions Table VII‑3 Declaration of functions returning a pointer to a function Table VII‑4 Declaration of pointers to functions Table VIII‑1 C Types Table VIII‑2 Type of definition and linkage of inline functions Table VIII‑3 Scope and storage duration of identifiers Table VIII‑4 Storage-class specifiers, scopes, definitions, declarations and linkage Table IX‑1 Locale categories Table IX‑2 Members of the structure lconv Table IX‑3 UTF-8 encoding Table X‑1 Available modes for fopen() Table X‑2 Specifiers of fscanf() Table X‑3 Expected types of arguments for fscanf() Table X‑4 Examples with fscanf() Table X‑5 Flags for fprintf() Table X‑6 Specifiers for fprintf() Table X‑7 Types of the arguments passed to fprintf()
Table X‑8 fseek(): reference position Table X‑9 Byte and wide-characters I/O functions Table X‑10 Differences between fprintf() and fwprintf() Table X‑11 Modifier l used with %c in fprintf() and fwprintf() Table X‑12 Modifier l used with %s in fprintf() and fwprintf() Table X‑13 Differences between fscanf() and fwscanf() Table X‑14 Conversion for %c and %lc performed by fscanf() and fwscanf() Table X‑15 Conversion for %s and %ls performed by fscanf() and fwscanf() Table XI‑1 Some data type models Table XI‑2 Conversion specifiers for strftime() Table XII‑1 C11 new open modes Table XIII‑1 Static and shared library comparison Table XIV‑1 Dynamic macros Table XIV‑2 Special targets Table XIV‑3 Make options Table XV‑1 GDB break points Table XV‑2 GDB enable/disable Table XV‑3 GDB subcommands for resuming execution Table XV‑4 GDB print command Table XV‑5 Displaying variables Table XV‑6 Frame-related subcommands Table XV‑7 SCCS commands Table XV‑8 SCCS keywords Table XV‑9 RCS keywords
PREFACE
Introduction
The C language was born in 1972 during the development of the UNIX operating system at Bell Labs. Building on the B language (created by Ken Thompson in 1969), Dennis Ritchie designed the C language in order to rewrite the UNIX operating system, which had until then been written in assembly language. The goal of the researchers at BTL (Bell Labs) was to build a portable operating system.
In 1978, Brian Kernighan and Dennis Ritchie released the renowned book “The C Programming Language”. The version of the language it describes is known as K&R C. In 1989, the very first standard specification of the C language, known as C89 or ANSI C, was released by the American National Standards Institute (ANSI). In 1990, ANSI C became an international standard, called ISO/IEC 9899:1990 or C90. Therefore, ANSI C, C89 and C90 refer to the same C standard. In 1995, some minor features (an amendment called ISO/IEC 9899/AMD1:1995) and corrections were added to C90: to distinguish it from other C standards, it is referred to as C90 Amendment 1 or C95 (sometimes called C94). In 1999, a new international C standard, adding a great number of new features and corrections, was published under the label ISO/IEC 9899:1999. It is commonly called C99. At the time this book is written, the current C standard, released in 2011, is ISO/IEC 9899:2011 or C11.
The book is mainly focused on C99. As a matter of fact, the philosophy of the language has not changed over the years; the different standards corrected errors, introduced new features, and refined some concepts without altering the core of the language. Throughout the book, we will learn the C language as described by C90, together with the extensions brought by C95 and C99. As far as C11 is concerned, a chapter has been dedicated to it in order to introduce the handiest features that can be used by newcomers to the C language.
Although the language was closely connected to the UNIX operating system at its inception, a standard C program can be compiled on any operating system and any computer, provided you have the right compiler on your machine. A C program is a human-readable program that cannot be executed as-is by a computer. Therefore, a translator is necessary to convert a human-understandable program into a machine-executable program. This is the role of a compiler.
Logically, a book about the C standards should be independent of the operating system, the hardware and the compiler. Therefore, compilation should not be broached in the book. However, since the C language is tied to the C compiler, you cannot learn C programming without understanding the basics of compilation! For this reason, two chapters dealing with compilation have been added. As we cannot cover all operating systems and compilers, we only talk about the GNU compiler, called gcc, on UNIX and Linux operating systems. The rationale is that anyone can easily and freely install a virtual machine running a GNU/Linux operating system and directly install in it a great number of free and valuable GNU tools. Furthermore, to help new C programmers improve their programs and correct errors in them, a chapter briefly describing some tools closes the book.
Audience
Throughout the book, we will suppose that the reader already knows the basics of operating systems. This book is suitable for users who wish to learn the standard C language. It is intended neither for people who have never used a computer nor for those who already have a good knowledge of the C language and are searching for a “reference manual”. This book does not aim to explain in detail all the features of the C standards because this is not compatible with learning a programming language smoothly. For example, threads, introduced by C11, are not described in the book because the topic is not suitable for beginners: an entire book would be necessary for such a subject.
The book attempts to give a strong foundation by detailing the core of the C language. The essential themes are thoroughly explained with simplicity, through numerous examples and figures. Trickier aspects of the C standards are examined in several places from different perspectives to enable the reader to assimilate the concepts. This book explains, with simple but progressive examples, the essentials of the C language as described by the C standards C90, C95, C99 and C11.
This book is the third of a series. Two other books are also available:
o The UNIX & Linux Operating Systems: The Tutorial
o UNIX & Linux Shell Scripting: The Tutorial
Organization
The book is composed of two parts and fifteen chapters. The first part describes the C language; the second one explains how to compile C programs and introduces some useful programming tools. The first part is independent of the operating system while the second one is intended for users working on UNIX or Linux operating systems.
PART I C PROGRAMMING
Chapter 1 Overview
Chapter 2 Basic types and Variables
Chapter 3 Arrays, Pointers and Strings
Chapter 4 Operators
Chapter 5 Control Flow
Chapter 6 User-defined Types
Chapter 7 Functions
Chapter 8 C Modules
Chapter 9 Internationalization
Chapter 10 Input/Output
Chapter 11 Standard C Library
Chapter 12 C11
PART II TOOLS
Chapter 13 Compiling C Programs
Chapter 14 Makefile
Chapter 15 Programming Tools
Conventions Throughout the book, the following conventions are used: o Explanations appear in Liberation serif font. o Definitions, syntaxes and synopsis are embedded within a white rectangle: float variable_name = val;
o Examples are placed within a blue rectangle.
$ pwd
/users/michael
$ cd /etc
$ pwd
/etc
o Algorithms are enclosed within a salmon-colored rectangle:
While there is input data
    For each record read
        ….
    ENDFOR
ENDWHILE
o We will use the following typographical conventions to present command syntaxes and examples:
How to work with the book
Throughout the book, our examples are compiled on UNIX and Linux operating systems. If you work on another operating system or use a compiler other than the GNU compiler gcc, please adapt the given compilation commands to your working environment. If you are working on a Microsoft operating system and would like to type the examples as they are shown, you could install a hypervisor [1] and then create a virtual machine running one of the following operating systems:
o A GNU/Linux distribution such as CentOS, openSUSE, Fedora, Ubuntu…
o A BSD distribution such as NetBSD, FreeBSD, OpenBSD…
o A UNIX distribution: Oracle Solaris.
Do not hesitate to tinker with the given examples to understand how they work. However, please do not log in to a system as a user with an administrative role to test the examples. In all cases, use a machine dedicated to tests or training: do not work on a production machine.
Let us see how to work with the examples proposed in the book. Suppose the following example is given:
$ cat first_program.c
#include <stdio.h>
int main(void) {
    printf("This is my first C program\n");
    return 0;
}
$ gcc -o prog first_program.c
$ ./prog
This is my first C program
To test such an example, first, open a terminal. The last line of your terminal then looks like this: $
Every line of the terminal starts with a text known as a prompt, printed by the shell. You should not type it: here, it appears as $. Then, perform the following tasks:
o In a text editor, type the following text and save it as first_program.c:
#include <stdio.h>
int main(void) {
    printf("This is my first C program\n");
    return 0;
}
o Compile the source file with gcc by running the following command:
$ gcc -o prog first_program.c
o Then execute it by typing ./prog followed by the Enter key:
$ ./prog
Now let us give some recommendations for setting up a programming environment on your computer. If the tools we propose are not suitable for you, feel free to choose others that meet your preferences. Unless specified otherwise, the examples presented throughout the book can be compiled on any operating system: you can compile and run the C programs proposed in the book whatever your operating system, provided you have installed a compiler on it beforehand. Remember that in the book, our examples are compiled and executed on UNIX and Linux operating systems.
If your computer is running a UNIX operating system or a UNIX-like operating system (such as Linux or a BSD system), you can write or modify C programs with a text editor such as vi, vim, emacs, gvim, or gedit. If your computer is running a Microsoft Windows operating system, you can write or modify your programs with a text editor such as notepad, Notepad++, XEmacs, or gvim.
Throughout the book, to show the contents of a text file, we invoke the command cat (remember that we work on Linux and UNIX operating systems) followed by the name of the file. Thus, the following example displays the contents of the file main.c:
$ cat main.c
#include <stdio.h>
int main(void) {
    printf("This is my first C program\n");
    return 0;
}
A compiler is a utility designed to translate a text file written in a programming language into a binary file (which can then be executed). Throughout the book, we will work with the GNU compiler gcc to compile our C programs, but nothing prevents you from using the compiler of your choice. On UNIX operating systems and UNIX-like operating systems (Linux, BSD systems), you can freely download and install gcc if it is not already present on your system. On IBM AIX systems, you may use IBM XL C. On Oracle Solaris, you could use Oracle Solaris Studio. On Microsoft Windows operating systems, you can download and install MinGW, Cygwin, Pelles C or Microsoft Visual Studio.
If you are working with an Integrated Development Environment (IDE) such as Microsoft Visual Studio® or Oracle Solaris Studio®, the text editor, the compiler and programming tools such as a debugger are already integrated within the software.
About the author
A graduate of a French engineering school who specialized in systems and networks, the author has worked as an IT consultant for several leading international companies. He started his career developing software on UNIX® systems and Microsoft® operating systems, and was then a partner of Sun Microsystems for more than ten years. He worked as a system architect in charge of designing robust architectures for customers in large environments, writing specific tools on demand for customers, training users…
FEEDBACK Any comments, questions or suggestions for improving the book are welcome. Please send them to [email protected] or [email protected].
PART I C PROGRAMMING
CHAPTER I OVERVIEW
I.1 Introduction
This chapter gives you a first glance at C programming. The objective is to enter the C world smoothly, which will ease the learning of the next chapters. After learning to write very simple programs, we will take out our microscope and go through C programming in detail in the subsequent chapters.
I.2 The very first step
Depending on the complexity of the C program you intend to develop, it may be composed of one or more text files. They can be read and modified with any text editor such as vi, emacs, notepad, Notepad++, or gedit. A file that contains C code (composed of C instructions) is known as a source file (source code). Though a C program can be composed of several files, we will start by working with a single source file. Let us write a very simple program (called first_program.c) that just outputs to the screen the sentence “This is my first C program”.
$ cat first_program.c
#include <stdio.h>
int main(void) {
    printf("This is my first C program\n");
    return 0;
}
Though it is quite simple, there are many things to say about this program. First, before explaining each line, we are going to compile it. What does that mean? Compiling a C program means translating a human-readable program into a computer-executable file. Your small program stored in the file first_program.c cannot be executed as it is by your computer. Since your computer does not “speak” the C language, you have to use a particular tool, known as a compiler, that not only understands the C language and translates it into a language understandable by the computer (machine language), but also writes the result in a specific format that can be managed by the operating system. A compiler is a complex tool that is actually a suite of utilities performing many tasks, ranging from C preprocessing to the output of the binary file. The compilation steps will be fully described in the second part of the book. For now, we will simply call compiler the utility that produces the system-executable binary file.
Let us use the GNU compiler gcc to generate the binary file, which we then execute:
$ gcc first_program.c
$ ./a.out
This is my first C program
Above, we invoked the gcc utility with no option, which generated a binary file with the default name a.out. To give a specific name to the output file, just specify the -o option as shown below:
$ gcc -o prog1 first_program.c
$ ./prog1
This is my first C program
Explanations:
o We invoked the gcc utility with the -o option to specify the name of the output binary file. If you omit this option, gcc will produce a binary file with the name a.out.
o The last argument of the first command is the name of the file holding the C code you have written.
o The second command (i.e. ./prog1) executes the binary file.
You may encounter several issues when trying to compile your program. The first one is that the compiler gcc is not installed at all on your system. In this case, just install it and go on… The second one is that the gcc tool is installed on your system but is not in a directory listed in the PATH environment variable:
$ gcc -o prog1 first_program.c
/usr/bin/ksh: gcc: not found [No such file or directory]
$ which gcc
no gcc in /usr/bin /usr/sbin
$ PATH=$PATH:/opt/freeware/bin
$ which gcc
/opt/freeware/bin/gcc
$ gcc -o prog1 first_program.c
Explanations:
o First command: we invoked gcc but it failed.
o Second command: we invoked the which command, which confirmed that the gcc command was not in any directory listed in the PATH variable.
o Third command: we added to the environment variable PATH the directory in which the gcc command can be found. In our example, the gcc tool was installed in /opt/freeware/bin.
o Fourth command: we invoked the which command again, which showed the directory in which gcc was located.
o Fifth command: we successfully compiled our C program.
Another issue you could meet is a typo in your C program:
$ gcc -o prog1 first_program.c
first_program.c: In function ‘main’:
first_program.c:5:1: error: expected ‘;’ before ‘}’ token
Don’t be afraid of that: this will often happen in your long life as a C programmer. Fortunately, compilers tell you where the problem is and give you enough details to correct it. In our example, we forgot the semicolon at the end of the last statement (the one just before the closing brace on line 5), as shown below:
$ cat first_program.c
#include <stdio.h>
int main(void) {
    printf("This is my first C program\n");
    return 0
}
So far, we have learned to generate, from our C program, a binary file that can be executed by the computer. Now, let’s go back to our C code:
$ cat first_program.c
#include <stdio.h>
int main(void) {
    printf("This is my first C program\n");
    return 0;
}
First, you can notice that our program name has the .c extension. This is not compulsory, but it is highly recommended to use the .c extension for your C source files. You will understand why soon. The .c extension is an indicator for us (and for everyone reading our program) that says: “this is a text file, holding a human-readable program written in the C language”. Next, C code is made of a set of actions, known as statements, telling the computer what to do. In our C code, we can see two main components:
o #include <stdio.h>
o The main() function and its code.
The #include line is not actually a C statement but a preprocessor directive. For now, we can consider the preprocessor to be part of the compiler itself. A preprocessor directive is simply an instruction meant for the compiler. Here, the directive #include <stdio.h> tells the compiler to copy the contents of the file stdio.h to the place where the directive is found, before actually compiling the source file. Everything happens as if the contents of stdio.h were actually present in the source file. Later, we will fully explain why we do that. For now, you just have to know that the stdio.h file contains information about the I/O routine printf(), which allows us to display our text. Files included in that way are known as header files: their names carry the .h extension. Don’t worry, this is not relevant yet… We are just learning to take our first step.
The second part of the program is the main() function. First, do you know what a function is? A function is another name for a subroutine or routine. If you have never programmed in your life, those words do not help much. A function is just a named set of statements telling the computer what to do. For example, the function sum2numbers() could be composed of two statements: the first one sums the numbers you give it and the second one displays the result on the screen. Functions are very important because not only do they save you time, but they also dramatically simplify and lighten your programs. Instead of writing the same code several times in your program, you can write it only once as a function and then call it each time you need it. In our example, we called the printf() function, which is provided by the C library. A library is a set of functions, written by you or someone else, that can be incorporated into your programs. Hence you can call printf() each time you need to display text without having to write code for that: it has already been done for you, just call it.
You may have noticed that we have appended parentheses () to the names referring to functions: it is our way of indicating that we are talking about a function. Thus, throughout the book, we do not write myfunc but myfunc() when we are referring to a function. Remember that any C program must contain one and only one main() function. Otherwise, no executable program can be built from it. Building the following code fails because there is no main() function:
$ cat dummy_program_2.c
#include <stdio.h>
void display() {
    printf("This is my first C program\n");
}
$ gcc -o prog1 dummy_program_2.c
Undefined first referenced symbol in file main /usr/lib/crt1.o ld: fatal: symbol referencing errors. No output written to prog1 collect2: ld returned 1 exit status
The reason why the main() function is required is that it is the function directly executed when the program is run [2]. This implies that the main() function is the core of your program or, to put it another way, its scheduler or conductor. You have noticed that the main() function is composed of three parts:
o int
o main(void)
o { printf ("This is my first C program\n"); return 0; }
The third part of the main() function is known as a block or a function body. It is composed of statements enclosed between braces ({}). The left brace indicates the beginning of the statements and the right brace terminates the set of statements of the function. Take note that a brace can be alone on a line or share it with statements. Generally, the left brace is on the same line as the function name or alone, while the right brace is alone as in the following example:
$ cat first_program.c
#include <stdio.h>

int main(void) {
   printf ("This is my first C program\n");
   return 0;
}
In our example, the body of the main() function contains the statement printf ("This is my first C program\n"), which displays the text This is my first C program on the screen. Remember that any C statement must end with a semicolon [3]. I am sure you have noticed the strange symbol \n at the end of the text to be displayed… It stands for the newline character: after displaying the text, the cursor goes to the next line. Try out the same example without \n…

The second part of the function indicates three things:
o The identifier (name of the function), which is main.
o The fact that the identifier is a function. This is indicated by the parentheses.
o The arguments that can be passed to it, specified between the parentheses. We will not talk about them now. When a function accepts no argument, the keyword void is written between the parentheses as in our example.

The first part of the main() function (i.e. int) is the type of the return value of the function. In the C language, a function can return something (i.e. a value) or nothing. When it returns something, you have to specify the type of the value it returns (we will explain C types later). In the main() function, if you do not specify a return value, the default return value 0 is used (C99 and C11). Remember that the main() function always returns an integer [4] and you cannot change that. The rationale is that, originally, any command under the UNIX system terminated with an integral number known as an exit status, notifying the UNIX shell whether it had ended successfully or not. Consequently, we have to specify an exit status (ranging from 0 to 255) for our program. This can be accomplished through the return statement as shown below:
$ cat first_program_ok.c
#include <stdio.h>

int main(void) {
   printf ("This is my first C program\n");
   return 0;
}
The return value 0 tells the operating system that our program ended with the exit status 0 (in UNIX, Linux, and BSD systems, 0 means OK and any other value indicates a failure). If we compile it and then run it on a Linux box, we get something like this:
$ gcc -o prog_ok first_program_ok.c
$ ./prog_ok
This is my first C program
$ echo $?
0
We could specify any return value ranging from 0 to 255:
$ cat first_program_ko.c
#include <stdio.h>

int main(void) {
   printf ("This is my first C program\n");
   return 10;
}
If we compile it and then run it:
$ gcc -o prog_ko first_program_ko.c
$ ./prog_ko
This is my first C program
$ echo $?
10
As you have guessed, under the shell [5], $? shows the exit status of the last command you executed. Normally, the last statement of the main() function should be something like return return_value. Though a default value is automatically set if no return statement is found in the main() function (C99 and C11 compilers set it to 0 [6]), make sure you specify a return value yourself, which lets you keep control of the behavior of your code.

It is worth noting that since the C language can be used on other operating systems, a successful exit status may be a value different from 0. For this reason, the macros EXIT_SUCCESS and EXIT_FAILURE have been specified (in the header file stdlib.h). We will explain later what a macro is. For now, consider a macro a symbolic name representing a value. On the UNIX system (and UNIX-like systems), EXIT_SUCCESS is a synonym for 0 and EXIT_FAILURE is a synonym for 1. Since those macros are defined in the header file stdlib.h, you have to include it if you wish to use them. Thus, the program can be rewritten as follows:
$ cat first_program.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   printf ("This is my first C program\n");
   return EXIT_SUCCESS;
}
As you have noticed, the body of the main() function is composed of two statements, each ended by a semicolon. Although the C standard allows you to put several statements on the same line, which saves space, it is always better to write readable code and therefore to avoid placing several statements on the same line. When writing C code, your goal is not to save space but to be readable. For example, our first program could have been squeezed into three lines like this:
$ cat first_program.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {printf("This is my first C program\n");return EXIT_SUCCESS;}
In summary, a C program, whatever its complexity, has at least one source file (the main source file) that looks like this:
#include …
…
int main(void) {
   …
   return retval;
}
The main source file is sometimes called main.c to indicate that it holds the main() function, but you can give it any name.
I.3 Variables
Whatever the complexity of your program, you will need to store data coming from outside the program itself, or from computations, for later use. The best way to store data temporarily, while the program is running, is to use variables. A variable is just a piece of the computer’s memory storing a value. Since a program may have several variables, how do we distinguish them? Simply by giving them a name. If we give the label X to a variable and fill it with a value, we can use it again just by calling it by its name.

A variable can be viewed as a box. In C, before you can work with a variable, you have to specify the size of your box: in some way, you tell the compiler to reserve a piece of memory of a certain size that you intend to use later. For example, if you think you will work with big numbers (let us say 167900765456709876477890), it is wise to ask for a bigger box than if you plan to work with small numbers (let us say numbers ranging from 0 to 999). If you request a little box and put in it more than it can hold, you will get unexpected behavior.

So, a variable is characterized by its name and its size. The name allows us to set or get a value. The variable’s size ensures that we have enough space in the computer’s memory to store our values. Over time, a variable may hold different values, but always of the same kind: this is why a variable has a type indicating what it is supposed to store. The C language has a number of predefined types described by the C standard, but also user-defined types. We start with some basic types defined by the C standard. As said earlier, before working with a variable, you have to specify its name and its type as shown below:
$ cat prog_var1.c
#include <stdlib.h>

int main(void) {
   int age;

   return EXIT_SUCCESS;
}
Explanation:
o At the very first line, we include the header file stdlib.h in order to use the macro EXIT_SUCCESS.
o int is the type of the variable age. The type int denotes the set of integral numbers [7], such as 1, 20, -6, 0, or -3, that we are going to use.
o age is the identifier (name) of the variable. A variable name is composed of letters, digits and underscores but cannot start with a digit.

In the example prog_var1.c, we tell the compiler that we want to store a number into the variable age. This ensures that, while the program is running, we have a piece of memory in which we can store a number that may vary over time. Next, we can give a value to the variable:
$ cat prog_var2.c
#include <stdlib.h>

int main(void) {
   int age;

   age = 44;
   return EXIT_SUCCESS;
}
Here, the equals sign (known as the assignment operator) allows us to give a value to a variable. Above, we put the integer value 44 into the age variable. The example could also have been written like this:
$ cat prog_var3.c
#include <stdlib.h>

int main(void) {
   int age = 44;

   return EXIT_SUCCESS;
}
Above, the number 44 on the right side of the equals sign is said to be an integer literal or integer constant. The word literal means that the value is written directly in the source code: it is known and fixed at compilation time, even before the program runs. What if we displayed the contents of the age variable?
$ cat prog_var4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int age = 44;

   printf ("age variable=%d\n", age);
   return EXIT_SUCCESS;
}
Explanations:
o The statement int age = 44 reserves a memory space called age that will store an integer, and initializes the age variable with the value 44.
o The printf statement displays the text age variable= followed by the contents of the age variable. %d is called a specifier: it tells printf() the type of its argument (here age) so that it can display it correctly.

Let us compile and run it:
$ gcc -o prog_var4 prog_var4.c
$ ./prog_var4
age variable=44
The printf() function can display several arguments. Its general syntax is given below: printf(fmt, arg1, arg2…)
The very first argument, fmt, is known as a format: it gives the type of the subsequent arguments. The format appears between double quotes and is composed of text and specifiers. A specifier is a letter preceded by the % symbol, expressing how the corresponding argument should be interpreted. For example, %d is used to display an integer, %s a text and %f a floating-point number. The following example displays the contents of the variables X and Y:
$ cat prog_var5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int X = 10;
   int Y = 20;

   printf ("First argument=%d and Second Argument=%d\n", X, Y);
   return EXIT_SUCCESS;
}
$ gcc -o prog_var5 prog_var5.c
$ ./prog_var5
First argument=10 and Second Argument=20
The next example displays two variables of different types: the first one is a negative integer and the second is a floating-point number:
$ cat prog_var6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int X = -10;
   float Z = 3.14;

   printf ("X holds %d\nZ holds %f\n", X, Z);
   return EXIT_SUCCESS;
}
$ gcc -o prog_var6 prog_var6.c
$ ./prog_var6
X holds -10
Z holds 3.140000
Here, we can add two notes:
o The format of the printf() function contains \n, indicating that a newline is inserted after displaying the value of each variable. Thus, you could also have written the previous example like this:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int X = -10;
   float Z = 3.14;

   printf ("X holds %d\n",X);
   printf ("Z holds %f\n",Z);
   return EXIT_SUCCESS;
}
o You cannot swap the places of X and Z while keeping the specifiers as they are. Otherwise, you will get undefined behavior. If you swap the places of the variables, you must also swap the corresponding specifiers as shown below:
$ cat prog_var7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int X = -10;
   float Z = 3.14;

   printf ("Z holds %f\nX holds %d\n", Z, X);
   return EXIT_SUCCESS;
}
$ gcc -o prog_var7 prog_var7.c
$ ./prog_var7
Z holds 3.140000
X holds -10
The third basic type we would like to introduce is the string. A string is a series of characters forming a logical unit. In C, it can be declared as char *. Consider the following example:
$ cat prog_var8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   char *my_text="This is my first program";

   printf ("%s\n", my_text);
   return EXIT_SUCCESS;
}
$ gcc -o prog_var8 prog_var8.c
$ ./prog_var8
This is my first program
Explanations:
o The main() function is composed of three statements. The first one declares the variable my_text, the second one displays it and the third one returns the exit status.
o The statement char *my_text="This is my first program" tells two things: the variable my_text is supposed to hold a series of characters, and it stores the text This is my first program. On the left side of the equals sign, we can see the name of the variable and its type. On the right side of the equals sign lies its value (a string literal), that is, This is my first program enclosed between double quotes. Double quotes are not part of the value assigned to the variable; they are only delimiters for the string literal: the first double quote starts the string and the second one terminates it. Obviously, this implies that if you do not “close” a string, writing only one double quote, you will get an error as in the example below:
$ cat prog_var8_err.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
   char *my_text="This is my first program;
   printf ("%s\n", my_text);
   return EXIT_SUCCESS;
}
$ gcc -o prog_var8_err prog_var8_err.c
prog_var8_err.c: In function ‘main’:
prog_var8_err.c:4:18: warning: missing terminating " character
prog_var8_err.c:4:4: error: missing terminating " character
prog_var8_err.c:6:4: warning: initialization makes pointer from integer without a cast
So far, we have only assigned a literal to a variable. Fortunately, you can also store the contents of a variable into another variable, that is, assign a variable to another variable as shown below:
$ cat prog_var9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int X = -3;
   int Y = X;

   printf ("X=%d and Y=%d\n", X, Y);
   return EXIT_SUCCESS;
}
$ gcc -o prog_var9 prog_var9.c
$ ./prog_var9
X=-3 and Y=-3
In our example, we placed the contents of the X variable into the variable Y. The equals sign allows giving a value to a variable: the container, known as an lvalue, is on the left side of the equals sign and the contents are on the right side. On the right side, you can place a literal or another variable. Once declared (a variable is declared only once), a variable can be reused as much as you wish as shown below:
$ cat prog_var10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int X = 0;
   printf ("X=%d\n", X);

   X = 1;
   printf ("X=%d\n", X);

   X = 2;
   printf ("X=%d\n", X);
   return EXIT_SUCCESS;
}
$ gcc -o prog_var10 prog_var10.c
$ ./prog_var10
X=0
X=1
X=2
I.4 Comments
Comments within a program are of great importance, particularly if it is large or complex. They are used to describe statements, functions, algorithms… They are ignored by the compiler. You have two ways to write comments:
o The characters /* introduce a comment that ends with the characters */. It can span several lines. Comments enclosed between /* and */ can be used anywhere, even within statements.
o The characters // introduce a comment that ends at the end of the line (when you press the Enter key). This form was introduced by C99.

Here is a program containing examples of comments:
#include <stdio.h>
#include <stdlib.h>

/*
   The program shows examples of comments
*/
int main(void /* Comment: no parameter used */ ) {
   // this comment is held in a single line
   // This is another single-line comment
   /* This comment spans
      over several lines */
   int nb = 10; // nb is a variable
   int x = 7; /* x is also a variable */

   x = 10 + /* dummy comment */ 8;
   return EXIT_SUCCESS;
}
I.5 Operations
Most of the operations in the C language are quite natural and easy to understand but, as we will see later, you must pay attention to the types of variables and literals… Let us start with the basic arithmetic operations: addition, subtraction, division and multiplication. The example below adds two integers:
$ cat prog_add1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int p = 1 + 2;

   printf ("p=%d\n", p);
   return EXIT_SUCCESS;
}
$ gcc -o prog_add1 prog_add1.c
$ ./prog_add1
p=3
Explanation:
o The statement int p = 1 + 2 performs three different actions:
▪ It declares the variable p as an integer;
▪ It computes the sum of the two integer literals 1 and 2. The parameters (here the literals 1 and 2) appearing on either side of the + operator are known as operands. An operand is an argument of an operator.
▪ It assigns the result of the operation 1 + 2 to the p variable.
o The printf() function displays the p variable, which holds the value 3.

Here again, we used the assignment operator (equals sign) to store the result of an operation into a variable. The operation appears on the right side of the operator. Of course, you can sum several operands as below:
$ cat prog_add2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int p = 1 + 2 + 3;

   printf ("p=%d\n", p);
   return EXIT_SUCCESS;
}
$ gcc -o prog_add2 prog_add2.c
$ ./prog_add2
p=6
The same + operator can operate on integers as well as on floating-point numbers. The following example adds floating-point numbers:
$ cat prog_add3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float x = 3.14 + 1;

   printf ("x=%f\n", x);
   return EXIT_SUCCESS;
}
$ gcc -o prog_add3 prog_add3.c
$ ./prog_add3
x=4.140000
The subtraction operation works in the same way (the operator is the minus sign -):
$ cat prog_sub.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int p = 1 - 2;

   printf ("p=%d\n", p);
   return EXIT_SUCCESS;
}
$ gcc -o prog_sub prog_sub.c
$ ./prog_sub
p=-1
For the multiplication operation, the operator is the star symbol *.
$ cat prog_mult.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float x = 3.14 * 2;

   printf ("x=%f\n", x);
}
$ gcc -o prog_mult prog_mult.c
$ ./prog_mult
x=6.280000
We finish with the division operation, which uses the slash symbol / as an operator:
$ cat prog_div.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float x = 2.1/3.2;

   printf ("x=%f\n", x);
}
$ gcc -o prog_div prog_div.c
$ ./prog_div
x=0.656250
The C operations seem obvious, working as you learned in your math courses… but this is not quite the case. There remain many things to say about them in the next chapters. Here is a taste of the strangeness of the C language:
$ cat prog_div2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float x = 2/3;

   printf ("x=%f\n", x);
}
$ gcc -o prog_div2 prog_div2.c
$ ./prog_div2
x=0.000000
No, it is not an error! The output of the operation 2/3, as we coded it, is actually 0! You may have expected something like 0.666667. We will explain why…
I.6 Control flow
So far, we have worked with sequential statements: statements are executed in order of appearance. It often happens that we want to execute one or more actions only if specific conditions are met, or that we want some tasks to be performed several times until some condition evaluates to true (or false). With no specific mechanism, your program always runs in the same way, always produces the same output and cannot adapt to input data. Fortunately, the C standard defines several statements that allow you to perform actions according to the circumstances: they are known as control flow statements. Let us have a look at the if statement. In this chapter, we briefly describe only the following two forms:
if (condition) {
   statement_list;
}

if (condition) {
   statement_list;
} else {
   else_statement_list;
}
Where:
o condition is an expression. As we describe the C language, we will give more and more details about C expressions. Here, condition is an expression that evaluates to true or false, such as x > 8.
o statement_list is a set of statements, each terminated with a semicolon. Generally, there is one statement per line, but you could write several statements on the same line. Statements are separated by one or more newlines (after the semicolon) for clarity.
o else_statement_list is a set of statements, each terminated with a semicolon.
o Blanks and newlines can be placed before and after the left and right braces. They have no effect.
o Blanks and newlines can be placed before and after the left and right parentheses. They have no effect.

The first form is composed of two parts: if (condition) and { statement_list; }. The first part consists of the keyword if and a condition between parentheses: its task is to evaluate the expression condition. If it is true, the second part of the statement is executed. The second part of the if statement is known as the block or body of the if statement: it consists of a set of statements enclosed in braces that are executed only if the expression condition is true.

The second form is composed of four parts:
o if (condition)
o { statement_list; }
o else
o { else_statement_list; }

The first two parts are identical to the first form and have the same meaning. The last two parts complete the first form: they mean that if condition is not true (represented by the keyword else), the else block is executed. That is, if condition is true, the first block is executed; otherwise, the second one is executed.

Now, let us talk a little bit about relational expressions to help us better understand how the if statement works. A relational expression is an expression that compares two values and returns a value (0 for false or 1 for true). Here are some relational expressions:
o A > B: returns 1 (true) if A is greater than B. Otherwise, it returns 0 (false).
o A < B: returns 1 (true) if A is less than B. Otherwise, it returns 0 (false).
o A == B: returns 1 (true) if A is equal to B. Otherwise, it returns 0 (false).

Consider the following example:
$ cat prog_cflow1.c
1  #include <stdio.h>
2
3  int main(void) {
4    int num;
5    int rval;
6
7    printf("Please, enter an integer less than or equal to 9: ");
8    scanf("%d", &num);
9
10   if (num > 9) {
11     printf("Failure, the number is too big\n");
12     rval = 1;
13   } else {
14     printf("OK, the number is the requested range\n");
15     rval = 0;
16   }
17
18   return rval;
19 }
Explanation:
o Line 4: the num variable is declared as an integer. It will store a number read from the keyboard.
o Line 5: the rval variable is declared as an integer. It will hold the return value of the main() function.
o Line 7: the printf() function displays a text prompting the user to enter an integral number less than or equal to 9.
o Line 8: the scanf() function reads the number the user has typed, and stores it into the num variable. The function will be described later. Here, we use it just to get the number that the user has typed. The ampersand (&) before the num variable will be explained when we talk about pointers.
o Line 10: the if…else statement is a control flow statement, more specifically a conditional statement. It means that if the variable num holds a value greater than 9 (num > 9), lines 11 and 12 are executed. Otherwise, lines 14 and 15 are executed. As you have noticed, the statement has two parts, if and else, each one having its own block [8].
o Line 11: it displays the message Failure, the number is too big. This is the first statement of the if block. If the condition num > 9 is true, this line and the next one are executed.
o Line 12: this is the second statement of the if block. The rval variable is set to 1. The rval variable holds the return value of the main() function.
o Line 13: this line tells two things. First, the if block ends with the right curly brace. Secondly, the alternative introduced by the reserved word else starts.
o Line 14: this line is the first statement of the else block. It is run only if the condition of the if statement is not met, that is, only if the variable num stores a number less than or equal to 9.
o Line 15: this is the second statement of the else block. The rval variable is set to 0. The rval variable holds the return value of the main() function.
o Line 16: end of the else block.
o Line 18: the return value of the main() function appears here.
o Line 19: the right brace ends the block of the main() function.

Now, compile it and run it:
$ gcc -o prog_cflow1 prog_cflow1.c
$ ./prog_cflow1
Please, enter an integer less than or equal to 9: 10
Failure, the number is too big
$ echo $?
1
Above, we typed the number 10: the number is out of range. Let us run the program again, but this time we type the integer 8:
$ ./prog_cflow1
Please, enter an integer less than or equal to 9: 8
OK, the number is the requested range
$ echo $?
0
Now, suppose we wanted the user to type a non-negative integral number less than or equal to 9 (in other words, a decimal digit). In this case, our if condition is composed of two conditions: num >= 0 and num <= 9. They are combined with the operator && into the single condition num >= 0 && num <= 9, which is true only if the sub-condition num >= 0 is true and the sub-condition num <= 9 is true.
??>       }
??-       ~
Table II‑14 Trigraphs
C94 introduced digraphs: sequences of two characters, more practical than trigraphs, that the compiler treats as a single character.

Digraph   Replacement character
<:        [
:>        ]
<%        {
%>        }
%:        #
%:%:      ##
Table II‑15 Digraphs
To prevent three successive characters from being taken as a trigraph, a backslash must be used to break the sequence. The following example displays some trigraphs.
$ cat trigraph1.c
#include <stdio.h>
??=include <stdlib.h>

int main(void) ??<
   char trigraph;

   trigraph='??=';
   printf("?\?= replaced by %c\n", trigraph);

   trigraph='??(';
   printf("?\?( replaced by %c\n", trigraph);

   trigraph='??!';
   printf("?\?! replaced by %c\n", trigraph);

   trigraph='??>';
   printf("?\?> replaced by %c\n", trigraph);

   trigraph='??-';
   printf("?\?- replaced by %c\n", trigraph);

   return EXIT_SUCCESS;
??>
$ gcc -o trigraph1 -std=c99 -pedantic trigraph1.c
$ ./trigraph1
??= replaced by #
??( replaced by [
??! replaced by |
??> replaced by }
??- replaced by ~
Within character and string literals, a backslash placed before a character changes the way that character is interpreted: \? stands for a plain question mark, which breaks a would-be trigraph, and \\ stands for the backslash itself. For example, to print the backslash character \, we precede it with another backslash (below, the second one is written as the trigraph ??/):
$ cat trigraph2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   printf("\?\?/ replaced by %c\n", '\??/');
   return EXIT_SUCCESS;
}
$ gcc -o trigraph2 -std=c99 -pedantic trigraph2.c
$ ./trigraph2
??/ replaced by \
Normally, you will not have to use trigraphs and digraphs unless your keyboard cannot produce those characters.

II.6.1.3 Padding bits
Data is stored in one or more bytes. A byte is composed of a specific number of bits. Most of the time, all bits of each byte are used to represent data, but it may happen that some bits are not used and are ignored as if they did not exist: they are called padding bits. Padding bits do not contribute to the value (Figure II‑6). For example, a 32-bit type (i.e. with a size of 32 bits) may be represented by 31 bits (width of 31 bits) with one padding bit: only 31 bits are used for encoding values.
Figure II‑6 Padding bits
In C, operations deal with values. That is, padding bits are invisible to programmers and normally you do not have to worry about them if your programs conform to the C standard.

II.6.1.4 Size, width, and precision
The precision of an integer is the number of bits used to represent its magnitude, excluding the sign bit and padding bits. The width of an integer is the number of bits used to represent its magnitude and its sign, excluding padding bits: for a signed integer, width = precision + 1 (for an unsigned integer, width = precision). The size of an integer is the number of bits used to represent its magnitude and its sign, including padding bits: size = width + padding bits. The size of a value or a type, expressed in bytes, is yielded by the sizeof operator.

II.6.1.5 Character types
Three types of integers, known as character types, represented by at least 8 bits, are defined by the C standard:
o char: it can be signed or unsigned depending on the implementation. This is known as plain char.
o signed char: the minimum range is [-127, 127].
o unsigned char: the minimum range is [0, 255].

Take note that even though the size of a char is commonly 8 bits (i.e. 1 octet), on some computers it could be 9, 12, 16 bits… The C standard says only that its bit-length must be at least 8 bits. We can infer that to write a C program that works on every machine (i.e. a portable program), we should ensure that our values of type char are in the range [-127, 127] if they are signed or [0, 255] if unsigned. Likewise, since the char type can be signed or unsigned depending on the compiler, a portable program should use values in the range [0, 127]: this range is common to signed char and unsigned char. In the following example, we display the values of an unsigned char variable called i and a char variable called j.
$ cat char1.c
1  #include <stdio.h>
2  #include <stdlib.h>
3
4  int main(void) {
5    unsigned char i = 255;
6    char j = 255;
7
8    printf ("i=%d j=%d\n", i,j);
9    return EXIT_SUCCESS;
10 }
What do you think such a program will produce? The answer is: it depends. Let us compile it with gcc on our computer:
$ gcc -o char1 char1.c
$ ./char1
i=255 j=-1
As you can see, the j variable (char type) appears as -1. This means that the value 255 is too big for the type of j (an overflow happened), which indicates that on our computer, with gcc, the char type is treated as a signed type. In other words, on our computer, the char type behaves like signed char. On another computer, or with another compiler, we may get a different result. Compilers have options giving you more warnings while compiling:
$ gcc -o char1 -std=c99 -pedantic char1.c
char.c: In function ‘main’:
char.c:6:3: warning: overflow in implicit constant conversion
In the example above, the options -std=c99 -pedantic tell the compiler to conform to the C99 standard and to issue warnings when a program does not: in our example, line 6 must be reviewed. The gcc compiler has an option to treat the char type as unsigned char:
$ gcc -o char1 -std=c99 -pedantic -funsigned-char char1.c
$ ./char1
i=255 j=255
Or as signed char:
$ gcc -o char1 -std=c99 -pedantic -fsigned-char char1.c
char.c: In function ‘main’:
char.c:6:3: warning: overflow in implicit constant conversion
You should force the compiler to translate char as signed or unsigned char only if you have fully understood how all char variables are used in the program. However, it is better to use the right types without resorting to such compiler options. This means you have to know the range of values that can be taken by your variables in order to use the right type.

We said character types are “small” integers fitting in one byte but, as a matter of fact, they are used for variables holding characters, not for working with small integer numbers. The term character, within the book, has two meanings depending on the context in which it is used. In C, a character is an object of character type (unsigned char, char or signed char) fitting in one byte. For a given human language (Japanese, German, French…), characters are symbols forming words and sentences: for example, the letter z is a character. Not every character set can represent the characters of every language. For example, ASCII describes the characters used in English and their corresponding 7-bit codes (integer numbers). The following example shows the mapping between a code value and a character (Unicode encoding UTF-8):
$ cat char2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   char c1='&';
   char c2=38;

   printf ("c1: code is %d, character is %c\n", c1, c1);
   printf ("c2: code is %d, character is %c\n", c2, c2);
   return EXIT_SUCCESS;
}
$ gcc -o char2 -std=c99 -pedantic char2.c
$ ./char2
c1: code is 38, character is &
c2: code is 38, character is &
Table II‑16 Character types
Character types always fit in a byte, whose size depends on the implementation. A byte is the smallest amount of the computer’s memory that can be addressed. For this reason, the C language defines it as the unit of memory for storing data. The sizes of the other types are multiples of the byte. The sizeof operator returns the size of a type or a given variable, expressed in bytes. In the C language, sizeof(char) always returns 1 (one byte, whatever its bit-length) as shown below:
$ cat char3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   printf ("Size of char is %zu.\n", sizeof(char));
   return EXIT_SUCCESS;
}
$ gcc -o char3 -std=c99 -pedantic char3.c
$ ./char3
Size of char is 1.
In a given human language, such as French, a certain number of symbols (characters) are used. ASCII is not enough for representing all the characters used by all languages. For example, the character ñ used in Spanish or œ used in French is not present in ASCII but in other character sets. More than seven bits are required for representing the characters of most languages. Hence, a character of a given language may actually occupy more than one byte (multibyte character) and then cannot be stored in an object of type char.
In C, the type unsigned char differs from the other types in that its encoding is a pure binary representation, as stated by C99. Pure binary representation means there are no “hidden” bits: all bits are part of the number. It is the only type having this property. For example, on some computers, an integer composed of n bits may have some unused bits (padding bits). On such computers, the value is computed by silently ignoring the padding bits. Programmers do not have to be aware of that. For an unsigned char, this is not permitted: all bits are part of the number. This feature is interesting: thanks to the type unsigned char, programmers can access all the bits of an object.

II.6.1.6 Short types
The following integer types, represented by at least 16 bits, can be used:
o short (or short int): same as signed short.
o signed short (or signed short int): the smallest allowed range is [−(2^15−1), 2^15−1] (i.e. [−32767, +32767]).
o unsigned short (or unsigned short int): the smallest allowed range is [0, 2^16−1] (i.e. [0, 65535]).
Table II‑17 Short types
In the following example, we show the biggest values that can be held by variables of type signed short and unsigned short on our computer:
$ cat short1.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
  short x = pow(2,15)-1;
  unsigned short y = pow(2,16)-1;
  printf ("max signed short value=%d\nmax unsigned short value=%u\n", x, y);
  return EXIT_SUCCESS;
}
$ gcc -o short1 -std=c99 -pedantic short1.c
$ ./short1
max signed short value=32767
max unsigned short value=65535
The following example is the same as the previous one, except that the values we set are too big for the types (hence the warning message overflow in implicit constant conversion):
$ cat short2.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <math.h>
4 int main(void) {
5   short x = pow(2,15);
6   unsigned short y = pow(2,16);
7
8   printf ("max signed short value=%d\nmax unsigned short value=%u\n", x, y);
9   return EXIT_SUCCESS;
10 }
$ gcc -o short2 -std=c99 -pedantic short2.c
short2.c: In function ‘main’:
short2.c:5:3: warning: overflow in implicit constant conversion
short2.c:6:3: warning: overflow in implicit constant conversion
In our example, we have introduced something new: the pow() math function. The C language has no power operator: to compute x to the power of y ($x^{y}$), programmers call the function pow(x,y). The function is declared in the header file math.h, which is included by the directive #include <math.h>. In our example, pow(2,15) means $2^{15}$.
II.6.1.7 int types
The following integer types are represented by at least 16 bits and have a bit-length greater than or equal to the bit-length of the short type:
o int: same as signed int.
o signed int: the minimum range is $[-(2^{15}-1),\ 2^{15}-1]$ (i.e. [−32767,+32767]).
o unsigned int: the minimum range is $[0,\ 2^{16}-1]$ (i.e. [0,65535]).
Usually, the int type is represented by 32 bits while the short type fits in 16 bits. However, never assume that the int type is 32 bits long on every computer.
Table II‑18 Int types
In the following example, we display the size (expressed in bytes) of the i variable of type int:
$ cat int1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i;
  /* sizeof yields a value of type size_t, printed with %zu */
  printf ("size of i is %zu\n", sizeof i);
  return EXIT_SUCCESS;
}
$ gcc -o int1 -std=c99 -pedantic int1.c
$ ./int1
size of i is 4
On our machine, the type int is represented by 4 bytes (32 bits). This number is given by the sizeof operator, which is very useful since it yields the size of a type as well as the size of an object. The following example displays the size of the char, short and int types:
$ cat int2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("char=%zu byte(s)\n", sizeof(char));
  printf("short=%zu bytes\n", sizeof(short));
  printf("int=%zu bytes\n", sizeof(int));
  return EXIT_SUCCESS;
}
$ gcc -o int2 -std=c99 -pedantic int2.c
$ ./int2
char=1 byte(s)
short=2 bytes
int=4 bytes
The sizeof operator can be applied to a type name or to an expression such as a variable name. If the operand is a variable, you can omit the parentheses, but if the operand is a type name, you must put parentheses around it.
The sizeof operator yields a number of bytes, and a byte is not necessarily 8 bits long. In C, a byte is the size of a char (sizeof(char) is 1), which is the smallest amount of memory that the computer can address. The macro CHAR_BIT, defined in the limits.h header file, gives the number of bits in a byte.
The following example shows the biggest values of an int and an unsigned int on our computer:
$ cat int3.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
  int x = pow(2,31)-1;
  int y = x + 1;                 /* signed overflow: undefined behavior */
  unsigned int z = pow(2,32)-1;
  printf ("x=%d\ny=%d\nz=%u\n", x, y, z);
  return EXIT_SUCCESS;
}
$ gcc -o int3 -std=c99 -pedantic int3.c
$ ./int3
x=2147483647
y=-2147483648
z=4294967295
Explanations:
o The statement int x = pow(2,31)-1 declares the x variable as an int and initializes it to $2^{31}-1$.
o The statement int y = x + 1 declares the y variable as type int and sets its value to the contents of the x variable plus 1. That is, y is meant to hold the value $2^{31}$.
o Since the size of an int is 32 bits on our machine, the value we gave to the y variable was too big, which caused an overflow. The printf() function displayed the contents of the variable x, then y: the x variable was correctly printed while y was not (because of the overflow).
o We can also see that the z variable (unsigned int type) was correctly printed. It held the biggest value for an unsigned int type on our computer. Notice that we used the %u specifier in printf() to display it.
II.6.1.8 Long types
The following integer types are represented by at least 32 bits and have a bit-length greater than or equal to the bit-length of type int:
o long: same as long int.
o long int: same as signed long int.
o signed long int: the minimum range is $[-(2^{31}-1),\ 2^{31}-1]$ (i.e. [−2147483647, 2147483647]).
o unsigned long int: the minimum range is $[0,\ 2^{32}-1]$ (i.e. [0, 4294967295]).
Table II‑19 Long types
The following example displays the size of the type long:
$ cat long1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("long=%zu bytes\n", sizeof(long));
  return EXIT_SUCCESS;
}
$ gcc -o long1 -std=c99 -pedantic long1.c
$ ./long1
long=4 bytes
The following example shows the biggest values of the long and unsigned long types on our computer (held in the variables x and z):
$ cat long2.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <math.h>
4 int main(void) {
5   long x = pow(2,31)-1;
6   long y = pow(2,31);
7   unsigned long z = pow(2,32) - 1;
8
9   printf ("x=%ld\ny=%ld\nz=%lu\n", x, y, z);
10   return EXIT_SUCCESS;
11 }
$ gcc -o long2 -std=c99 -pedantic long2.c
long2.c: In function ‘main’:
long2.c:6:3: warning: overflow in implicit constant conversion
$ ./long2
x=2147483647
y=2147483647
z=4294967295
Above, the x and z variables (holding the biggest values for the types long and unsigned long respectively on our computer) were correctly printed while the y variable was not, because of an overflow.
II.6.1.9 Long long types
The long long types were introduced in C99. The following integer types, represented by at least 64 bits and having a bit-length greater than or equal to the bit-length of the type long [17], can be used:
o long long: same as signed long long int.
o long long int: same as signed long long int.
o signed long long: same as signed long long int.
o signed long long int: the minimum range is $[-(2^{63}-1),\ 2^{63}-1]$ (i.e. [−9223372036854775807, 9223372036854775807]).
o unsigned long long: same as unsigned long long int.
o unsigned long long int: the minimum range is $[0,\ 2^{64}-1]$ (i.e. [0,18446744073709551615]).
Table II‑20 Long long types
The following example displays the size of the long long type:
$ cat llong1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("long long=%zu bytes\n", sizeof(long long));
  return EXIT_SUCCESS;
}
$ gcc -o llong1 -std=c99 -pedantic llong1.c
$ ./llong1
long long=8 bytes
The following example shows the biggest values of the long long and unsigned long long types on our computer (held in the x and z variables):
$ cat llong2.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <math.h>
4
5 int main(void) {
6   long long x = pow(2,63)-1;
7   long long y = pow(2,63);
8   unsigned long long z = pow(2,64)-1;
9
10   printf ("x=%lld\ny=%lld\nz=%llu\n", x, y, z);
11   return EXIT_SUCCESS;
12 }
$ gcc -o llong2 -std=c99 -pedantic llong2.c
llong2.c: In function ‘main’:
llong2.c:7:5: warning: overflow in implicit constant conversion
$ ./llong2
x=9223372036854775807
y=9223372036854775807
z=18446744073709551615
The y variable did not contain the expected value because of an overflow.
II.6.1.10 Boolean type
The Boolean type _Bool, introduced in C99, is an integer type that can store only two values: 0, meaning false, and 1, meaning true. In C, the value 0 is considered false, while any other value is treated as true. Thus, in C, the values 2 and -5 are both considered true, as shown below:
$ cat bool1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  if ( 2 ) {
    printf ("2 is TRUE\n") ;
  } else {
    printf ("2 is FALSE\n") ;
  }
  if ( 0 ) {
    printf ("0 is TRUE\n") ;
  } else {
    printf ("0 is FALSE\n") ;
  }
  if ( -5 ) {
    printf ("-5 is TRUE\n") ;
  } else {
    printf ("-5 is FALSE\n") ;
  }
  return EXIT_SUCCESS;
}
$ gcc -o bool1 -std=c99 -pedantic bool1.c
$ ./bool1
2 is TRUE
0 is FALSE
-5 is TRUE
Here is an example using two Boolean variables, b1 and b2, showing that the value 0 is a synonym for false while 1 is a synonym for true:
$ cat bool2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  _Bool b1 = 0;
  _Bool b2 = 1;
  if ( b1 ) {
    printf ("b1 is TRUE\n") ;
  } else {
    printf ("b1 is FALSE\n") ;
  }
  if ( b2 ) {
    printf ("b2 is TRUE\n") ;
  } else {
    printf ("b2 is FALSE\n") ;
  }
  return EXIT_SUCCESS;
}
$ gcc -o bool2 -std=c99 -pedantic bool2.c
$ ./bool2
b1 is FALSE
b2 is TRUE
If you attempt to assign a number different from 0 to a Boolean variable, it will take the value 1:
$ cat bool3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  _Bool b1 = 0;
  _Bool b2 = 12;
  _Bool b3 = -7;
  printf ("b1=%d\n", b1) ;
  printf ("b2=%d\n", b2) ;
  printf ("b3=%d\n", b3) ;
  return EXIT_SUCCESS;
}
$ gcc -o bool3 -std=c99 -pedantic bool3.c
$ ./bool3
b1=0
b2=1
b3=1
The C language defines a macro called bool, in stdbool.h, that expands to _Bool. Thus, our previous example can also be written like this:
$ cat bool4.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
  bool b1 = 0;
  bool b2 = 12;
  bool b3 = -7;
  printf ("b1=%d\n", b1) ;
  printf ("b2=%d\n", b2) ;
  printf ("b3=%d\n", b3) ;
  return EXIT_SUCCESS;
}
$ gcc -o bool4 -std=c99 -pedantic bool4.c
$ ./bool4
b1=0
b2=1
b3=1
Though they are not often used, you can work with the macros true (expanded to 1) and false (expanded to 0) defined in the header file stdbool.h:
$ cat bool5.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
  bool b1 = true;
  bool b2 = false;
  printf ("b1=%d\n", b1) ;
  printf ("b2=%d\n", b2) ;
  if ( b1 == true ) {
    printf ("b1 is TRUE\n") ;
  } else {
    printf ("b1 is FALSE\n") ;
  }
  if ( b2 == true) {
    printf ("b2 is TRUE\n") ;
  } else {
    printf ("b2 is FALSE\n") ;
  }
  return EXIT_SUCCESS;
}
$ gcc -o bool5 -std=c99 -pedantic bool5.c
$ ./bool5
b1=1
b2=0
b1 is TRUE
b2 is FALSE
In the following example, we initialize the Boolean variables with expressions (see Chapter IV):
$ cat bool6.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
  int x = 5;
  bool b1 = x > 0;  /* true */
  bool b2 = x < 10; /* true */
  printf ("b1=%d\n", b1) ;
  printf ("b2=%d\n", b2) ;
  return EXIT_SUCCESS;
}
$ gcc -o bool6 -std=c99 -pedantic bool6.c
$ ./bool6
b1=1
b2=1
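Though the Boolean type is an integer type, it does not behave like the other integer types when a value is converted to it: any value that compares unequal to 0 becomes 1, whereas a conversion to int discards the fractional part. For example, 0.2 yields 1 when stored in a bool but 0 when stored in an int:
$ cat bool7.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
  bool b = 0.2;
  int i = 0.2;
  printf ("b=%d\n", b) ;
  printf ("i=%d\n", i) ;
  return EXIT_SUCCESS;
}
$ gcc -o bool7 -std=c99 -pedantic bool7.c
$ ./bool7
b=1
i=0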
II.6.1.11 Limits
So far, we have talked about the different integer types defined by the C standard. Through examples, we displayed the maximum values that can be held by variables of the different integer types, but we have not yet explained where the boundaries are defined.
The boundaries of the integer types (see Table II‑21) are defined in the header file limits.h [18]. Limits are not held in variables but are expressed in the form of macros. For now, you can view a macro as an alias. For example, the directive #define CHAR_BIT 8 makes the symbolic name CHAR_BIT (a macro) an alias for the number 8.
Table II‑21 Boundaries of Integer types
The following C program displays the limits of the integer types defined by your system.
$ cat limits_int.c
#include <stdio.h>
#include <limits.h>
#include <stdlib.h>

int main(void) {
  printf ("CHAR_BIT=%d\n", CHAR_BIT);
  printf ("====CHAR====\n");
  printf ("SCHAR_MIN=%d (minimum value for signed char)\n", SCHAR_MIN);
  printf ("SCHAR_MAX=%d (maximum value for signed char)\n", SCHAR_MAX);
  printf ("UCHAR_MAX=%u (maximum value for unsigned char)\n", UCHAR_MAX);
  printf ("CHAR_MIN=%d (minimum value for char)\n", CHAR_MIN);
  printf ("CHAR_MAX=%d (maximum value for char)\n", CHAR_MAX);
  printf ("\n====SHORT====\n");
  printf ("SHRT_MIN=%d (minimum value for signed short)\n", SHRT_MIN);
  printf ("SHRT_MAX=%d (maximum value for signed short)\n", SHRT_MAX);
  printf ("USHRT_MAX=%u (maximum value for unsigned short)\n", USHRT_MAX);
  printf ("\n====INT====\n");
  printf ("INT_MIN=%d (minimum value for int)\n", INT_MIN);
  printf ("INT_MAX=%d (maximum value for int)\n", INT_MAX);
  printf ("UINT_MAX=%u (maximum value for unsigned int)\n", UINT_MAX);
  printf ("\n====LONG====\n");
  printf ("LONG_MIN=%ld (minimum value for long)\n", LONG_MIN);
  printf ("LONG_MAX=%ld (maximum value for long)\n", LONG_MAX);
  printf ("ULONG_MAX=%lu (maximum value for unsigned long)\n", ULONG_MAX);
  printf ("\n====LONG LONG====\n");
  printf ("LLONG_MIN=%lld (minimum value for long long)\n", LLONG_MIN);
  printf ("LLONG_MAX=%lld (maximum value for long long)\n", LLONG_MAX);
  printf ("ULLONG_MAX=%llu (maximum value for unsigned long long)\n", ULLONG_MAX);
  return EXIT_SUCCESS;
}
Of course, you have noticed that, in the second line, we included the limits.h header file since it contains the limits. If we compile the program and run it, we obtain this on our computer:
$ gcc -o limits_val -std=c99 -pedantic limits_int.c
$ ./limits_val
CHAR_BIT=8
====CHAR====
SCHAR_MIN=-128 (minimum value for signed char)
SCHAR_MAX=127 (maximum value for signed char)
UCHAR_MAX=255 (maximum value for unsigned char)
CHAR_MIN=-128 (minimum value for char)
CHAR_MAX=127 (maximum value for char)

====SHORT====
SHRT_MIN=-32768 (minimum value for signed short)
SHRT_MAX=32767 (maximum value for signed short)
USHRT_MAX=65535 (maximum value for unsigned short)

====INT====
INT_MIN=-2147483648 (minimum value for int)
INT_MAX=2147483647 (maximum value for int)
UINT_MAX=4294967295 (maximum value for unsigned int)

====LONG====
LONG_MIN=-2147483648 (minimum value for long)
LONG_MAX=2147483647 (maximum value for long)
ULONG_MAX=4294967295 (maximum value for unsigned long)

====LONG LONG====
LLONG_MIN=-9223372036854775808 (minimum value for long long)
LLONG_MAX=9223372036854775807 (maximum value for long long)
ULLONG_MAX=18446744073709551615 (maximum value for unsigned long long)
II.6.1.12 Overflow II.6.1.12.1 Unsigned integers
Whatever the operations involving unsigned integers, there is no overflow. This implies that if you assign to a variable of an unsigned integer type a value v (which may result from an expression) less than the minimum value or greater than the maximum value, the variable still has a well-defined value. The actual value is v modulo (umax+1), where umax is the maximum value of the unsigned integer type. Thus, the value of the variable always ranges from 0 through umax. Let us consider a variable of type unsigned int. Its maximum value is UINT_MAX. If you attempt to assign it the value UINT_MAX + 1, it will store the value (UINT_MAX + 1) modulo (UINT_MAX+1), which yields 0. If you attempt to assign it the value UINT_MAX + 2, it will store the value (UINT_MAX + 2) modulo (UINT_MAX+1), which yields 1. If you attempt to assign it the value UINT_MAX + 3, it will store the value (UINT_MAX + 3) modulo (UINT_MAX+1), which yields 2…
$ cat unsigned_overflow.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
  unsigned int max1 = UINT_MAX + 1;
  unsigned int max2 = UINT_MAX + 2;
  unsigned int max3 = UINT_MAX + 3;
  /* unsigned int values are printed with %u */
  printf("max1=%u max2=%u max3=%u\n", max1, max2, max3);
  return EXIT_SUCCESS;
}
$ gcc -o unsigned_overflow -std=c99 -pedantic unsigned_overflow.c
$ ./unsigned_overflow
max1=0 max2=1 max3=2
Let us give a quick explanation of the mathematical operator modulo. In C, it is denoted by the symbol %. The division of two integers n/q can be written n = p * q + r, where p is an integer (the quotient) and r is the remainder, such that |r| < |q|. The result of the modulo operation n mod q (in C, it is written n % q) is the remainder r: n % q = r. For example, as 6 = 1 * 4 + 2, then 6 % 4 = 2. Of course, if 0 ≤ n < q, then n % q = n, and if n = q, then n % q = 0.
II.6.1.12.2 Signed integers
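When an arithmetic operation on signed integers produces a result less than the minimum value or greater than the maximum value of the type, an overflow occurs and the behavior is undefined. When an out-of-range integer value is converted to a signed integer type (for example by an assignment), the result is implementation-defined. In both cases, the program cannot rely on the value obtained.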
II.6.2 Real floating types
In a computer, any value is stored in a fixed number of bits according to its type. Real numbers as mathematics defines them cannot be stored in the computer’s memory because a real number may have an infinite number of digits (for example π). Instead, in computing, we work with floating-point numbers. The adjective floating means the decimal point can take different positions (it is not fixed): the number 3.14 can also be written as $314 \times 10^{-2}$ or $31.4 \times 10^{-1}$. A floating-point number is composed of three parts: the sign, the significand (sometimes referred to as the mantissa) and the exponential part, which may be omitted and is composed of the base of a numeral system and an exponent: $\text{significand} \times \text{base}^{e}$
In the decimal system, the base is 10. In the binary system, the base is 2. In the hexadecimal system, the base is 16. Consider the decimal number $-31.4 \times 10^{-1}$:
o The sign is negative.
o The significand is 31.4.
o The exponential part is $10^{-1}$.
The C language has two kinds of floating types: real floating types and complex types (since C99). Real floating types represent finite real numbers. The C language defines three real floating types: float, double and long double. The values represented by the type float are a subset of the set of values represented by the type double. The values represented by the type double are a subset of the set of values represented by the type long double. The C standard does not enforce the way floating-point numbers are represented. Thus, the number of bits representing the significand and the exponent is defined by the implementation. The header file float.h contains a list of macros representing the radix (the base of the numeral system in which floating-point numbers are represented), the number of decimal digits for the significand (known as the precision), the minimum and maximum values for the exponent… Each implementation defines its own values, which are greater than or equal to the minimum values and less than or equal to the maximum values specified by the C standard.
II.6.2.1 float
In C, a variable of type float is declared like this: float variable_name;
Declaring a variable gives it a name and specifies the type of data it contains and its size. If you also want to initialize the variable at the same time as you declare it (this is known as a definition), write: float variable_name = val;
o The semicolon (;) at the end of the statement is mandatory.
o The keyword float is at the beginning of the statement. It cannot be used for naming a variable or a function. It is recognized as a special word denoting a type.
o Spaces around the equals sign and the semicolon are allowed.
o One or more spaces after the keyword float are required.
o variable_name is the name of the variable, used to identify it.
o val can be a variable, a floating-point constant, or an integer constant. More generally, it is an arithmetic expression (see Chapter IV).
To display a double or a float with printf(), you have three ways:
o by using %f: the number is displayed in the format [-]i.f, where i is the integral part and f the fractional part of the number.
o by using the specifier %e, %g, %E or %G: %e displays a floating-point number in scientific decimal notation (the letter e introducing the exponent appears in lowercase) while %g is either %e or %f depending on the value and the precision of the number. The specifiers %E and %G are equivalent to %e and %g respectively: they just display the exponent letter in uppercase.
o by using the specifier %a or %A, which displays a floating-point number in scientific hexadecimal notation.
The following example displays the variable x initialized with the floating constant 3.14159:
$ cat float1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 3.14159;
  printf("x=%f\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o float1 -std=c99 -pedantic float1.c
$ ./float1
x=3.141590
Explanations:
o The statement float x = 3.14159 declares the x variable as type float and initializes it to the value 3.14159.
o The statement printf("x=%f\n", x) displays the x variable.
There are two ways to write and display a floating-point number: with or without an exponent part. The following example initializes the x variable by using the exponential notation:
$ cat float2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 1.52e-3;
  printf("x (%%f)=%f\n", x);
  printf("x (%%e)=%e\n", x);
  printf("x (%%g)=%g\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o float2 -std=c99 -pedantic float2.c
$ ./float2
x (%f)=0.001520
x (%e)=1.520000e-03
x (%g)=0.00152
Explanations:
o The statement float x = 1.52e-3 sets the x variable of type float to a floating-point literal by using the exponential notation ($1.52 \times 10^{-3}$).
o The first printf() function displays x with no exponent part (%f specifier).
o The second printf() function displays x with an exponent part (%e specifier).
o The third printf() function displays the variable x with the %g specifier, which selects the most appropriate format (either %f or %e).
o To display the % symbol, you have to precede it with another %. Otherwise, it is considered the start of a specifier. Hence, %%f appears as %f.
In C, on the implementations we use, a floating-point number that is too big to be represented is considered an infinite number, denoted by a special value called infinity (+infinity or −infinity), as shown below:
$ cat float3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 1e900;  /* value too big => +infinity */
  float y = -1e900; /* value too big => -infinity */
  printf("%%f: x=%f and y=%f \n", x, y);
  printf("%%e: x=%e and y=%e \n", x, y);
  printf("%%g: x=%g and y=%g \n", x, y);
  return EXIT_SUCCESS;
}
$ gcc -o float3 -std=c99 -pedantic float3.c
float3.c: In function ‘main’:
float3.c:5:4: warning: floating constant exceeds range of ‘double’
float3.c:6:4: warning: floating constant exceeds range of ‘double’
$ ./float3
%f: x=Inf and y=-Inf
%e: x=Inf and y=-Inf
%g: x=Inf and y=-Inf
II.6.2.2 double The type double is similar to type float with more digits to represent the significand and the exponent. A variable of type double is declared like this: double variable_name;
You could also initialize a variable at the same time as its declaration (definition): double variable_name = val;
o The semicolon at the end of the statement is mandatory.
o The keyword double is at the beginning of the statement. It cannot be used for naming a variable or a function. It is recognized as a special word denoting a type.
o Spaces around the equals sign and the semicolon are allowed.
o One or more spaces after the keyword double are required.
o val can be a variable, a floating-point constant, or an integer constant. More generally, it is an arithmetic expression (expressions are covered in Chapter IV).
The type double can be used in exactly the same way as the type float. The difference is that the type double is a superset of the type float: the set of values represented by the type double contains the set of values representable by the type float. The following example shows that a variable of type double can hold bigger floating-point numbers than a variable of type float:
$ cat double1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 1.52e135;
  printf("x (%%e)=%e\n", x);
  printf("x (%%g)=%g\n", x);
  double y = 1.52e135;
  printf("y (%%e)=%e\n", y);
  printf("y (%%g)=%g\n", y);
  return EXIT_SUCCESS;
}
$ gcc -o double1 -std=c99 -pedantic double1.c
$ ./double1
x (%e)=Inf
x (%g)=Inf
y (%e)=1.520000e+135
y (%g)=1.52e+135
On our computer, the number $1.52 \times 10^{135}$ is too big to be held by the variable x of type float. It is displayed as Inf (infinite) while it fits in the variable y of type double. The following example shows that the type double allows a better accuracy than the type float. Two variables of type float and double are assigned a floating constant that is an approximation of π. Neither variable can support such a precision: both values are rounded to the nearest representable floating-point number.
$ cat double2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double dbl_pi = 3.141592653589793238462643383279;
  float flt_pi = 3.141592653589793238462643383279;
  printf("literal =3.141592653589793238462643383279\n");
  printf("dbl_pi =%.30lf\n", dbl_pi);
  printf("flt_pi =%.30f\n", flt_pi);
  return EXIT_SUCCESS;
}
$ gcc -o double2 -std=c99 -pedantic double2.c
$ ./double2
literal =3.141592653589793238462643383279
dbl_pi =3.141592653589793115997963468544
flt_pi =3.141592741012573242187500000000
The type double has a precision greater than or equal to the precision of the type float. On our computer, the double variable has fifteen correct digits while the float variable has six correct digits. Section II.6.2.6 will explain why…
II.6.2.3 long double
The type long double can be used in the same way as the types double and float. A variable of type long double is declared like this: long double variable_name;
The C language allows you to initialize a variable at the same time as its declaration: long double variable_name = val;
o The semicolon at the end of the statement is mandatory.
o The keyword long double is at the beginning of the statement.
o Spaces around the equals sign and the semicolon are allowed.
o One or more spaces after the keyword long double are required.
o val can be a variable, a floating-point constant, or an integer constant. More generally, it is an arithmetic expression (see Chapter IV).
To display a long double with printf(), you have three ways:
o by using %Lf: the number is displayed in the format [-]i.f, where i is the integral part and f the fractional part of the number.
o by using %Le, %Lg, %LE or %LG: %Le displays a floating-point number in scientific decimal notation (the letter e introducing the exponent appears in lowercase) while %Lg is either %Le or %Lf depending on the value and the precision of the number. %LE and %LG are equivalent to %Le and %Lg respectively: they just display the exponent letter in uppercase.
o by using %La or %LA, which displays a floating-point number in scientific hexadecimal notation.
The type long double works in the same way as the types float and double. It is a superset of the double type. The following example tries to display the number π with 30 digits after the decimal point after storing it into the dbl_pi variable of type double and into the ldbl_pi variable of type long double:
$ cat ldbl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double dbl_pi = 3.141592653589793238462643383279;
  /* The L suffix makes the constant a long double; without it,
     the constant has type double */
  long double ldbl_pi = 3.141592653589793238462643383279L;
  printf("literal =3.141592653589793238462643383279\n");
  printf("dbl_pi =%.30f\n", dbl_pi);
  printf("ldbl_pi =%.30Lf\n", ldbl_pi);
  return EXIT_SUCCESS;
}
$ gcc -o ldbl1 -lm -std=c99 -pedantic ldbl1.c
$ ./ldbl1
literal =3.141592653589793238462643383279
dbl_pi =3.141592653589793115997963468544
ldbl_pi =3.141592653589793238512808959406
The long double type has a precision greater than or equal to that of the type double. On our computer, the double variable has fifteen correct digits while the long double variable has eighteen correct digits. The range of values represented by the long double type is greater than or equal to that of the type double. In the following example, on our operating system, the number $10^{3000}$ assigned to a variable of type double is treated as infinite while it can be represented by the type long double.
$ cat ldbl2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double dbl = 1e3000 ;
  long double ldbl = 1e3000L; /* L suffix: long double constant */
  printf("dbl =%f\n", dbl);
  printf("ldbl =%Lg\n", ldbl);
  return EXIT_SUCCESS;
}
$ ./ldbl2
dbl =Inf
ldbl =1e+3000
II.6.2.4 Infinity
Floating-point numbers that are too large to be represented by a real floating type are considered infinite. In the following example, the floating-point numbers $10^{5000}$ and $-10^{5000}$ cannot be represented by the type float: they are treated as +infinite and −infinite:
$ cat float_infinite.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 1e5000 ;
  float y = -1e5000 ;
  printf("x=%f and y=%f\n", x, y);
  return EXIT_SUCCESS;
}
$ gcc -o float_infinite -std=c99 -pedantic float_infinite.c
float_infinite.c: In function ‘main’:
float_infinite.c:5:4: warning: floating constant exceeds range of ‘double’
float_infinite.c:6:4: warning: floating constant exceeds range of ‘double’
$ ./float_infinite
x=Inf and y=-Inf
II.6.2.5 NaN
Operations or functions dealing with floating-point numbers may yield special values known as NaN. NaNs (Not a Number) represent undefined values. There can be several NaNs whose values depend on the implementation. For example, the square root of -1, sqrt(-1), produces NaN. The following operations also produce NaN: 0/0, infinite/infinite, infinite − infinite, 0*infinite. Here is an example:
$ cat float_NaN.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
  double v = 1E900;  /* Infinite */
  double u = 1E-900; /* 0 */
  double w = v * 0;  /* NaN */
  double x = v / v;  /* NaN */
  double y = v - v;  /* NaN */
  double z = u/u;    /* NaN */
  printf("square root(-1): sqrt(-1)=%f\n", sqrt(-1));
  printf("v=%f u=%f\n", v, u);
  printf("v*0=%f\n", w);
  printf("v/v=%f\n", x);
  printf("v-v=%f\n", y);
  printf("u/u=0/0=%f\n", z);
  return EXIT_SUCCESS;
}
$ gcc -o float_NaN -std=c99 -pedantic -lm float_NaN.c
float_NaN.c: In function ‘main’:
float_NaN.c:6:4: warning: floating constant exceeds range of ‘double’
float_NaN.c:7:4: warning: floating constant truncated to zero
$ ./float_NaN
square root(-1): sqrt(-1)=-NaN
v=Inf u=0.000000
v*0=-NaN
v/v=-NaN
v-v=-NaN
u/u=0/0=-NaN
II.6.2.6 Floating-point limits
In scientific notation, a floating-point number is composed of three parts: a sign, a significand and an exponent part. The significand is made up of an integer part, the radix point, and a fractional part. The exponent part may be omitted, such as in the number 3.14 (instead of $3.14 \times 10^{0}$). A floating-point number has the form $\pm m \times b^{e}$, where:
o ± is the sign. It can be positive or negative.
o m is the significand (sometimes referred to as the mantissa). It is a number with a fractional part.
o b represents the base or radix. In the base 10 number system, b is 10. In the binary number system, b is 2. Generally, systems work with base 2 but nothing prevents them from using another base.
o e is the exponent. It is an integer that can be positive, zero or negative.
As our computer has a finite memory, and therefore stores floating-point numbers in a fixed-length chunk of memory, how could the number 3.14 be stored? Should it be stored as $0.314 \times 10^{1}$ or $314 \times 10^{-2}$? How many bits should be reserved for the significand and how many bits for the exponent? The first issue is that a floating-point number may be written in several ways: 3.14, $31.4 \times 10^{-1}$, $0.314 \times 10^{1}$… That is why a floating-point number is normalized so as to have a single representation. The normalization of a number depends on the representation that is adopted. For example, a normalized floating-point number could start with 0, followed by the radix point, followed by a nonzero digit, such as $0.314 \times 10^{1}$.
In order to store a floating-point number, a specific representation must be used [19]. There exist several representations of floating-point numbers. The most widely used is described by the standard IEEE 754, also referred to as ISO/IEC/IEEE 60559. To understand the limits of the C language, defined in the header file float.h, we have to resort to a representation of floating-point numbers. Otherwise, they would appear cryptic. In the following section, we resort to the example of floating-point representation given by the C99 standard, which derives from the representations described by the standard IEEE 754.
II.6.2.7 Example of representation
A floating-point number could be represented as follows (see the beginning of the chapter about numeral systems):
$fnb = sign \times m \times b^{e}$
where $m = d_1 b^{-1} + d_2 b^{-2} + \dots + d_n b^{-n}$
where $e_{min} \le e \le e_{max}$
where $0 \le d_i \le b-1$
Where:
o sign is the sign of the floating-point number (±).
o b is the radix. In the decimal numeral system, b is 10. In the binary system, b is 2. In C99, it is denoted by the macro FLT_RADIX.
o $d_1, d_2, \dots, d_n$ are digits expressed in the base-b number system. They are natural numbers in the range [0, b-1]. For example, in base 2, they can be either 0 or 1. In base 10, the digits are in the integral interval [0, 9].
o n is the number of digits of the significand, known as the precision. The C99 standard represents it by the macro FLT_MANT_DIG for the type float, DBL_MANT_DIG for the type double and LDBL_MANT_DIG for the type long double.
o e is the exponent, within the integral range $[e_{min}, e_{max}]$. The values $e_{min}$ and $e_{max}$ depend on the implementation and the floating type. In C99, $e_{min}$ is called FLT_MIN_EXP for the type float, DBL_MIN_EXP for the type double and LDBL_MIN_EXP for the type long double. $e_{max}$ is called FLT_MAX_EXP for the type float, DBL_MAX_EXP for the type double and LDBL_MAX_EXP for the type long double.
For example, in base 10, the number 3.14 can be represented as $0.314 \times 10^{1} = (3 \times 10^{-1} + 1 \times 10^{-2} + 4 \times 10^{-3}) \times 10^{1}$. It is composed of:
o The sign +.
o The significand 0.314: $d_1=3$, $d_2=1$, $d_3=4$ and $0 \le d_i \le 9$. Its precision is 3.
o The exponent 1.
o The base 10.
A variable of real floating type can take several kinds of values:
o Finite floating-point numbers:
▪ If the floating-point number fnb is not zero and $d_1 > 0$, the number is said to be normalized.
▪ If the floating-point number fnb is not zero, $d_1 = 0$ and $e = e_{min}$, the number is said to be denormalized. Denormalized numbers (also called subnormal) are too small to be represented as normalized numbers. They can be used to represent very small floating-point numbers.
o Infinite numbers: +infinite and −infinite. The values depend on the implementation.
o NaN (Not a Number), representing an undetermined value. There can be several kinds of NaN whose values depend on the implementation.
What is the difference between normalized and denormalized floating-point numbers? The normalized form ensures a single way to represent a finite floating-point number: the very first significant digit $d_1$ is different from 0. The denormalized form is used to represent numbers too small to be represented by the normalized form: the first digit $d_1$ is 0, which causes the loss of at least one digit of precision. In our representation, a normalized floating-point number takes the form $\pm 0.d_1 d_2 d_3 \dots \times b^{e}$. For example, the number -827.6 takes the normalized form $-0.8276 \times 10^{3}$ composed of:
o The sign −.
o The significand 0.8276: $d_1=8$, $d_2=2$, $d_3=7$ and $d_4=6$. Its precision is 4.
o The exponent 3.
o The base 10.
[20] Likewise, in our representation, the binary number $101.11_2$ has the normalized form $0.10111_2 \times 2^{3}$:
o The sign is +.
o The significand is $0.10111_2$.
o The precision is 5: $d_1=1$, $d_2=0$, $d_3=1$, $d_4=1$, $d_5=1$.
o The exponent is 3.
o The radix is 2.
How could we convert the binary number 101.11 into a decimal number? $101.11_2 = 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} + 1 \times 2^{-1} + 1 \times 2^{-2} = 5 + 0.75 = 5.75$. So, the binary number 101.11 has the normalized form $0.10111_2 \times 2^{3}$ and stands for $5.75_{10}$ in the decimal number system.
In Figure II‑7, we have represented the intervals for normalized and denormalized numbers. In our representation, the bounds can be computed easily; they are given below:
\[ NFLP_{max} = b^{e_{max}} (1 - b^{-n}) \]
\[ NFLP_{min} = b^{e_{min}-1} \]
\[ DFLP_{max} = b^{e_{min}-1} (1 - b^{-n+1}) \]
\[ DFLP_{min} = b^{e_{min}-n} \]
Where:
o NFLPmax is the maximum normalized floating-point number. It represents the largest representable finite number. In C, it is represented by the macro FLT_MAX for the type float, DBL_MAX for the type double and LDBL_MAX for the type long double.
o NFLPmin is the minimum normalized floating-point number. It represents the smallest representable number without losing precision. In C, it is denoted by the macro FLT_MIN for the type float, DBL_MIN for the type double and LDBL_MIN for the type long double.
o DFLPmax is the maximum denormalized floating-point number. It is not specified in C.
o DFLPmin is the minimum denormalized floating-point number. It represents the smallest representable number but with precision loss. It is not specified in C.
Figure II‑7 Ranges of normalized and denormalized floating-point numbers
If the base is 2:
\[ NFLP_{max} = 2^{e_{max}} (1 - 2^{-n}) \]
\[ NFLP_{min} = 2^{e_{min}-1} \]
\[ DFLP_{max} = 2^{e_{min}-1} (1 - 2^{-n+1}) \]
\[ DFLP_{min} = 2^{e_{min}-n} \]
A normalized floating-point number is in the range [-NFLPmax, -NFLPmin] U [NFLPmin, NFLPmax]. A denormalized floating-point number is in the range [-DFLPmax, -DFLPmin] U [DFLPmin, DFLPmax]. Not all real numbers in these ranges can be represented, because the number of digits of the significand is finite while a real number can have any number of significant digits. Figure II‑7 shows several bounds: NFLPmin, NFLPmax, DFLPmin and DFLPmax. A real number with a precision m > n (n being the largest precision defined by the system according to the floating type) cannot be represented and is then rounded to the nearest representable floating-point number. A number whose absolute value is greater than NFLPmax cannot be represented either (overflow): it is considered infinite. A number whose absolute value is less than NFLPmin is not a normalized number (underflow) but can be approximated by a denormalized number with precision loss. A nonzero number whose absolute value is less than DFLPmin is not representable at all.
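Let us compute DFLPmax, DFLPmin, NFLPmax and NFLPmin. We are going to play with mathematics. The significand of a normalized number takes the form \(d_1 b^{-1} + d_2 b^{-2} + \dots + d_n b^{-n}\) where \(d_1 > 0\).
The maximum normalized floating-point number NFLPmax is equal to:
\[ NFLP_{max} = b^{e_{max}} \left( (b-1) b^{-1} + (b-1) b^{-2} + \dots + (b-1) b^{-n} \right) \]
The minimum normalized floating-point number NFLPmin is equal to:
\[ NFLP_{min} = b^{e_{min}} \left( 1 \times b^{-1} + 0 \times b^{-2} + \dots + 0 \times b^{-n} \right) = b^{e_{min}} \times b^{-1} = b^{e_{min}-1} \]
In mathematics, the geometric series \(1 + q + q^{2} + \dots + q^{n}\) equals \((1 - q^{n+1})/(1 - q)\), which implies:
\[ 1 + r^{-1} + r^{-2} + \dots + r^{-n} = 1 + \frac{1}{r} + \left(\frac{1}{r}\right)^{2} + \dots + \left(\frac{1}{r}\right)^{n} = \frac{1 - 1/r^{n+1}}{1 - 1/r} \]
So, we can write:
\[ (b-1) b^{-1} + (b-1) b^{-2} + \dots + (b-1) b^{-n} = (b-1)\, b^{-1} \left( 1 + \frac{1}{b} + \dots + \frac{1}{b^{n-1}} \right) \]
\[ = (b-1)\, b^{-1}\, \frac{1 - b^{-n}}{1 - b^{-1}} \]
\[ = (b-1)\, \frac{1 - b^{-n}}{b - 1} \]
\[ = 1 - b^{-n} \]
Then:
\[ NFLP_{max} = b^{e_{max}} (1 - b^{-n}) \]
Let's move on. Let us compute the maximum and minimum denormalized floating-point numbers, respectively denoted by DFLPmax and DFLPmin. A denormalized number has \(d_1 = 0\) and \(e = e_{min}\):
\[ DFLP_{max} = b^{e_{min}} \left( (b-1) b^{-2} + \dots + (b-1) b^{-n} \right) = b^{e_{min}} (b-1)\, b^{-2} \left( 1 + \frac{1}{b} + \dots + \frac{1}{b^{n-2}} \right) \]
\[ = b^{e_{min}} (b-1)\, b^{-2}\, \frac{1 - b^{-n+1}}{1 - b^{-1}} \]
\[ = b^{e_{min}} (b-1)\, b^{-1}\, \frac{1 - b^{-n+1}}{b - 1} \]
\[ = b^{e_{min}}\, b^{-1} (1 - b^{-n+1}) \]
\[ DFLP_{max} = b^{e_{min}-1} (1 - b^{-n+1}) \]
\[ DFLP_{min} = b^{e_{min}} \left( 0 \times b^{-2} + \dots + 1 \times b^{-n} \right) = b^{e_{min}-n} \]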
Figure II‑8 Binary floating-point representation
The C99 standard specifies another value represented by the macro FLT_EPSILON for the type float, DBL_EPSILON for the type double, LDBL_EPSILON for the type long double. Let us call it epsilon. It is the difference between 1 and the smallest representable value greater than 1. With our representation, its value is:
\[ epsilon = b^{1-n} \]
For a positive floating-point number v sufficiently smaller than epsilon (less than epsilon/2 when results are rounded to the nearest representable value), 1 + v = 1!
Let us compute epsilon. A number slightly greater than 1 can be written:
\[ 1 + epsilon = 1 + d_1 b^{-1} + \dots + d_i b^{-i} \]
The normalized form of that number is:
\[ 1 + epsilon = \left( b^{-1} + d_1 b^{-2} + \dots + d_i b^{-i-1} \right) b \]
The smallest such number greater than \(1 = (b^{-1})\, b\) is obtained with \(d_1 = 0\), \(d_2 = 0\), …, \(d_i = 1\) and \(-i-1 = -n\), because n is the maximum number of digits of a significand (precision). Then \(i = n-1\) and:
\[ epsilon = b^{-(n-1)} = b^{1-n} \]
Table II‑22 shows examples of binary floating-point representation for the types float and double.
Table II‑22 Example of values for floating-point numbers
II.6.2.8 Limits
The C language does not impose a specific representation for floating-point numbers: the base (radix) and the sizes of the exponent and the significand are left to implementations. Table II‑23 and Table II‑24 describe some limits represented by macros defined in the header file float.h. Macros beginning with FLT apply to type float. Macros beginning with DBL apply to type double. Macros beginning with LDBL apply to type long double.
Table II‑23 Some minimum limits defined in float.h
Table II‑24 Some maximum limits defined in float.h
The following program displays the limits listed in Table II‑23 and Table II‑24 for the type float:
$ cat float_max.c
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main(void) {
  printf("FLT_RADIX=%d\n", FLT_RADIX);
  printf("FLT_MANT_DIG=%d\n", FLT_MANT_DIG);
  printf("FLT_MIN_EXP=%d\n", FLT_MIN_EXP);
  printf("FLT_MAX_EXP=%d\n", FLT_MAX_EXP);
  printf("FLT_MIN_10_EXP=%d\n", FLT_MIN_10_EXP);
  printf("FLT_MAX_10_EXP=%d\n", FLT_MAX_10_EXP);
  printf("FLT_MIN=%e\n", FLT_MIN);
  printf("FLT_MAX=%e\n", FLT_MAX);
  printf("FLT_DIG=%d\n", FLT_DIG);
  printf("FLT_EPSILON=%e\n", FLT_EPSILON);
  return EXIT_SUCCESS;
}
On our computer, after compiling the program, we get this:
$ gcc -o float_max -std=c99 -pedantic float_max.c
$ ./float_max
FLT_RADIX=2
FLT_MANT_DIG=24
FLT_MIN_EXP=-125
FLT_MAX_EXP=128
FLT_MIN_10_EXP=-37
FLT_MAX_10_EXP=38
FLT_MIN=1.175494e-38
FLT_MAX=3.402823e+38
FLT_DIG=6
FLT_EPSILON=1.192093e-07
The following program displays the limits listed in Table II‑23 and Table II‑24 for the type double:
$ cat dbl_max.c
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main(void) {
  printf("FLT_RADIX=%d\n", FLT_RADIX);
  printf("DBL_MANT_DIG=%d\n", DBL_MANT_DIG);
  printf("DBL_MIN_EXP=%d\n", DBL_MIN_EXP);
  printf("DBL_MAX_EXP=%d\n", DBL_MAX_EXP);
  printf("DBL_MIN_10_EXP=%d\n", DBL_MIN_10_EXP);
  printf("DBL_MAX_10_EXP=%d\n", DBL_MAX_10_EXP);
  printf("DBL_MIN=%e\n", DBL_MIN);
  printf("DBL_MAX=%e\n", DBL_MAX);
  printf("DBL_DIG=%d\n", DBL_DIG);
  printf("DBL_EPSILON=%e\n", DBL_EPSILON); /* %e, not %Le: the argument is a double */
  return EXIT_SUCCESS;
}
If we run it on our computer, we get this:
$ ./dbl_max
FLT_RADIX=2
DBL_MANT_DIG=53
DBL_MIN_EXP=-1021
DBL_MAX_EXP=1024
DBL_MIN_10_EXP=-307
DBL_MAX_10_EXP=308
DBL_MIN=2.225074e-308
DBL_MAX=1.797693e+308
DBL_DIG=15
DBL_EPSILON=2.220446e-16
The following program displays the limits listed in Table II‑23 and Table II‑24 for the type long double:
$ cat ldbl_max.c
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main(void) {
  printf("FLT_RADIX=%d\n", FLT_RADIX);
  printf("LDBL_MANT_DIG=%d\n", LDBL_MANT_DIG);
  printf("LDBL_MIN_EXP=%d\n", LDBL_MIN_EXP);
  printf("LDBL_MAX_EXP=%d\n", LDBL_MAX_EXP);
  printf("LDBL_MIN_10_EXP=%d\n", LDBL_MIN_10_EXP);
  printf("LDBL_MAX_10_EXP=%d\n", LDBL_MAX_10_EXP);
  printf("LDBL_MIN=%Le\n", LDBL_MIN);
  printf("LDBL_MAX=%Le\n", LDBL_MAX);
  printf("LDBL_DIG=%d\n", LDBL_DIG);
  printf("LDBL_EPSILON=%Le\n", LDBL_EPSILON);
  return EXIT_SUCCESS;
}
If we run it on our computer, we get this:
$ ./ldbl_max
FLT_RADIX=2
LDBL_MANT_DIG=64
LDBL_MIN_EXP=-16381
LDBL_MAX_EXP=16384
LDBL_MIN_10_EXP=-4931
LDBL_MAX_10_EXP=4932
LDBL_MIN=3.362103e-4932
LDBL_MAX=1.189731e+4932
LDBL_DIG=18
LDBL_EPSILON=1.084202e-19
As floating-point numbers have an internal binary representation in computers, the decimal floating-point numbers you use may actually be approximations. Consider the decimal floating-point numbers 0.5 and 0.125: their binary representations are 0.1 (\(0.5 = 1 \times 2^{-1}\)) and 0.001 (\(0.125 = 0 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}\)) respectively. Both numbers are represented exactly in binary. Now, consider the number 0.1: in binary, it is written 0.0001100110011… Whatever the precision adopted, the decimal floating-point number 0.1 will never be represented exactly in base 2. Therefore, we have four kinds of issues with floating-point numbers:
o A floating-point number with too many digits (such as π) cannot be represented accurately: it is approximated.
o A floating-point number with a magnitude too large (such as \(10^{9999}\)) cannot be represented: it is considered infinite.
o A floating-point number with a magnitude too small (such as \(10^{-9999}\)) cannot be represented: it is considered 0.
o A decimal floating-point number may be approximated if FLT_RADIX is not 10 (it is usually 2).
If a floating-point number, expressed in base 10, has a precision greater than FLT_DIG (for float), DBL_DIG (for double), or LDBL_DIG (for long double), there may be a loss of accuracy. Consider the following example:
$ cat float_limit1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 3.1415926535;
  printf("x set to 3.1415926535. x=%.10f\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o float_limit1 float_limit1.c
$ ./float_limit1
x set to 3.1415926535. x=3.1415927410
In our example, the x variable is set to a decimal floating-point literal (3.1415926535) with a precision of 11, which is greater than FLT_DIG. The literal is converted to a binary number (if FLT_RADIX is 2, which is generally the case) with a precision of FLT_MANT_DIG and rounded if required before being stored into the variable. This means we may not get exactly the same number, and there may be a loss of accuracy. The first FLT_DIG significant digits, however, are preserved, as shown by the following example:
$ cat float_limit2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x;
  x = 3.14159; printf("x set to 3.14159. x=%f\n", x);
  x = 33.14159; printf("x set to 33.14159. x=%f\n", x);
  x = 333.14159; printf("x set to 333.14159. x=%f\n", x);
  x = 3333.14159; printf("x set to 3333.14159. x=%f\n", x);
  x = 33333.14159; printf("x set to 33333.14159. x=%f\n", x);
  x = 333333.14159; printf("x set to 333333.14159. x=%f\n", x);
  x = 3333333.14159; printf("x set to 3333333.14159. x=%f\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o float_limit2 -std=c99 -pedantic float_limit2.c
$ ./float_limit2
x set to 3.14159. x=3.141590
x set to 33.14159. x=33.141590
x set to 333.14159. x=333.141602
x set to 3333.14159. x=3333.141602
x set to 33333.14159. x=33333.140625
x set to 333333.14159. x=333333.156250
x set to 3333333.14159. x=3333333.250000
The float_limit2 example shows that the larger the magnitude of a floating-point number, the fewer significant digits remain for its fractional part, which can even be lost entirely as shown below:
$ cat float_limit3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float f = 8888888.125;
  float g = 8888888.225;
  printf("%f-%f=%g\n", g, f, g-f);
  return EXIT_SUCCESS;
}
$ gcc -o float_limit3 -std=c99 -pedantic float_limit3.c
$ ./float_limit3
8888888.000000-8888888.000000=0
The least significant digits of the integral part may be discarded and the number may be rounded as shown by the following example:
$ cat float_limit4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float f = 777777777; /* precision of 9 */
  printf("777777777=%f\n", f);
  printf("777777777=%e\n", f);
  return EXIT_SUCCESS;
}
$ gcc -o float_limit4 -std=c99 -pedantic float_limit4.c
$ ./float_limit4
777777777=777777792.000000
777777777=7.777778e+08
When a number is too big to be held in a variable of type float, it takes the symbolic value Inf (or -Inf):
$ cat float_limit5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 10e+130;
  float y = -10e+130;
  printf("x=%f\ny=%f\n", x, y);
  return EXIT_SUCCESS;
}
$ gcc -o float_limit5 -lm -std=c99 -pedantic float_limit5.c
$ ./float_limit5
x=Inf
y=-Inf
It is possible to have numbers less than FLT_MIN. They are denormalized numbers. In the following example, we display a number less than FLT_MIN:
$ cat float_limit6.c
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main(void) {
  float x = FLT_MIN*0.01;
  printf("FLT_MIN=%e\n", FLT_MIN);
  printf("FLT_MIN*0.01=%e\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o float_limit6 -std=c99 -pedantic float_limit6.c
$ ./float_limit6
FLT_MIN=1.175494e-38
FLT_MIN*0.01=1.175493e-40
The decimal floating-point number 1.25 has a precision of 3 while the decimal floating-point number 1.250 has a precision of 4. Mathematically, they are equal but there is a subtle distinction: the first notation indicates we are sure that the least significant digit is 5, and the digits afterwards are unknown and therefore not written. The second notation shows our quantity is known accurately to three digits after the decimal point.
II.6.3 Complex types
In mathematics, a complex number takes the form:
\[ a + i\,b \]
Where a and b are real numbers, and i is the imaginary unit equal to \(\sqrt{-1}\) (i.e. \(i^{2} = -1\)). The real number a is called the real part of the complex number and b the imaginary part. An imaginary number is a complex number with no real part, having the form: i b. In C, real floating types and complex types are called floating types.
In C (as of C99), the complex type is called _Complex, and the imaginary type is called _Imaginary. However, in practice, these names are not often used because the header file complex.h defines more natural type names: complex and imaginary. The header file complex.h defines several useful functions and macros:
o complex that expands to _Complex. You can then define a variable holding a complex number as complex or _Complex. Both are equivalent.
o imaginary that expands to _Imaginary. Thus, you can define a variable holding an imaginary number as imaginary or _Imaginary. Both are equivalent.
o _Imaginary_I and _Complex_I (imaginary unit) that expand to a constant i such that \(i^{2} = -1\).
o I (representing the imaginary unit) that expands to _Complex_I or _Imaginary_I. If _Imaginary_I is not implemented, it expands to _Complex_I.
The imaginary type may not be supported on your system. Accordingly, the macros imaginary and _Imaginary_I would not be defined. As a matter of fact, there are three kinds of complex types:
o float _Complex (same as float complex if you include complex.h): real and imaginary parts are of type float.
o double _Complex (same as double complex if you include complex.h): real and imaginary parts are of type double.
o long double _Complex (same as long double complex if you include complex.h): real and imaginary parts are of type long double.
Likewise, if the imaginary type is implemented, three kinds of imaginary types can be used:
o float _Imaginary (same as float imaginary if you include complex.h)
o double _Imaginary (same as double imaginary if you include complex.h)
o long double _Imaginary (same as long double imaginary if you include complex.h)
To get the real part of a complex number, use the functions crealf(), creal(), or creall(), declared in complex.h, whose prototypes are given below:
float crealf(float complex z);
double creal(double complex z);
long double creall(long double complex z);
If you declare a variable of type float complex, call the function crealf(). If you declare a variable of type double complex, call the function creal(). If you declare a variable of type long double complex, call the function creall().
To get the imaginary part of a complex number, use the functions cimagf(), cimag() or cimagl(), declared in complex.h, whose prototypes are shown below:
float cimagf(float complex z);
double cimag(double complex z);
long double cimagl(long double complex z);
Not all compilers support complex types. Here is an example with a compiler that does:
$ cat complex.c
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>

int main(void) {
  double complex z1 = 1 + 2*I;
  double complex z2 = 1.1 + 2.2*I;
  double complex z3 = z1 + z2;
  printf("z1=%f + %f i\n", creal(z1), cimag(z1) );
  printf("z2=%f + %f i\n", creal(z2), cimag(z2) );
  printf("z3=%f + %f i\n", creal(z3), cimag(z3) );
  return EXIT_SUCCESS;
}
$ gcc -o complex -std=c99 -pedantic complex.c
$ ./complex
z1=1.000000 + 2.000000 i
z2=1.100000 + 2.200000 i
z3=2.100000 + 4.200000 i
II.7 Types of constants
We talked about constants but we said hardly anything about their type. While it is obvious that the constant 12 is an integer, we could wonder what kind of integer type it has: int, unsigned int, long… It is worth noting that integer and floating constants are never negative. The minus sign before an arithmetic constant is treated as a unary operator (see Chapter IV Section IV.2.2) that is not part of the constant. For example, when you write int v = -12, the integer constant is 12, not -12, while the variable v actually holds a negative value (-12).
II.7.1 Character constants
A character constant such as 'Z' has type int. An object of type char can hold any basic character as a non-negative integer. While a basic character fits in one byte, an extended character may be represented by more than one byte. For example, in UCS, the character '€' has the integer value 0x20AC. The character encoding UTF-8 represents it by three bytes: 0xE2, 0x82, and 0xAC. Basic characters can be represented by a character type (char, signed char or unsigned char) while extended characters (such as €), described in Chapter IX, are represented by one or more bytes (multibyte characters) or as a wide character (wchar_t).
II.7.2 Integer constants
The C language defines a list of suffixes for integer constants specifying their type: u or U for unsigned, l or L for long, ll or LL for long long. The suffix u or U can be combined with l (or L) and ll (or LL), which leads to several possibilities. According to C99, the type of an integer constant is the first type that can hold its value, taken in the order given below:
o No suffix:
▪ Decimal integer constant: int, long, long long
▪ Hexadecimal or octal integer constant: int, unsigned int, long, unsigned long, long long, unsigned long long
o Suffix U:
▪ Decimal, hexadecimal or octal integer constant: unsigned int, unsigned long, unsigned long long
o Suffix L:
▪ Decimal integer constant: long, long long
▪ Hexadecimal or octal integer constant: long, unsigned long, long long, unsigned long long
o Suffix UL:
▪ Decimal, hexadecimal or octal integer constant: unsigned long, unsigned long long
o Suffix LL:
▪ Decimal integer constant: long long
▪ Hexadecimal or octal integer constant: long long, unsigned long long
o Suffix ULL:
▪ Decimal, hexadecimal or octal integer constant: unsigned long long
For example, the integer constants 12, 0xFA and 012 have type int. The integer constant 12U has type unsigned int. The integer constant 12LL has type long long…
II.7.3 Floating constants Real floating constants can be of type float, double or long double. Suffixes can be appended to floating constants to specify their type: f (or F) for float, l (or L) for long double. With no suffix, a floating constant is of type double. Here are some floating constants: 1.0, 1., 3.14e1, 3.1e-2, 2.8f, 2.618e-2L.
II.8 Type qualifiers [21]
The C language specifies three kinds of type qualifiers: const, volatile and restrict. A type without a qualifier is called an unqualified type, such as int, float… A type with a qualifier is called a qualified type: const int, volatile int, const volatile int, int * restrict… The qualifier restrict applies only to pointer types. A type can be qualified with one, two or three qualifiers in any order. A qualifier does not change the representation of a type but the way it is used. For example, an object of type const int has the same representation as an int but it is used as a read-only object.
II.8.1 Const
So far, our variables could be altered at any time. In some cases, programmers do not want their variables to be modified. The C language defines the type qualifier const, which tells the compiler that the variable it qualifies cannot be modified once created. The const qualifier can be placed before or after the type it qualifies. Such a variable is not an actual constant such as 16, 1.2, or "hello". For example:
$ cat const1.c
#include <stdlib.h>

int main(void) {
  float const pi = 3.14;
  pi = 3.1459;
  return EXIT_SUCCESS;
}
$ gcc -o const1 -std=c99 -pedantic const1.c
const1.c: In function ‘main’:
const1.c:5:3: error: assignment of read-only variable ‘pi’
The compilation failed because we tried to modify the variable pi declared as read-only with the qualifier const. What happens if we do not initialize it at declaration time?
$ cat const2.c
#include <stdlib.h>

int main(void) {
  float const pi;

  pi = 3.14;
  return EXIT_SUCCESS;
}
$ gcc -o const2 -std=c99 -pedantic const2.c
const2.c: In function ‘main’:
const2.c:6:3: error: assignment of read-only variable ‘pi’
We got the same error. So, do not forget to initialize your const variable at the time of declaration. The const qualifier can also be placed before the type it qualifies:
$ cat const3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  const float pi = 3.14;
  printf("pi=%f\n", pi);
  return EXIT_SUCCESS;
}
$ gcc -o const3 -std=c99 -pedantic const3.c
$ ./const3
pi=3.140000
II.8.2 Volatile
Though not often used, the type qualifier volatile may be useful in some circumstances. It tells the compiler to avoid performing any optimization related to volatile variables because they may be altered by something other than the piece of code containing them (by a hardware component or a thread). What does it actually mean? Most of the time, in a C program, a variable is modified by a single routine in a predictable way. For this reason, the compiler may perform optimizations. Optimizations allow the program to run faster. For example, the value of some variables does not have to be read again each time it is used, as in the following code:
int flag=0;
while (flag == 0)
  ;
printf("Flag=%d\n", flag);
The compiler, considering that the flag variable is not modified between its initialization and the while loop, could optimize it like this:
int flag=0;
while (1)
  ;
printf("Flag=%d\n", flag);
It makes sense. Most of the time, the compiler is right, but it happens that optimizations cause an unexpected behavior of the program if variables are also modified by an element external to the program (such as a hardware component or a thread). When a variable is qualified as volatile, its value is read from its storage each time the variable is accessed and no such optimization is done.
Volatile variables are also used when the functions setjmp() and longjmp() are invoked (see section XI.15).
II.9 Aliasing types
The C language allows creating new types (broached in Chapter VI) and aliasing existing types. The typedef keyword lets you create a synonym for an existing type:
typedef existing_type_name new_name;
Both types are the same and are treated the same way. In the following example, we create an alias for the type int:
$ cat alias_type.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef int myinteger;
  myinteger i = 10;
  printf("i=%d\n", i);
  return EXIT_SUCCESS;
}
II.10 Compatible types
We will come back to compatible types later and complete the definition when we broach pointers, arrays, structures, unions and functions. Two types are said to be compatible if they are the same. Two compatible types with the same qualifiers (whatever the order of the qualifiers) are also compatible. In Table II‑25, types within the same cell are compatible types.
Table II‑25 Examples of compatible types
Two compatible types with the same qualifiers are compatible: const volatile int is compatible with volatile const int. Two types with different qualifiers are not compatible: const volatile int is not compatible with const int. A corollary is an unqualified type is not compatible with a qualified type: for example, const int is not compatible with the type int.
II.11 Conversions
II.11.1 Assignment
As explained earlier, a variable is characterized by its name, its type and the value it holds. The name of the variable identifies an object, that is, a memory area of the computer, identified by an address and holding a value. The type of the variable defines the way the piece of data it holds is represented, the range of values allowed and the operations that can be applied to it. The value is the contents of the variable, depending on its type. This means that you cannot store just any value in a variable. At any time, you can assign a value to a variable as follows:
varname=val;
Where: o varname is the identifier of the variable composed of letters, underscores and digits, starting with a letter or an underscore. o val is an expression. An expression is a combination of functions, operations, literals and variables. Later in the book, we will talk about expressions, and functions. For now, let us just imagine val as a literal or another variable.
Take note that in C, the equals sign (=) is an assignment operator (it is not a comparison operator). The variable, which is an lvalue (an object that can store a value), is on the left side of the equals sign while the value to be stored, sometimes called an rvalue, is on the right side. A value or a variable (object) has an implicit or an explicit type. Literals have an implicit type. A variable has an explicit type given at the time of its declaration.
If the type of the value val to assign (on the right side of =) is the same as that of the variable varname (on the left side of =), there is no conversion. The value val is just copied into the variable, replacing its older value. If the type of the variable is different from the type of the value val to assign, the value is converted to the type of the variable before being copied into the variable. Such an operation is known as an implicit conversion or implicit cast.
A variable can appear on the left side or on the right side of the equals sign. When a variable appears on the left side of the assignment operator =, it means the programmer wants to set it: it is then used as a container. When it appears on the right side, it is used for its value: the variable is then replaced by its contents. A variable is an lvalue, meaning it refers to an object (memory block). If you attempt to assign a value to an operator or a literal, you will get an error at compilation time:
$ cat assig1.c
#include <stdlib.h>

int main(void) {
 17 = 1;
}
$ gcc -o assig1 -std=c99 -pedantic assig1.c
assig1.c: In function ‘main’:
assig1.c:4:2: error: lvalue required as left operand of assignment
The integer constant 17 does not refer to an object. An object has a memory location that you can access through its name or its address. Literals have no memory address. They are loaded into registers when used but have no memory address that you can deal with. In the following example, we assign the integer variable x the value of 31:
$ cat assig2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;
  x = 31;
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig2 -std=c99 -pedantic assig2.c
$ ./assig2
x=31
In the following example, we assign the integer variable x the value of the variable y:
$ cat assig3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;
  int y;
  y = 31;
  x = y;
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig3 -std=c99 -pedantic assig3.c
$ ./assig3
x=31
The contents of a variable may vary over time, and can be altered as many times as you wish:
$ cat assig4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;
  x = 31;
  printf("x=%d\n", x);
  x = 407;
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig4 -std=c99 -pedantic assig4.c
$ ./assig4
x=31
x=407
You cannot assign just any value to a variable. The type of the value you assign to a variable must be compatible or allowed (explained in the next section). The following example makes the compiler complain (gcc issues a warning) because we try to assign a string to a variable of type int.
$ cat assig5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;
  x = "hello";
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig5 -std=c99 -pedantic assig5.c
assig5.c: In function ‘main’:
assig5.c:6:4: warning: assignment makes integer from pointer without a cast
So far, we have assigned values that have a type compatible with the variables. Since the value on the right side of the assignment operator (=) may be converted to the type of the variable, some questions naturally arise: what happens if we try to assign a floating-point value to a variable of an integer type? What happens if we assign a negative floating-point value to a variable of type unsigned int? And so on. Answers in the next sections…
II.11.2 Implicit and explicit cast
In C, a value of a certain type can be converted to another type. Depending on the types, there may be constraints, but as far as arithmetic types are concerned, a value of any arithmetic type can be converted to any arithmetic type. In this chapter, the conversions we describe are only between arithmetic types. Most of them are quite natural. The C language has two kinds of type conversions, also known as casts. An implicit conversion (implicit cast) is automatically performed in some expressions such as additions (expressions are described in Chapter IV), in assignments, and when passing arguments to functions (described in Chapter VII). An explicit conversion, also known as an explicit cast, is carried out by programmers. The following example shows an implicit conversion performed by the assignment operation:
$ cat type_conv1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;
  x = 31.2;
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o type_conv1 -std=c99 -pedantic type_conv1.c
$ ./type_conv1
x=31
It worked as expected: the floating constant 31.2 (of type double) is automatically converted to int before being assigned to the variable x. Thus, the fractional part is discarded, only keeping the integer part after the conversion. Now, run this:
$ cat type_conv2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x;
  x = 31;
  printf("x=%f\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o type_conv2 -std=c99 -pedantic type_conv2.c
$ ./type_conv2
x=31.000000
Here again it works as expected: the integer literal 31 is automatically converted to type float (31.0) before being assigned to the variable x. The C language also allows another kind of conversion, known as an explicit conversion or explicit cast. An implicit conversion is done automatically; an explicit cast acts in the same way, except that the conversion is requested by the programmer. To explicitly cast a value or a variable to type newtype, place the type name newtype between parentheses before it:
(newtype)rval
Where:
o newtype is the name of the type to which the value of the expression rval will be converted.
o rval is an expression evaluating to a value. It can be a function call, an operation, a literal, a variable or a combination of all of them.
Normally, the explicit cast operator is used when a type conversion is required but the compiler would not perform it automatically. Let us consider the following example:
$ cat type_conv3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a = 3;
    int b = 2;
    float c = a / b;
    printf("a/b=%d/%d=%f\n", a, b, c);
    return EXIT_SUCCESS;
}
$ gcc -o type_conv3 -std=c99 -pedantic type_conv3.c
$ ./type_conv3
a/b=3/2=1.000000
In the example above, we declared the variables a and b as type int. We also declared the variable c as float and assigned it the result of the division a/b. As we will find out in Chapter IV, an arithmetic operation yields a value of integer type if all of its operands have an integer type, and a floating-point value if either operand has a floating-point type. For this reason, the division a/b did not yield 1.5 as one might expect but 1: since both operands have type int, the division yields an integer value and the fractional part is discarded. You can tell the compiler that you want a floating-point result rather than only the integer part by using the cast operator. In the following example, we cast the variable a to float, which causes the division to yield a floating-point value:
$ cat type_conv4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a = 3;
    int b = 2;
    float c = (float)a / b;
    printf("a/b=%d/%d=%f\n", a, b, c);
    return EXIT_SUCCESS;
}
$ gcc -o type_conv4 -std=c99 -pedantic type_conv4.c
$ ./type_conv4
a/b=3/2=1.500000
We could also have cast the variable b to float, which would have yielded the same output. The following example shows implicit and explicit casts:
$ cat type_conv5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float v = 1/3;          /* integer division, then implicit conversion to float */
    float w = 1/3.0;        /* no explicit cast: 1 is implicitly converted to double */
    float x = 1.0/3;        /* no explicit cast: 3 is implicitly converted to double */
    float y = (float)1/3;   /* explicit cast */
    float z = 1/(float)3;   /* explicit cast */
    printf("v=%f\nw=%f\nx=%f\ny=%f\nz=%f\n", v, w, x, y, z);
    return EXIT_SUCCESS;
}
$ gcc -o type_conv5 -std=c99 -pedantic type_conv5.c
$ ./type_conv5
v=0.000000
w=0.333333
x=0.333333
y=0.333333
z=0.333333
Explanations:
o float v = 1/3 declares the variable v as float and assigns it the result of the operation 1/3. As both operands are of type int, the result is of type int: the expression 1/3 evaluates to 0. It is then implicitly converted to float before being assigned to the variable v.
o In the statement float w = 1/3.0, there is no explicit cast, but implicit conversions still take place. The literal 3.0 has type double, so the operand 1 is converted to double and the division 1/3.0 has type double. The result is then converted to float when it is assigned to w.
o Similarly, in the statement float x = 1.0/3, there is no explicit cast: the operand 1.0 of type double causes 3 to be converted to double, and the result is converted to float on assignment.
o The statement float y = (float)1/3 uses an explicit cast. Only the integer 1 is converted to float; the other operand is then converted implicitly, so the division is carried out in floating point.
o The statement float z = 1/(float)3 also uses an explicit cast. Only the integer 3 is converted to float, which causes the division to be carried out in floating point.
While converting a value, its representation may change. For example, converting a value of type float to type int changes the bit pattern that represents it. Programmers rarely need to be aware of such representation changes.
II.11.3 Conversion to integer types
II.11.3.1 Conversion to Boolean type
A value of any arithmetic type can be converted to the Boolean type _Bool. If the value to convert is 0, the Boolean value is 0 after conversion. Otherwise, it is 1. There is no overflow.
II.11.3.2 Conversion to a signed integer
A value of any arithmetic type (we call it the source value) can be converted to a signed integer type (the target type). There are two cases:
o The target signed integer type is too small to represent the value. That is, the source value is outside the range of values that can be represented by the target signed integer type.
o The target signed integer type is large enough to represent the value. That is, the source value is within the range of values that can be represented by the target signed integer type.
In this section, we call val the original value (source value), int_val its integral part if it is a floating-point number, tgt_max the maximum value of the target signed integer type and tgt_min the minimum value of the target signed integer type.
Table II‑26 Conversion to signed integer types
If the original value has an integer type and the target signed integer type is too small to represent it (val > tgt_max or val < tgt_min), an overflow occurs: the value obtained after conversion is implementation-defined (or an implementation-defined signal is raised). If the original value has a floating-point type and its integral part is out of range, the behavior is undefined. In both cases, the result cannot be relied upon in portable code. In the following example, neither sh1 nor sh2 holds a portable value:
$ cat conv2signed_int1.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
    signed short sh1 = INT_MAX; /* overflow */
    signed short sh2 = 9876543210.123456; /* overflow */
    return EXIT_SUCCESS;
}
$ gcc -o conv2signed_int1 -std=c99 -pedantic conv2signed_int1.c
conv2signed_int1.c: In function ‘main’:
conv2signed_int1.c:6:4: warning: overflow in implicit constant conversion
conv2signed_int1.c:7:4: warning: overflow in implicit constant conversion
If the original value has an integer type and the target signed integer type is large enough to represent it (tgt_min ≤ val ≤ tgt_max), the value obtained after conversion is the same. If the source value has a floating-point type, the fractional part is discarded. If the integral part of the original value (int_val) is within the range of values that can be represented by the target signed integer type (tgt_min ≤ int_val ≤ tgt_max), the target value is the integral part. Otherwise, an overflow occurs and the behavior is undefined. Here is an example:
$ cat conv2signed_int2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int ui = 10;
    double f = 19.123456;
    signed short sh1 = ui; /* conversion to signed short */
    signed short sh2 = f;  /* conversion to signed short */
    printf("sh1=%d sh2=%d\n", sh1, sh2);
    return EXIT_SUCCESS;
}
$ gcc -o conv2signed_int2 -std=c99 -pedantic conv2signed_int2.c
$ ./conv2signed_int2
sh1=10 sh2=19
II.11.3.3 Conversion to an unsigned integer
A value of any arithmetic type can be converted to an unsigned integer. In this section, we call val the original value, int_val its integral part if it is a floating-point number, and umax the maximum value of the target unsigned integer type. First, let us consider only original values that are positive. If the original value has an integer type:
o If the original value is outside the range of values that can be represented by the target unsigned integer type (val > umax), the value obtained after conversion is the original value modulo the maximum value of the unsigned integer type plus one (val % (umax+1)). The result is always defined.
o If the value is within the range of values that can be represented by the target unsigned integer type (0 ≤ val ≤ umax), the value obtained after conversion is the same as the original value.
What happens if a negative integer value is converted to an unsigned integer type? The original value v is converted to ( v + p*(umax+1) ) % (umax+1), where p is a positive integer such that v + p*(umax+1) ≥ 0. Consider the following example:
$ cat conv2unsigned_int1.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
    int i = -1;
    int j = -10;
    unsigned int ui1 = i;
    unsigned int ui2 = j;
    printf("UINT_MAX=%u ui1=%u ui2=%u\n", UINT_MAX, ui1, ui2);
    return EXIT_SUCCESS;
}
$ gcc -o conv2unsigned_int1 -std=c99 -pedantic conv2unsigned_int1.c
$ ./conv2unsigned_int1
UINT_MAX=4294967295 ui1=4294967295 ui2=4294967286
The value -10 (of type int) is converted to ( -10 + 1*(4294967295+1) ) modulo (4294967295+1) = 4294967286 modulo 4294967296 = 4294967286. The same rule applies to a longer target unsigned integer type:
$ cat conv2unsigned_int2.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
    int j = -10;
    unsigned long long ull = j;
    printf("ULLONG_MAX=%llu ull=%llu\n", ULLONG_MAX, ull);
    return EXIT_SUCCESS;
}
$ gcc -o conv2unsigned_int2 -std=c99 -pedantic conv2unsigned_int2.c
$ ./conv2unsigned_int2
ULLONG_MAX=18446744073709551615 ull=18446744073709551606
In the example above, the value -10 is converted to (-10+1*(18446744073709551615+1)) modulo (18446744073709551615+1) = 18446744073709551606 modulo 18446744073709551616 = 18446744073709551606. If the source value has a floating-point type, the fractional part is discarded:
o If the integral part of the original value is within the range of values that can be represented by the target unsigned integer type (0 ≤ int_val ≤ umax), the value obtained after conversion is the integral part of the original value.
o If the integral part is not within the range that can be represented by the target unsigned integer type (int_val < 0 or int_val > umax), the value obtained is undefined. Implementations often perform a modulo operation as for integer values, but this must not be relied upon.
Table II‑27 Conversion to unsigned integer types
II.11.4 Conversion to floating-point types A value of any arithmetic type can be converted to a floating-point type. There are several cases described in Table II‑28.
Table II‑28 Conversion to real floating-point types
II.12 Exercises
Exercise 1. Display the size of the types int and long.
Exercise 2. Why can the value -128 be represented by the type signed char on some systems (we suppose it is represented by eight bits)?
Exercise 3. Why is the operation x = 1+10e-30 equivalent to x = 1 on some systems (x is of type float)?
Exercise 4. What would be the result of the operation x = (unsigned int)-1?
CHAPTER III ARRAYS, POINTERS AND STRINGS
III.1 Introduction
In the previous chapter, we learned to work with variables and basic types. So far, a variable can hold only one value at a time. Suppose you need to write a program that reads a file containing information about one thousand people, and you need to store some pieces of data about all of them in order to process them. Let us say you want to store the names, surnames and ages: how many variables are needed? 3000! Can you imagine declaring 3000 variables and working with them? Fortunately, the C language has two other very useful kinds of types that ease programming: arrays and pointers. Though they are closely related and often used interchangeably, they are different and must not be confused.
III.2 Arrays
An array is an object composed of a set of items having the same type. An array is identified by a name composed of underscores, letters and digits, starting with an underscore or a letter. We can distinguish two kinds of arrays: one-dimensional arrays and multidimensional arrays.
III.2.1 One-dimensional array
III.2.1.1 Declaration
Before being used, an array must be declared as shown below so that a memory block is allocated for the items it contains:
arr_type arr_name[n];
Where:
o arr_type is a user-defined type or a C standard type (int, long, float, array, pointer…). User-defined types will be discussed later.
o arr_name is the name of the array.
o n is a positive integer number indicating the number of elements the array stores. It represents the length of the array.
More generally, n can be an integer constant expression, that is, an expression that evaluates to an integer constant (see Chapter IV Section IV.14). An expression is a simple value, an operation or a combination of operations (Chapter IV). For example, you could declare an array as arr[2+4+1], which is equivalent to arr[7]: the expression 2+4+1 evaluates to an integer constant (i.e. known at compile time). The contiguous memory area reserved for the array is large enough to hold all of its elements: the array size is n * sizeof(arr_type) (see Figure III‑1). Built from other types, an array type is a derived type. Containing several objects (of the same type), it is also an aggregate type. The size of an array does not change over time: it is determined at compile time and cannot be changed afterwards. Below, the array age is declared with five elements of type int (see Figure III‑1):
$ cat array_decl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[5];
    return EXIT_SUCCESS;
}
Our array age can store five values of type int. All elements are independent from each other: they can be directly accessed or modified as any variable. Before talking about how we can have access to elements, let us explain how an array can be initialized.
Figure III‑1 Memory layout of the array age[5]
Before C99, the length of an array had to be a positive integer constant (such as an integer literal).
III.2.1.2 Initialization
You have two methods to assign values to an array: at the time of declaration (initialization) [22] or after the declaration of the array. When you declare an array, you can also initialize it by giving values enclosed between braces:
arr_type arr_name[n]={val1,val2,…,valp};
Where:
o arr_type is a user-defined type or a C type.
o arr_name is the name of the array.
o n is an integer number indicating the number of elements the array stores (length).
o val1,…,valp are p values of type arr_type.
o n ≥ p. If n = p, all elements are initialized. Otherwise, the remaining elements, whose subscript m is such that m ≥ p, are set to 0 by default.
The first element, denoted by arr_name[0], takes the value of val1, the second one, denoted by arr_name[1], takes the value of val2,…, and the element denoted by arr_name[p-1] takes the value of valp. Note that this brace syntax is only valid in the declaration: once the array has been declared, you cannot assign values to it in this way.
Figure III‑2 Representation of the array age after initialization
The following example declares and initializes all items of the array age at the same time (depicted in Figure III‑2):
$ cat array_init1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[5] = {54,17,59,44,64};
    return EXIT_SUCCESS;
}
The length of the array n can be omitted if n=p: the length of the array is then computed by the compiler by counting the number of values between the braces. The following statement is equivalent to the previous one if n=p:
arr_type arr_name[]={val1,val2,…,valn};
The previous example is equivalent to the following code:
$ cat array_init2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[] = {54,17,59,44,64};
    return EXIT_SUCCESS;
}
If you do not initialize your array at declaration time, you can no longer do it in a single statement; you must then use the second method, which consists in assigning values directly to the elements of the array. An item in an array is accessed by its index (subscript), which is an integer number: array[i] references item number i+1. The first item of an array is placed at index 0, the second one at index 1, and so on. The last index (element number n) is n-1, where n is the length of the array. In our example array_init2.c, the array age is composed of five elements: the first item is denoted by age[0], the second one by age[1]… and the last one (fifth) by age[4] (see Figure III‑2). Each item of the array age is a number of type int. The following example assigns a value to each element of the array age:
$ cat array_init3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[5];
    age[0] = 54;
    age[1] = 17;
    age[2] = 59;
    age[3] = 44;
    age[4] = 64;
    return EXIT_SUCCESS;
}
As of C99, you can initialize only some specific elements in an array at declaration time as shown below:
$ cat array_init4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[100] = {54,17,59,44,64,[50]=22,[90]=47};
    return EXIT_SUCCESS;
}
In the example above, we set the elements from index 0 through index 4, along with the elements of index 50 and index 90. For these elements, it is equivalent to the following code (the difference being that in array_init4.c all the other elements are set to 0, whereas here they are left uninitialized):
$ cat array_init5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[100];
    age[0] = 54;
    age[1] = 17;
    age[2] = 59;
    age[3] = 44;
    age[4] = 64;
    age[50] = 22;
    age[90] = 47;
    return EXIT_SUCCESS;
}
III.2.1.3 Accessing elements in an array
All of the elements of an array are of the same type and therefore of the same size. The only way to access an element of an array is through its subscript: if arr is the name of an array, arr[i] is an element of the array, where i is the subscript (index) that references element number i+1. Why i+1 and not i? Because, in C, the first element is placed at index 0, which implies that 0 ≤ i ≤ n-1 (where n is the number of items of the array). An element of an array may be modified (it can be assigned another value as shown in example array_init5.c) or read (the value it holds is retrieved). In the following example, we assign the variable v the value held in the second element of the array age, and then we display both the contents of the variable v and the second element of the array age.
$ cat array_access1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[5];
    int v;
    age[0] = 54;
    age[1] = 17;
    age[2] = 59;
    age[3] = 44;
    age[4] = 64;
    v = age[1];
    printf("v=%d and age[1]=%d\n", v, age[1]);
    return EXIT_SUCCESS;
}
$ gcc -o array_access1 -std=c99 -pedantic array_access1.c
$ ./array_access1
v=17 and age[1]=17
Keep in mind that an array declared as type arr[n] contains n elements: the first one is arr[0] and the last one is arr[n-1]. A common beginner's mistake is to consider that the last item is arr[n], which causes bugs…
What happens if we use elements in an array that were not initialized? Consider the following example:
$ cat array_access2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[100] = {54,17,59,44,64,[50]=22,[90]=47};
    printf("age[4]=%d\n", age[4]);
    printf("age[5]=%d\n", age[5]);
    printf("age[54]=%d\n", age[54]);
    printf("age[90]=%d\n", age[90]);
    return EXIT_SUCCESS;
}
$ gcc -o array_access2 -std=c99 -pedantic array_access2.c
$ ./array_access2
age[4]=64
age[5]=0
age[54]=0
age[90]=47
Uninitialized elements in an initialized array take the value of 0. However, if the array had not been initialized, things would have been different. Compare with the following example:
$ cat array_access3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[100];
    printf("age[4]=%d\n", age[4]);
    printf("age[5]=%d\n", age[5]);
    printf("age[54]=%d\n", age[54]);
    printf("age[90]=%d\n", age[90]);
    return EXIT_SUCCESS;
}
$ gcc -o array_access3 -std=c99 -pedantic array_access3.c
$ ./array_access3
age[4]=2
age[5]=-25616384
age[54]=134546946
age[90]=-16782720
Elements of uninitialized arrays have indeterminate values. So, do not forget to initialize your arrays or to set values to their elements before using them.
Ensure the elements of your arrays have been initialized. You can initialize an array at the time of declaration, or later by setting its elements one by one. Whatever method you apply, never use an item whose value has not been set.
III.2.1.4 Array size
The size of an array is its length multiplied by the size of an item. The sizeof operator returns the size of an array in bytes as shown below (its result has type size_t, printed with the %zu specifier):
$ cat array_size1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array1[5];
    float array2[21];
    printf("size of array1=%zu Bytes\n", sizeof array1);
    printf("size of array2=%zu Bytes\n", sizeof array2);
    return EXIT_SUCCESS;
}
$ gcc -o array_size1 -std=c99 -pedantic array_size1.c
$ ./array_size1
size of array1=20 Bytes
size of array2=84 Bytes
It is easy to get the number of elements an array holds: just divide the size of the array in bytes by the size of an element, also expressed in bytes:
$ cat array_size2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array1[5];
    float array2[21];
    printf("Nb of elements in array1=%zu\n", sizeof array1 / sizeof array1[0]);
    printf("Nb of elements in array2=%zu\n", sizeof array2 / sizeof array2[0]);
    return EXIT_SUCCESS;
}
$ gcc -o array_size2 -std=c99 -pedantic array_size2.c
$ ./array_size2
Nb of elements in array1=5
Nb of elements in array2=21
Here, we chose to use the first element of each array, but nothing prevents you from using any other element of the array, as shown below:
$ cat array_size3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array1[5];
    float array2[21];
    printf("Nb of elements in array1=%zu\n", sizeof array1 / sizeof array1[1]);
    printf("Nb of elements in array2=%zu\n", sizeof array2 / sizeof array2[8]);
    return EXIT_SUCCESS;
}
$ gcc -o array_size3 -std=c99 -pedantic array_size3.c
$ ./array_size3
Nb of elements in array1=5
Nb of elements in array2=21
As explained in the previous chapter, the sizeof operator returns the size of a type or a variable. Now, you also know that it can get the size of an array or of an element of an array. The size of an element in an array is the size of the type of the element. Thus, although dividing by the size of an element is the better programming style, the previous example could also be written like this:
$ cat array_size4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array1[5];
    float array2[21];
    printf("Nb of elements in array1=%zu\n", sizeof array1 / sizeof(int));
    printf("Nb of elements in array2=%zu\n", sizeof array2 / sizeof(float));
    return EXIT_SUCCESS;
}
$ gcc -o array_size4 -std=c99 -pedantic array_size4.c
$ ./array_size4
Nb of elements in array1=5
Nb of elements in array2=21
The operand of the sizeof operator can be a type name or an identifier (such as a variable, a pointer, an array). If the argument is an identifier, you can omit the parentheses but if the argument is a type name, you must use the parentheses around it telling the compiler the operand is a type.
The sizeof operator returns a number of bytes (a byte is not necessarily 8 bits). In C, a byte is the size of a char (sizeof(char) is always 1), the smallest unit of memory that a program can address: the macro CHAR_BIT, defined in the limits.h header file, gives the number of bits in a byte.
As we will see later, the operand of the sizeof operator can be an expression. The size in bytes of the expression is the size of the type of the resulting value. The expression sizeof(1/3) returns 4 while sizeof(1.0/3) returns 8 on our computer: the first expression has type int while the second one has type double.
Keep in mind that an array’s subscript must not be greater than the length of the array minus one (i≤n-1 where i is the index and n the length of the array). The following example generates no error at compilation time but will cause bugs:
$ cat array_size5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int arr[] = {200,300,400,500,600};
    int i = 1;
    int v;
    arr[5] = 10;
    arr[6] = 10;
    v = arr[5];
    printf("v=%d\n", v);
    printf("i=%d\n", i);
    return EXIT_SUCCESS;
}
$ gcc -o array_size5 -std=c99 -pedantic array_size5.c
$ ./array_size5
v=10
i=10
The result is unpredictable. In our example, we accessed by mistake the memory location of the variable i and modified it involuntarily! As the example shows, C lets you perform illegal memory accesses. The C language is permissive because it gives you full control of your program. It does not check the indexes you use. It is interesting to note that you can use negative integers as subscripts without any complaint from the compiler:
$ cat array_size6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int arr[] = {200,300,400,500,600};
    int v;
    arr[-1] = 10;
    v = arr[-1];
    printf("v=%d\n", v);
    return EXIT_SUCCESS;
}
$ gcc -o array_size6 -std=c99 -pedantic array_size6.c
$ ./array_size6
v=10
Of course, this program is not correct. Why are negative integers allowed? This will be explained when we talk about pointers…
If n is the length of an array (n a positive integer), subscripts to access elements are in the range [0,n-1].
III.2.1.5 Showing all elements of an array
The for loop, described in Chapter V, allows you to display all the elements of an array.
$ cat array_disp1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age[] = {54,17,59,44,64};
    int i;
    int age_size = sizeof age / sizeof age[0];
    printf("Display %d elements of array age\n", age_size);
    for (i=0; i < age_size; i++) {
        printf("age[%d]=%d\n", i, age[i]);
    }
    return EXIT_SUCCESS;
}
$ gcc -o array_disp1 -std=c99 -pedantic array_disp1.c
$ ./array_disp1
Display 5 elements of array age
age[0]=54
age[1]=17
age[2]=59
age[3]=44
age[4]=64
The for loop is composed of three parts separated by semicolons within parentheses, and a set of statements list_statements enclosed between braces ({}), known as a block:
for (part1;part2;part3) {
    list_statements
}
When the for loop statement is executed:
o Firstly, the expression part1 is processed. It is the initialization step of the loop. In our example array_disp1.c, the variable i is assigned the value 0. This step is executed only once.
o Secondly, the expression part2 is evaluated. If it is true, the block is executed. Otherwise, the loop ends.
o Thirdly, the expression part3 is processed. In our example, the expression i++ is shorthand for i=i+1. That is, the variable i is incremented.
o Then, the expression part2 is evaluated again. If it is true, the block is executed. Otherwise, the loop ends.
o The expression part3 is processed, and so on.
o part2 and part3 are processed at each iteration until the loop ends.
In our example, the for loop keeps executing as long as the condition i < age_size is true. Let us view the cycles of the for loop of our example:
o age_size is evaluated to 5.
o Initialization of the for loop: i is set to 0.
o Cycle 1:
▪ i holds the value 0. The condition i < age_size is true, so the block is run: the text age[0]=54 is printed.
▪ The expression i++ increments i. The variable i holds 1.
o Cycle 2:
▪ i holds the value 1. The condition i < age_size is true, so the block is run: the text age[1]=17 is printed.
▪ The expression i++ increments i. The variable i holds 2.
o And so on.
o Cycle 5:
▪ i holds the value 4. The condition i < age_size is true, so the block is run: the text age[4]=64 is printed.
▪ The expression i++ increments i. The variable i holds 5.
o Cycle 6:
▪ i holds the value 5. The condition i < age_size is false, so the loop ends.
III.2.1.6 Boundaries
The C language lets you go beyond the memory allocated for an array without complaining. There is no bounds checking at all. Accordingly, check that your subscripts are valid…
III.2.1.7 Memory address
The memory address of an object can be obtained with the operator &: &v stands for the address of an object called v. For example, if age is a variable, &age represents its memory address; if name_list is a one-dimensional array, &name_list[0] represents the memory address of its first element (whose subscript is 0), &name_list[1] the address of its second element… What would the address of an array be? The address of an array is the address of its very first element. Therefore, if name_list is a one-dimensional array, &name_list[0] is also the address of the array. To be consistent, in C, &name_list is the address of the array as well (the two expressions designate the same location but have different types). This is only a taste of what we are going to explain when we talk about pointers and addresses…
III.2.2 Multidimensional arrays A C multidimensional array is an array of arrays. Let us begin with a two-dimensional array. A two-dimensional array is declared like this: arr_type arr_name[n][p];
Where:
o arr_type is a type name.
o arr_name is the name of the array.
o n is a positive integer number indicating the number of p-length one-dimensional arrays of type arr_type it stores. The number n is the first dimension.
o p is a positive integer number indicating the number of elements of type arr_type stored in each array arr_name[i] (where 0 ≤ i ≤ n-1). The number p is the second dimension.
o An element of the array is represented by arr_name[i][j], where i ranges from 0 to n-1, and j ranges from 0 to p-1.
The two-dimensional array arr_name can be represented as an n x p matrix, composed of n rows and p columns, but in fact, a multidimensional array is not laid out like this in memory. A row arr_name[i] represents a one-dimensional array of p elements and arr_name[i][j] represents an element of the one-dimensional array arr_name[i]. What we say about one-dimensional arrays also applies to multidimensional arrays. An element of a two-dimensional array arr_name[i][j] can be manipulated as a variable: you can get its value or alter it. As you can easily guess, the memory address of an element arr_name[i][j] is &arr_name[i][j]. The memory address of an array arr_name[i] is given by &arr_name[i] or &arr_name[i][0] [23].
The following example creates a two-dimensional array called arr.
$ cat array_multidim1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char arr[2][3];
    printf("ARRAY arr[0] (row 0):\n");
    printf("address of arr[0][0]=%p and address of arr[0]=%p\n", (void *)&arr[0][0], (void *)&arr[0]);
    printf(" address of arr[0][1]=%p\n", (void *)&arr[0][1]);
    printf(" address of arr[0][2]=%p\n", (void *)&arr[0][2]);
    printf("\nARRAY arr[1] (row 1):\n");
    printf("address of arr[1][0]=%p and address of arr[1]=%p\n", (void *)&arr[1][0], (void *)&arr[1]);
    printf(" address of arr[1][1]=%p\n", (void *)&arr[1][1]);
    printf(" address of arr[1][2]=%p\n", (void *)&arr[1][2]);
    printf("\nsizeof arr[0][0]=%zu and sizeof arr[0]=%zu\n", sizeof arr[0][0], sizeof arr[0]);
    printf("sizeof arr[1][0]=%zu and sizeof arr[1]=%zu\n", sizeof arr[1][0], sizeof arr[1]);
    return EXIT_SUCCESS;
}
$ gcc -o array_multidim1 -std=c99 -pedantic array_multidim1.c
$ ./array_multidim1
ARRAY arr[0] (row 0):
address of arr[0][0]=feffea8a and address of arr[0]=feffea8a
 address of arr[0][1]=feffea8b
 address of arr[0][2]=feffea8c

ARRAY arr[1] (row 1):
address of arr[1][0]=feffea8d and address of arr[1]=feffea8d
 address of arr[1][1]=feffea8e
 address of arr[1][2]=feffea8f

sizeof arr[0][0]=1 and sizeof arr[0]=3
sizeof arr[1][0]=1 and sizeof arr[1]=3
In our example array_multidim1.c, the array arr, declared as char arr[2][3], is a two-dimensional array composed of two arrays of three char. In other words, the array arr holds two arrays, arr[0] and arr[1], each containing three elements of type char (see Figure III‑3 and Figure III‑4). A two-dimensional array can be viewed as a table (2x3 matrix) composed of rows and columns as depicted in Figure III‑3, or as a linear table as sketched in Figure III‑4, which is the way a multidimensional array is actually laid out in memory. We can see, as pointed out by our previous program and represented by Figure III‑3 and Figure III‑4, that the addresses of arr[i][0] and arr[i] are identical (i taking the value 0 or 1 in our example). However, do not confuse the objects arr[i][0] and arr[i]. The object arr[i] is a one-dimensional array, whose size is 3 bytes, holding three objects of type char, while the object arr[i][0] is an object of type char whose size is one byte, as highlighted by the program array_multidim1.c.
Figure III‑3 Two-dimension array arr[2][3] viewed as a table
A better way to view a multidimensional array is a linear representation (real layout in memory) as depicted in Figure III‑4.
Figure III‑4 Memory layout of a two-dimension array arr[2][3]
You can initialize a two-dimensional array at declaration time:
$ cat array_multidim2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int arr[2][3] = {
        { 1, 2, 3 },    /* first array: array arr[0] */
        { 11, 12, 13 }  /* second array: array arr[1] */
    };
    return EXIT_SUCCESS;
}
Which is equivalent to (but prone to errors):
$ cat array_multidim3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int arr[2][3] = {
        1, 2, 3,     /* first array: array arr[0] */
        11, 12, 13   /* second array: array arr[1] */
    };
    return EXIT_SUCCESS;
}
Without comments, we have this:
$ cat array_multidim4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int arr[2][3] = { 1, 2, 3, 11, 12, 13 };
    return EXIT_SUCCESS;
}
Multidimensional arrays work in the same way as one-dimensional arrays. Elements in a multidimensional array are accessed through their subscripts. In a two-dimensional array, an element is identified by two indexes, as shown below:
$ cat array_multidim5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int arr[2][3] = {
    { 1, 2, 3 },
    { 11, 12, 13 }
  };

  printf("arr[0][0]=%d\n", arr[0][0]);
  printf("arr[1][2]=%d\n", arr[1][2]);

  return EXIT_SUCCESS;
}
$ gcc -o array_multidim5 -pedantic array_multidim5.c
$ ./array_multidim5
arr[0][0]=1
arr[1][2]=13
An array can also be given its values after the declaration, element by element:
$ cat array_multidim6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int arr[2][3];

  /* init first array */
  arr[0][0]=1;
  arr[0][1]=2;
  arr[0][2]=3;

  /* init second array */
  arr[1][0]=11;
  arr[1][1]=12;
  arr[1][2]=13;

  printf("arr[0][0]=%d\n", arr[0][0]);
  printf("arr[1][2]=%d\n", arr[1][2]);

  return EXIT_SUCCESS;
}
$ gcc -o array_multidim6 -pedantic array_multidim6.c
$ ./array_multidim6
arr[0][0]=1
arr[1][2]=13
As we saw for one-dimensional arrays, an element of a multidimensional array that has not been initialized has an indeterminate value. Therefore, do not forget to set the elements of your multidimensional arrays before using them. When an array is only partially initialized at declaration time, however, the elements with no explicit initializer take the default value of 0, as in the following example:
$ cat array_multidim7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int arr[2][3] = {
    { 1, 2 },
    { 11, 12, 13 }
  };

  printf("arr[0][2]=%d\n", arr[0][2]);
  printf("arr[1][0]=%d\n", arr[1][0]);

  return EXIT_SUCCESS;
}
$ gcc -o array_multidim7 -std=c99 -pedantic array_multidim7.c
$ ./array_multidim7
arr[0][2]=0
arr[1][0]=11
In the example above, the array arr[0] was initialized with only two values: the last element arr[0][2] had no initializer and therefore took the default value of 0. Compare with the following example, in which the array has no initializer at all:
$ cat array_multidim8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int arr[2][3];

  printf("arr[0][2]=%d\n", arr[0][2]);
  printf("arr[1][0]=%d\n", arr[1][0]);

  return EXIT_SUCCESS;
}
$ gcc -o array_multidim8 -std=c99 -pedantic array_multidim8.c
$ ./array_multidim8
arr[0][2]=134548698
arr[1][0]=134614376
The elements of the uninitialized array arr have indeterminate values: the numbers displayed are whatever happened to be in memory and may differ from one run to another. The last two examples show that you have to initialize your arrays, or assign values to their elements, before using them.
At declaration, the first dimension can be omitted if the array is initialized, while the second dimension cannot be omitted even if you fully initialize the array. Here is an example omitting the first dimension:
$ cat array_multidim9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int arr[][3] = {
    { 1, 2 },
    { 11, 12, 13 }
  };

  printf("arr[0][2]=%d\n", arr[0][2]);
  printf("arr[1][0]=%d\n", arr[1][0]);

  return EXIT_SUCCESS;
}
$ gcc -o array_multidim9 -std=c99 -pedantic array_multidim9.c
$ ./array_multidim9
arr[0][2]=0
arr[1][0]=11
Figure III‑5 Three-Dimensional array arr[2][2][3] in a matrix representation
Now, let us talk about three-dimensional arrays. You will find nothing new: they work the same way as two-dimensional arrays. A three-dimensional array arr declared as type arr[n][p][q] is an array of n two-dimensional arrays. Naturally, we would tend to view a three-dimensional array as an nxpxq matrix (see Figure III‑5), though it is not the best way to comprehend it. Figure III‑5 shows a 2x2x3 array viewed as a 3-D matrix.
Figure III‑6 Memory layout of the three-Dimensional array arr[2][2][3]
A more appropriate way to view a multidimensional array in C is the flat representation, which is also the memory layout of a multidimensional array (see Figure III‑6). A three-dimensional array arr declared as
type arr[n][p][q]
where n ≥ 1, p ≥ 1, and q ≥ 1 could be viewed like this (Figure III‑6):
o arr is an array of n two-dimensional arrays.
o arr[i] is a pxq two-dimensional array, where 0 ≤ i ≤ n-1.
o arr[i][j] is a one-dimensional array composed of q elements, where 0 ≤ i ≤ n-1 and 0 ≤ j ≤ p-1.
o arr[i][j][k] is an element, where 0 ≤ i ≤ n-1, 0 ≤ j ≤ p-1, and 0 ≤ k ≤ q-1.
The following example illustrates what was said above and is depicted in Figure III‑6:
$ cat array_multidim10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char arr[2][2][3];

  printf("== ADDRESSES ==\n");
  printf("ARRAY arr:\n");
  printf("&arr=%p\n", (void *)&arr);

  printf("\nARRAY arr[0]:\n");
  printf("&arr[0]=%p\n &arr[0][0]=%p\n &arr[0][0][0]=%p\n",
         (void *)&arr[0], (void *)&arr[0][0], (void *)&arr[0][0][0]);

  printf("\nARRAY arr[1]:\n");
  printf("&arr[1]=%p\n &arr[1][0]=%p\n &arr[1][0][0]=%p\n",
         (void *)&arr[1], (void *)&arr[1][0], (void *)&arr[1][0][0]);

  /* sizeof yields a value of type size_t, displayed with %zu */
  printf("\n\n== SIZES ==\n");
  printf("sizeof arr=%zu\n", sizeof arr);
  printf(" sizeof arr[0]=%zu\n", sizeof arr[0]);
  printf(" sizeof arr[0][0]=%zu\n", sizeof arr[0][0]);
  printf(" sizeof arr[0][0][0]=%zu\n", sizeof arr[0][0][0]);

  printf("\n sizeof arr[1]=%zu\n", sizeof arr[1]);
  printf(" sizeof arr[1][0]=%zu\n", sizeof arr[1][0]);
  printf(" sizeof arr[1][0][0]=%zu\n", sizeof arr[1][0][0]);

  return EXIT_SUCCESS;
}
$ gcc -o array_multidim10 -std=c99 -pedantic array_multidim10.c
$ ./array_multidim10
== ADDRESSES ==
ARRAY arr:
&arr=feffea84

ARRAY arr[0]:
&arr[0]=feffea84
 &arr[0][0]=feffea84
 &arr[0][0][0]=feffea84

ARRAY arr[1]:
&arr[1]=feffea8a
 &arr[1][0]=feffea8a
 &arr[1][0][0]=feffea8a


== SIZES ==
sizeof arr=12
 sizeof arr[0]=6
 sizeof arr[0][0]=3
 sizeof arr[0][0][0]=1

 sizeof arr[1]=6
 sizeof arr[1][0]=3
 sizeof arr[1][0][0]=1
What we said about two-dimensional arrays holds true for multidimensional arrays in general. Here is another example with a three-dimensional array:
$ cat array_multidim11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  /* arr is a three-dimensional array holding 2 two-dimensional arrays */
  char arr[2][3][2] = {
    { /* First 3x2 two-dimensional array: arr[0] */
      { 'a', 'b' }, /* arr[0][0] first one-dimensional array: 2 elements */
      { 'c', 'd' }, /* arr[0][1] second one-dimensional array: 2 elements */
      { 'e', 'f' }  /* arr[0][2] third one-dimensional array: 2 elements */
    },
    { /* Second 3x2 two-dimensional array: arr[1] */
      { 'A', 'B' }, /* arr[1][0] first one-dimensional array: 2 elements */
      { 'C', 'D' }, /* arr[1][1] second one-dimensional array: 2 elements */
      { 'E', 'F' }  /* arr[1][2] third one-dimensional array: 2 elements */
    }
  };

  printf("Displaying three-dimensional array 2x3x2 arr:\n");
  printf("First two-dimensional array arr[0]:\n");
  printf(" First one-dimensional array arr[0][0]:\n");
  printf(" arr[0][0][0]=%c arr[0][0][1]=%c\n\n", arr[0][0][0], arr[0][0][1]);
  printf(" Second one-dimensional array arr[0][1]:\n");
  printf(" arr[0][1][0]=%c arr[0][1][1]=%c\n\n", arr[0][1][0], arr[0][1][1]);
  printf(" Third one-dimensional array arr[0][2]:\n");
  printf(" arr[0][2][0]=%c arr[0][2][1]=%c\n\n", arr[0][2][0], arr[0][2][1]);

  printf("\nSecond two-dimensional array arr[1]:\n");
  printf(" First one-dimensional array arr[1][0]:\n");
  printf(" arr[1][0][0]=%c arr[1][0][1]=%c\n\n", arr[1][0][0], arr[1][0][1]);
  printf(" Second one-dimensional array arr[1][1]:\n");
  printf(" arr[1][1][0]=%c arr[1][1][1]=%c\n\n", arr[1][1][0], arr[1][1][1]);
  printf(" Third one-dimensional array arr[1][2]:\n");
  printf(" arr[1][2][0]=%c arr[1][2][1]=%c\n", arr[1][2][0], arr[1][2][1]);

  return EXIT_SUCCESS;
}
$ gcc -o array_multidim11 -std=c99 -pedantic array_multidim11.c
$ ./array_multidim11
Displaying three-dimensional array 2x3x2 arr:
First two-dimensional array arr[0]:
 First one-dimensional array arr[0][0]:
 arr[0][0][0]=a arr[0][0][1]=b

 Second one-dimensional array arr[0][1]:
 arr[0][1][0]=c arr[0][1][1]=d

 Third one-dimensional array arr[0][2]:
 arr[0][2][0]=e arr[0][2][1]=f


Second two-dimensional array arr[1]:
 First one-dimensional array arr[1][0]:
 arr[1][0][0]=A arr[1][0][1]=B

 Second one-dimensional array arr[1][1]:
 arr[1][1][0]=C arr[1][1][1]=D

 Third one-dimensional array arr[1][2]:
 arr[1][2][0]=E arr[1][2][1]=F
More generally, an M-dimensional array declared as type arr[n1][n2]…[nM] is an array containing n1 arrays of dimension M-1. That is, arr[i] is an (M-1)-dimensional n2x…xnM array, where 0 ≤ i ≤ n1-1.
III.3 Pointers
III.3.1 Definition
A pointer is a memory location holding the memory address of an object (an object is a memory area holding a value), hence the name pointer: a pointer is a variable that points to an object (Figure III‑7).
Figure III‑7 Representation of a pointer
Introduced in this way, with no practical examples, you may wonder what kind of help we could expect from pointers. In C, pointers are so handy that you could not work without them. They are extensively used because they allow creating and manipulating high-level objects (this will be described in the next chapters, mainly in Chapter VI, in which we explain how to create and work with your own data types). We will also use them to pass data to functions so that the functions work directly on the data instead of a copy (detailed in Chapter VII and Chapter VIII). For now, we are just trying to tame a concept that is so important in C programming. A pointer is declared like this:
ptr_type *ptr_name
Where:
o ptr_name is a name (called identifier) identifying the pointer. It is made of letters, underscores and digits, starting with a letter or an underscore.
o ptr_type is the type of the object the pointer points to.
o The asterisk * declares a pointer, meaning the name appearing after it is a pointer.
The following example declares pointers:
$ cat pointer1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float *fp;          /* pointer to an object of type float */
  int *ip;            /* pointer to an object of type int */
  unsigned int *uip;  /* pointer to an object of type unsigned int */
  char *s;            /* pointer to an object of type char */

  return EXIT_SUCCESS;
}
III.3.2 Memory addresses
Since a pointer is a variable holding the address of an object, how can we get the address of an object in order to initialize a pointer? This is done with the address-of operator &, as shown below:
$ cat pointer2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 10;
  float f = 1.23;

  /* %p expects a pointer to void, hence the casts */
  printf("v holds value %d and has address %p\n", v, (void *)&v);
  printf("f holds value %f and has address %p\n", f, (void *)&f);

  return EXIT_SUCCESS;
}
$ gcc -o pointer2 -std=c99 -pedantic pointer2.c
$ ./pointer2
v holds value 10 and has address feffea8c
f holds value 1.230000 and has address feffea88
The memory address of the variable v is denoted by &v and the address of the variable f is denoted by &f. We used the specifier %p to show the addresses held in pointers [24]. More generally, to get the address of an object named obj_name, precede it by an ampersand: &obj_name.
III.3.3 Null pointers
In C, a special pointer constant, called a null pointer constant, indicates that a pointer does not point to an object but to “nothing that can store a value”. A null pointer constant is a constant expression (see Chapter IV IV.14) that evaluates to 0 (integer constant expression) or (void *)0 (address constant expression): for example, 0, 2-2 and 0*8 are constant expressions that evaluate to 0. The implementation chooses the null pointer constant as 0 or (void *)0. The macro NULL, representing the null pointer constant, is defined in the standard header file stdlib.h (and in several other standard headers such as stddef.h and stdio.h). A null pointer constant cast to a given pointer type is known as a null pointer. For example, if you declare the pointer p as float *p = NULL, p will be set to a null pointer (i.e. (float *)0) that has type float *. This means there is a null pointer for each pointer type: null pointer of type char *, null pointer of type float *… Whatever the representation of null pointers, the following rules are always true:
o A null pointer compares unequal to a pointer pointing to an object or a function. This is an important rule. It means null pointers allow us to mark pointers that must not be used to get or set values. This avoids having uninitialized pointers (invalid pointers) that can hold any address, possibly one that represents no object: uninitialized pointers may point anywhere! A null pointer assigned to a pointer tells the program “Do not attempt to access the object through this pointer. It does not point to an object”.
o A null pointer, whatever its type, can be converted to a null pointer of another type. Two null pointers compare equal even if their types are different. For example, if p and q are declared as int *p = NULL and float (*q)[10] = NULL, the expression p == (int *)q is true.
This does not mean all null pointers have the same internal representation: as their types are different, their representations may differ. This should not worry you, since the compiler knows when it deals with null pointers and performs the appropriate conversions.
III.3.4 Initializing a pointer
Now that you know that a pointer stores a memory address, you might think you could have access to any address of the computer’s memory. This is not true [25]:
o Your program does not have access to the whole memory of your computer. The UNIX system and most modern operating systems use the concept of virtual memory, which gives the illusion that your program uses the entire main memory, but this is not the case.
o Your program, when run, becomes a process that has a specific address space split into several areas. Some areas are read-only: if you try to modify them, your program will crash.
This means you should not set a pointer to an arbitrary address. That is, you should avoid initializing a pointer with an integer literal as in the following example:
$ cat pointer3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = 10;

  printf("p holds address %p\n", (void *)p);

  return EXIT_SUCCESS;
}
$ gcc -o pointer3 -std=c99 -pedantic pointer3.c
pointer3.c: In function ‘main’:
pointer3.c:5:12: warning: initialization makes pointer from integer without a cast
$ ./pointer3
p holds address a
You may think it worked. It did, but it did nothing useful: we just set the value of the pointer p to the address 10 and printed the value held in the pointer p. Notice that the compiler complained: in our code, the variable p is a pointer to an int while the integer literal 10 is a numeric value that is not a pointer. The compiler performed an implicit conversion and generated a warning telling you “please check this doubtful assignment”. You can be more specific to avoid such a warning, telling the compiler “Yes, I do know what I am doing. Please go ahead…”:
$ cat pointer4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = (int *)10;

  printf("p holds address %p\n", (void *)p);

  return EXIT_SUCCESS;
}
$ gcc -o pointer4 -std=c99 -pedantic pointer4.c
$ ./pointer4
p holds address a
No warnings were generated by the code pointer4.c at compilation time. What did we do? We just explicitly cast the integer literal 10 to the expected type: (int *)10 tells the compiler that the integer literal 10 is not a mere integer but a pointer to int. In other words, the literal 10 is an address referencing a memory location holding an int. Thus, the type of (int *)10 is the same as that of the pointer p. Always be cautious when you resort to explicit casts: they bypass the warnings of the compiler but can be a cause of bugs. Our program generated no warnings but still suffers from a big problem: the address 10 is illegal as it is not allocated by the operating system. It is an arbitrary value: it is an invalid pointer. What happens if we try to access it? Run this:
$ cat pointer5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = (int *)10;

  printf("p holds address %p\n", (void *)p);
  printf("Value referenced by pointer p %d\n", *p);

  return EXIT_SUCCESS;
}
$ gcc -o pointer5 -std=c99 -pedantic pointer5.c
$ ./pointer5
p holds address a
Segmentation Fault (core dumped)
Invalid pointers do not point to valid objects. If you try to access an invalid address, the behavior of your program is undefined: it may crash or corrupt memory. The second call to printf() crashed our program because we tried to access an illegal address (Segmentation Fault error). The variable p holds the address of an object while *p is the object itself: *p represents the contents of the memory location pointed to by the pointer p. The operator * gives access to the contents of the memory block identified by the address held in a pointer.
Figure III‑8 Relationship between a pointer and the object it references
So, remember that you do not have to manage the memory of the computer: just use the memory that the system makes available to your program. The first way of initializing a pointer is to work with addresses of existing objects by using the address-of operator &, as in the following example, in which we assign the address of the variable v to the pointer p (depicted in Figure III‑8):
$ cat pointer6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 21;
  int *p = &v;

  printf("variable v holds value %d and has address %p\n", v, (void *)&v);
  printf("pointer p holds value %p and points to value %d\n", (void *)p, *p);

  return EXIT_SUCCESS;
}
$ gcc -o pointer6 -std=c99 -pedantic pointer6.c
$ ./pointer6
variable v holds value 21 and has address feffea88
pointer p holds value feffea88 and points to value 21
If pointers were used only to store addresses of existing objects (allocated by the compiler at compile time), they would not have been invented! Obviously, they can do more for programmers… Suppose you wrote a C program that reads a file holding information on customers and stores it into arrays, as we studied previously. Suppose you had one hundred customers: obviously, you created arrays with a size larger than one hundred; let’s say two hundred. At the time you created your program, you imagined that your arrays were big enough… What happens if the number of customers grows to two hundred and one? Your program will fail. Therefore, you have to allocate memory dynamically. Using addresses of existing objects, as described earlier, may be useful but does not allow writing programs that work with dynamic data: existing objects are known at compilation time. The problem is that your program may need many more objects depending on events. You could use arrays, but arrays cannot be resized once created: once your array of two hundred elements has been created, you cannot insert the 201st element. Fortunately, and this is what makes pointers so useful, there is another way to initialize a pointer: using the malloc() function, which is part of the C standard library and is declared in the standard header file stdlib.h. The malloc() function requests a piece of available memory from the operating system and returns a pointer to the allocated memory area. This method allows you to get memory dynamically, according to your needs. Let us start gently with malloc():
$ cat pointer7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( sizeof(int) );

  *p = 10;
  printf("pointer p holds value %p and points to value %d\n", (void *)p, *p);

  *p = 19;
  printf("pointer p holds value %p and points to value %d\n", (void *)p, *p);

  return EXIT_SUCCESS;
}
$ gcc -o pointer7 -std=c99 -pedantic pointer7.c
$ ./pointer7
pointer p holds value 8061010 and points to value 10
pointer p holds value 8061010 and points to value 19
In this example, the call malloc(sizeof(int)) allocates a piece of memory of the size of an int and returns its address. That is, the operating system allocates a memory area that can store an object of type int. Once the pointer references a valid address, you can work with it safely. In our example, the allocated memory lay at address 8061010. Take note that the address may change at each execution of the program: it is not fixed since memory is dynamically allocated. The statement *p = 10 stores the value 10 in the memory location pointed to by the pointer p. Likewise, the statement *p = 19 stores the value 19 in the memory location pointed to by the pointer p.
We have so far used the symbol * to declare a pointer and to access the value a pointer points to. When used with a pointer, it is a unary operator. The symbol * also denotes the multiplication operator: it is then an operator requiring two operands (binary operator). So, do not confuse them:
o If p and q are variables holding numbers, the statement x=q*p is a multiplication (two operands); it has nothing to do with pointers. The operands p and q have numeric values.
o If p has been declared as a pointer, the statement x=*p stores in x the value pointed to by the pointer p: it is not a multiplication. The operator * applies to the operand that follows it. In this case, the operand must be a pointer.
Contrast the following example:
$ cat pointer8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int p = 5;
  int x = *p;

  printf("x=%d\n", x);

  return EXIT_SUCCESS;
}
$ gcc -o pointer8 -std=c99 -pedantic pointer8.c
pointer8.c: In function ‘main’:
pointer8.c:6:11: error: invalid type argument of unary ‘*’ (have ‘int’)
With:
$ cat pointer9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 5;
  int *p = &v;
  int x = *p;

  printf("x=%d\n", x);

  return EXIT_SUCCESS;
}
$ gcc -o pointer9 -std=c99 -pedantic pointer9.c
$ ./pointer9
x=5
The program pointer8.c failed to compile because the compiler expected a pointer while we gave it an int: the statement int x = *p is illegal there. Let us take one step further. Consider now the following example:
$ cat pointer10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int n = 5;
  int *p = malloc( n * sizeof(int) );

  return EXIT_SUCCESS;
}
What does it mean? The call malloc(n * sizeof(int)) dynamically allocates a contiguous piece of memory that can store n elements of type int. Since n holds the value 5, the pointer p points to a memory area that can hold five numbers of type int. It becomes very interesting: such a pointer looks like an array… You may think that we could have declared p as an array, as in int p[5], and gotten the same result. The result would have looked the same, but there are differences. In the program pointer10.c, the memory area is dynamically allocated, which means the allocation is done while the program is running, not at compile time. The second big difference is that our memory area can be resized while the size of an array cannot change (we will explain it soon). The third difference is that we can free the allocated memory when we no longer need it. We will find out other differences between arrays and pointers throughout the book.
In our previous example, we allocated a memory area composed of five elements of type int: malloc() returned a pointer to it. The question is: if a pointer points to a memory area that can store several elements, how can we access each element? The answer is not so obvious because the pointer holds only one address, not the location of all the elements. Let us give a clue: the pointer holds the location of the memory area, which is also the address of the first element. This implies that if the pointer p contains the address of the first element (let us call it addr), and as the allocated memory area is contiguous, the second element is at address addr+sizeof(int), the third at addr+2*sizeof(int)… as depicted in Figure III‑9. At this stage, you may think that since a pointer is a variable holding the address of the first element (we called it addr), the first element should logically also be at address p, the second one at address p+sizeof(int), and so on. This seems obvious since p holds the value addr, but in C, things are different because pointer arithmetic comes into play…
Figure III‑9 Memory allocation with malloc()
The reasoning is mathematically valid but is not true in C! Why? Because the compiler does not treat a pointer as a mere numeric value even though it holds an integer number representing an address. For the compiler, a pointer is also bound to the type of the object it points to: a pointer is not an integer type; it is more than a variable holding an address. In C, a pointer has two attributes: an address and the type it points to. Thus, if the compiler encounters a pointer in an addition or a subtraction such as p+1, it translates it to addr+sizeof(obj_type). This is known as pointer arithmetic. More generally, if p is a pointer (holding addr) to an object obj of type obj_type, the operation p±i is converted to addr±i*sizeof(obj_type) by the compiler. It is interesting to note that if p is a pointer and i an integer value, the addition p+i works in pointer context (pointer arithmetic) and also returns a pointer: keep it in mind.
Why is such a conversion done? Previously, we came to the conclusion that if p, holding the value addr, is the address of the allocated contiguous memory area, which is also the address of the first element, then addr+sizeof(obj_type) is the address of the second element… and addr+(i-1)*sizeof(obj_type) is the address of the ith element (counting from 1). Since the compiler scales the integers added to or subtracted from pointers, the first element is at address p, the second one at address p+1, the third at p+2… and the ith element at p+i-1. This is good news because you do not have to work with raw addresses. Working with raw addresses should be avoided because the size of an address held in a pointer depends on the computer, which makes such code non-portable. The following example sets and displays the first and second items of the memory area pointed to by p:
$ cat pointer11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int n = 5;
  int *p = malloc( n * sizeof(int) ); /* allocates memory for 5 items of type int */

  *p = 1;
  *(p+1) = 2;

  printf("first element=%d \n", *p);
  printf("second_element=%d\n", *(p+1));

  return EXIT_SUCCESS;
}
$ gcc -o pointer11 -std=c99 -pedantic pointer11.c
$ ./pointer11
first element=1
second_element=2
The C language allows you to use array subscripts with pointers. The following example is equivalent to the previous one:
$ cat pointer12.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  p[0] = 1;
  p[1] = 2;

  printf("first element=%d \n", p[0]);
  printf("second_element=%d\n", p[1]);

  return EXIT_SUCCESS;
}
$ gcc -o pointer12 -std=c99 -pedantic pointer12.c
$ ./pointer12
first element=1
second_element=2
In summary, if p is a pointer to a memory area composed of several items:
o p is a pointer to the memory area.
o p is also a pointer to the first object of the memory area.
o p[0] holds the value of the first item of the memory area: p[0] is a synonym for *p.
o p+i is a pointer to the ith item of the memory area (counting from 0).
o p[i] and *(p+i) hold the value of the ith item of the memory area (counting from 0).
o The compiler converts p[i] to *(p+i).
Remember that even if pointers and arrays use the same notation, they are two different types: a pointer is not an array. This will be detailed in the subsequent sections.
Figure III‑10 Representation of a pointer to int
We also draw your attention to the fact that pointers cannot be used in arbitrary numeric operations: you cannot use pointers in multiplications and divisions. You can add an integer to a pointer or subtract an integer from it, yielding a pointer, and you can subtract two pointers of the same type pointing into the same memory area to get the number of elements between them. The following example shows that the addition also returns a pointer of the same type. The example pointer13.c is equivalent to pointer12.c (see Figure III‑10):
$ cat pointer13.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */
  int *p_first_element = p;
  int *p_second_element = p + 1;

  *p_first_element = 1;
  *p_second_element = 2;

  printf("first element=%d \n", p[0]);
  printf("second_element=%d\n", p[1]);

  return EXIT_SUCCESS;
}
$ gcc -o pointer13 -std=c99 -pedantic pointer13.c
$ ./pointer13
first element=1
second_element=2
Explanation:
o The statement int *p=malloc(5*sizeof(int)) allocates a contiguous memory area that can store five numbers of type int. The pointer p stores the address of the first element.
o The statement int *p_first_element=p declares p_first_element as a pointer to an int and initializes it with the value held in the pointer p. It points to the first element of the memory area.
o The statement int *p_second_element=p+1 declares p_second_element as a pointer to an int and initializes it with the value of the pointer p+1. It points to the second element.
o The statement *p_first_element=1 assigns the value 1 to the element pointed to by the pointer p_first_element.
o The statement *p_second_element=2 assigns the value 2 to the element pointed to by the pointer p_second_element.
o The printf("first element=%d \n", p[0]) statement displays the value of the first element.
o The printf("second_element=%d\n", p[1]) statement displays the value of the second element.
This simple example shows a very important subtlety that could drive you crazy if you do not understand it at the beginning of your learning. You have noticed that the pointer p_first_element points to the same object as the pointer p, and the pointer p_second_element points to the same object as the pointer p+1. This means that they give access to the same object. However, the pointer p_first_element is not the pointer p and the pointer p_second_element is not the pointer p+1. In each case, they are two different pointers pointing to the same object. To help you understand the subtlety clearly, consider the following example:
$ cat pointer14.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */
  int *q = p;

  *p = 1;

  printf("p holds %p and points to %d but p is at address %p\n", (void *)p, p[0], (void *)&p);
  printf("q holds %p and points to %d but q is at address %p\n", (void *)q, q[0], (void *)&q);

  return EXIT_SUCCESS;
}
$ gcc -o pointer14 -std=c99 -pedantic pointer14.c
$ ./pointer14
p holds 8061068 and points to 1 but p is at address feffea8c
q holds 8061068 and points to 1 but q is at address feffea88
The above example shows that both pointers p and q point to the same memory area, which lay at memory address 8061068. This implies that you can access the memory area equally through the pointer p or q (Figure III‑11). The example also shows that the pointer p is different from the pointer q: they have two different addresses, meaning they are two different objects (p and q are two distinct variables). This means that we can assign another value to the pointer q without altering the pointer p, as in the following example:
$ cat pointer15.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */
  int *r = malloc( 2 * sizeof(int) ); /* allocates memory for 2 items of type int */
  int *q = p;

  *p = 1;

  printf("p holds %p and points to %d but p is at address %p\n", (void *)p, p[0], (void *)&p);
  printf("q holds %p and points to %d but q is at address %p\n", (void *)q, q[0], (void *)&q);

  q = r;
  r[0]=10;

  printf("\np holds %p and points to %d but p is at address %p\n", (void *)p, p[0], (void *)&p);
  printf("r holds %p and points to %d but r is at address %p\n", (void *)r, r[0], (void *)&r);
  printf("q holds %p and points to %d but q is at address %p\n", (void *)q, q[0], (void *)&q);

  return EXIT_SUCCESS;
}
$ gcc -o pointer15 -std=c99 -pedantic pointer15.c
$ ./pointer15
p holds 8061160 and points to 1 but p is at address feffea6c
q holds 8061160 and points to 1 but q is at address feffea64

p holds 8061160 and points to 1 but p is at address feffea6c
r holds 8061968 and points to 10 but r is at address feffea68
q holds 8061968 and points to 10 but q is at address feffea64
As we explained several times, your objects should always be set to valid values before you use them. An uninitialized pointer is an invalid pointer that may have any value. What default value could we give to a pointer that we want to initialize with a valid address later in our program? A corollary of the question is: how can we know whether a pointer has been properly initialized? That is, how can we know that we can safely use a pointer? Every time you declare a pointer, initialize it with the address of an existing object, with a memory allocation function such as malloc(), or just set it to the default value NULL. The macro NULL, representing a null pointer constant, is defined in the standard header file stdlib.h. A null pointer indicates there is no object pointed to: a null pointer points to “no object”. Accordingly, before accessing an object pointed to by a pointer, just check if the pointer holds the NULL value: if so, do not attempt to dereference it with the operator *. The following example initializes the pointer q to NULL:
$ cat pointer16.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *q = NULL;

  return (EXIT_SUCCESS);
}
We said previously that the malloc() function returns a pointer to the allocated memory block, but this is not always true. It may happen that malloc() cannot allocate memory; in this case, it returns a null pointer. That is why you have to check the return value of the function. If the returned pointer compares equal to NULL, you cannot work with it. From now on, we will check the pointer returned by the malloc() function as shown below:
if ( p == NULL ) { /* memory allocation failed */
  printf("malloc() cannot allocate memory\n");
  return (EXIT_FAILURE);
}
In your programs, after calling malloc(), check whether the returned pointer is valid. If the pointer compares equal to NULL, the program can print a warning message and end with the exit code EXIT_FAILURE.
Figure III‑11 Pointers p and q referencing the same object
If you attempt to dereference a pointer holding the value NULL, the behavior is undefined: on most systems, your program will crash.
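The rule above can be wrapped in a small function. The following sketch is only an illustration: the name first_item_or() and its fallback parameter are hypothetical and not part of the C standard library. It returns the first item referenced by a pointer, or a fallback value when the pointer is null.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical helper: returns the int pointed to by p,
   or fallback if p is a null pointer. */
int first_item_or(int *p, int fallback) {
  if (p == NULL) {   /* no object pointed to: do not dereference */
    return fallback;
  }
  return *p;         /* p references an object: *p can be read */
}
```

The check only detects null pointers. It cannot detect an uninitialized pointer or a pointer that has been freed, which is why pointers should be set to NULL whenever they reference no valid object.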
III.3.5 Accessing an object through a pointer
We have already talked about how to access pointers. In this section, we review what we explained earlier with additional details. A pointer is a variable holding the address (sometimes called a reference) of an object. You can access the pointer itself by using its name as you would with any variable. Thus, in the statement p = &v, the pointer p is considered a container (left side of =) in which a value is placed, while in the statement q = p, the pointer p (on the right side of =) represents the value it holds (an address). However, here is the thing: a pointer has a double meaning. It is more than a simple address: it references an object. To access the object the pointer p references, place the dereferencing operator * before the pointer: *p is the object the pointer p references [26]. Conversely, if obj is an object, to get its address, place the address-of operator & (also called the reference operator) before the object name. Thus, &obj is a pointer to obj (see Figure III‑8). For example, if v is a variable of type int, &v is a pointer to int. Conversely, if r is a pointer to float, *r is a float… We have also seen that a pointer can reference a memory area composed of several items. In such a case, the pointer p references the very first item, p+1 the second one… This means that *p is the first item, *(p+1) the second item… There is another, widely used method to access the objects a pointer references: using the pointer as if it were an array.
Though a pointer is not an array, you can use array subscripts to access the objects in the memory area pointed to by a pointer: p[0] is a synonym for *p, p[1] is a synonym for *(p+1)… which implies that &p[0] is a synonym for p, &p[1] is a synonym for p+1… as shown below:
$ cat pointer17.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  long *p = malloc( 2*sizeof(long) ); /* allocates memory for 2 items of type long */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 1;
  p[1] = 2;

  printf("size of a long=%zu\n", sizeof(long));
  printf("p[0]=%ld *p=%ld , p=%p &p[0]=%p\n", p[0], *p, p, &p[0]);
  printf("p[1]=%ld *(p+1)=%ld , p+1=%p &p[1]=%p\n", p[1], *(p+1), p+1, &p[1]);

  return EXIT_SUCCESS;
}
$ gcc -o pointer17 -std=c99 -pedantic pointer17.c
$ ./pointer17
size of a long=4
p[0]=1 *p=1 , p=8061090 &p[0]=8061090
p[1]=2 *(p+1)=2 , p+1=8061094 &p[1]=8061094
In the example above, we can notice that on our computer the type long fits in 4 bytes: the address stored in p is 8061090, and the pointer p+1 holds the address 8061094. The rationale, if you remember what we said in the previous section, is that the compiler converts the pointer p+1 to the address addr+sizeof(long), where addr is the address held by p. Take note that the subscript operator [] takes precedence over the address-of operator &: &p[i] means &(p[i]), that is, the address of the object p[i]. In other words, &p[i] is equivalent to p+i. You may remember that, in C, you can use negative subscripts to access items. The rationale is that the compiler translates the array notation to a pointer notation: p[-1] is converted to *(p-1) as shown below:
$ cat pointer18.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  int *p_second_item = p + 1;
  int *p_first_item = p_second_item - 1;

  p[0] = 12;
  p[1] = 98;

  printf("p[0]=%d address=%p\n", p[0], &p[0]);
  printf("p_second_item[-1]=%d address=%p\n", p_second_item[-1], &p_second_item[-1]);
  printf("p_first_item[0]=%d address=%p\n", p_first_item[0], &p_first_item[0]);

  return EXIT_SUCCESS;
}
$ gcc -o pointer18 -std=c99 -pedantic pointer18.c
$ ./pointer18
p[0]=12 address=8061088
p_second_item[-1]=12 address=8061088
p_first_item[0]=12 address=8061088
In the example above, we can reach any element from the pointer p_second_item, even the first one. The first element can be denoted by p_first_item[0], p[0], or p_second_item[-1].
Do not use illegal subscripts. If you have created a memory area holding n objects, pointed to by the pointer p, do not try to access the element p[n]: the index is out of range. It must be in the range [0,n-1].
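Since C performs no check on subscripts, a program that receives an index from elsewhere may test it before using it. The sketch below shows one way to do it, assuming the caller knows the number n of items in the memory area; the function name get_item() and its parameters are hypothetical.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical helper: copies p[i] into *out and returns 1 if i is a
   legal subscript for a memory area of n items; returns 0 otherwise. */
int get_item(int *p, int n, int i, int *out) {
  if (p == NULL || out == NULL || i < 0 || i >= n) {
    return 0;        /* invalid pointer or subscript out of range */
  }
  *out = p[i];       /* i is in the range [0,n-1] */
  return 1;
}
```

The function cannot verify that n really is the number of allocated items: keeping the size next to the pointer remains the responsibility of the programmer.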
III.3.6 Freeing a pointer
The malloc() function dynamically allocates memory for your program and returns a pointer. If the returned pointer compares equal to NULL, the function failed to get free memory. In this case, of course, the pointer is not usable. However, if the memory allocation succeeds, you get a valid pointer to a memory area. If your program consumes a lot of memory and never releases it, there may be a memory shortage: your program may crash and could disrupt other running processes requesting memory. You should think about freeing memory each time you allocate it: it is good practice to determine when allocated memory can be freed. The function free() releases the memory area pointed to by the given pointer as shown in the following example:
$ cat pointer19.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 12;
  printf("p[0]=%d address=%p\n", p[0], &p[0]);

  free(p);
  p = NULL;

  return (EXIT_SUCCESS);
}
In our example above, we freed the allocated memory pointed to by the pointer p. After you release a pointer, it is best practice to set it to the NULL value, indicating that the pointer is no longer valid. Take note that if you provide a null pointer to the free() function, it does nothing. Do not pass to the free() function a pointer that was not returned by the malloc() function [27]. The following program is not correct:
$ cat pointer20.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  int *p_second_item = p + 1;
  p[0] = 12;
  printf("p[0]=%d address=%p\n", p[0], &p[0]);

  free(p_second_item);

  return EXIT_SUCCESS;
}
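The above example attempts to free the memory area pointed to by the pointer p_second_item, which is not the beginning of the allocated memory block. The following example is a heresy: the pointer p holds the address of an ordinary variable, not of a memory block obtained from malloc():
$ cat pointer21.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 10;
  int *p = &v;

  free(p);

  return EXIT_SUCCESS;
}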
Here is the third thing to avoid: do not reuse a pointer released by the free() function. A pointer passed to free() becomes an invalid pointer. The following example seems to work, but its behavior is undefined and it actually corrupts the memory of your program: it could crash if it were more complex and had to run for a long time.
$ cat pointer22.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 12;
  printf("p[0]=%d address=%p\n", p[0], &p[0]);

  free(p);

  p[0] = 13;
  printf("p[0]=%d address=%p\n", p[0], &p[0]);

  return (EXIT_SUCCESS);
}
$ gcc -o pointer22 -std=c99 -pedantic pointer22.c
$ ./pointer22
p[0]=12 address=8061038
p[0]=13 address=8061038
To avoid reusing pointers that have been freed, always set them to NULL as in the example pointer19.c. Keep in mind that setting a pointer to another value does not free the allocated memory:
$ cat pointer23.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 12;
  printf("p[0]=%d address=%p\n", p[0], &p[0]);

  p = NULL;

  return (EXIT_SUCCESS);
}
The example pointer23.c does not free the allocated memory: it just loses the only reference to it, causing a memory leak. If you do that, the memory remains allocated until the program terminates. If possible, write the statement that releases allocated memory at the same time you write the code that allocates it. Thus, you will not forget to free unused memory. Memory blocks remain allocated until you free them with the free() function or until the program terminates. When your program terminates, all the resources it uses (including allocated memory blocks) are released.
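The two habits described in this section (free the block, then set the pointer to NULL) can be grouped in one function. The following sketch assumes a hypothetical helper named release_ints(); it receives the address of the pointer (a pointer to pointer, a notion detailed later in this chapter) so that it can modify the caller's variable.

```c
#include <stdlib.h>  /* free, NULL */

/* Hypothetical helper: frees the block referenced by *pp, then sets
   the caller's pointer to NULL so that it cannot be reused by mistake. */
void release_ints(int **pp) {
  if (pp == NULL) {
    return;
  }
  free(*pp);   /* free(NULL) does nothing, so a second call is harmless */
  *pp = NULL;
}
```

With int *p = malloc(5 * sizeof(int)), the call release_ints(&p) frees the memory and leaves p equal to NULL. Other copies of the pointer, such as q in pointer15.c, are not updated and must no longer be used.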
III.3.7 void * pointer
III.3.7.1 Definition
The void * pointer type is a special type used to represent a pointer to any object. Why introduce such a type in C? It happens that the type of the object a pointer points to is not always known. For example, if you have a look at the declaration of the malloc() function, you will see something like this:
void *malloc(size_t s);
We can see two special types that we have not talked about so far. The type size_t is defined in the header file stdlib.h (and in other standard headers such as stddef.h). It is an unsigned integer type measuring the size of an object (in bytes). The sizeof operator yields a value of type size_t. As a matter of fact, size_t is not a new basic type but an “alias”: we will explain how to create aliases of existing types later. On 64-bit computers, size_t is usually an alias for unsigned long. The argument s of the malloc() function denotes the number of bytes of the memory area to be allocated. The size s is typically the size of a type or that of an object.
The type void * is very interesting. It is a pointer to an object of unknown type. The malloc() function reserves a memory space having the requested size s. It does not need to know what you will put in it: if you request four bytes, it allocates four bytes, in which you can store an integer, a floating-point number, four characters… it is up to you. Of course, the void * pointer must be converted to a known pointer type later in order to work with it. For example, the statement int *p = malloc(sizeof(int)) allocates memory for an object of type int, but the pointer returned by malloc() does not remain a void *: it is implicitly converted to type int *. Remember that the malloc() function does not always return a valid pointer. If the function cannot allocate memory, a null pointer is returned. Please take note that in some examples (pointer7.c, pointer11.c, pointer12.c, pointer13.c, pointer14.c, and pointer15.c), we assumed the malloc() function returned a valid pointer (that is, not a null pointer) without checking the returned value. We prefer to introduce new concepts smoothly, with very simple examples that are not complicated by too many details. As far as you are concerned, in your code, you have to check the pointer returned by malloc().
III.3.7.2 Usage
The void * pointer is subject to some constraints. Since the type of the object it points to is unknown, you cannot use it to access objects unless you convert it. For example, you cannot access the object it points to by dereferencing it with * or by using the subscript operator []. The following example will not compile:
$ cat void_ptr1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 10;
  void *p = &v;

  printf("%d\n", *p);
  return EXIT_SUCCESS;
}
$ gcc -o void_ptr1 -std=c99 -pedantic void_ptr1.c
void_ptr1.c: In function ‘main’:
void_ptr1.c:8:18: warning: dereferencing ‘void *’ pointer
void_ptr1.c:8:3: error: invalid use of void expression
The following example will not compile either:
$ cat void_ptr2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 10;
  void *p = &v;

  printf("%d\n", p[0]);
  return EXIT_SUCCESS;
}
$ gcc -o void_ptr2 -std=c99 -pedantic void_ptr2.c
void_ptr2.c: In function ‘main’:
void_ptr2.c:8:19: warning: pointer of type ‘void *’ used in arithmetic
void_ptr2.c:8:19: warning: dereferencing ‘void *’ pointer
void_ptr2.c:8:3: error: invalid use of void expression
The following example, in contrast, works:
$ cat void_ptr3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 10;
  void *p = &v;

  printf("%d\n", *(int *)p);
  printf("%d\n", ((int *)p)[0]);
  return EXIT_SUCCESS;
}
$ gcc -o void_ptr3 -std=c99 -pedantic void_ptr3.c
$ ./void_ptr3
10
10
Any pointer to an object can be converted to void * and back to its original type without losing data. In the following example, the pointer p, which is of type float *, is converted to void * and then back to float *:
$ cat void_ptr4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float *p = malloc( 2*sizeof(float) );
  void *q;
  float *r;

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 10.1;
  p[1] = 9.7;

  q = p; /* float * converted to void * */
  r = q; /* void * converted to float * */

  printf("%f %f\n", r[0], r[1]);

  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o void_ptr4 -std=c99 -pedantic void_ptr4.c
$ ./void_ptr4
10.100000 9.700000
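The same round trip is what happens when a function receives its data through a void * parameter. The sketch below assumes a hypothetical function sum_floats(): the caller passes any pointer, and the function converts it back to the type it expects before accessing the items.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical function: data is a void * that the caller guarantees
   points to n items of type float. */
float sum_floats(const void *data, size_t n) {
  const float *items = data;   /* void * converted back to float * */
  float total = 0.0f;
  size_t i;

  for (i = 0; i < n; i++) {
    total += items[i];         /* items can be subscripted, data cannot */
  }
  return total;
}
```

The compiler cannot check that data really references float objects: passing a pointer to another type would compile without any message and produce meaningless results.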
III.3.8 Sizeof operator and pointers
The sizeof operator yields the size, in bytes, of an object or a type. If the operand is a type, you must enclose it in parentheses. For example:
$ cat size1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  long long i;

  printf("sizeof(long long)=%zu, sizeof(i)=%zu\n", sizeof(long long), sizeof i);
  return (EXIT_SUCCESS);
}
$ gcc -o size1 -std=c99 -pedantic size1.c
$ ./size1
sizeof(long long)=8, sizeof(i)=8
It is interesting to note that this also holds true for the object referenced by a pointer:
$ cat size2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double *p = NULL;

  printf("size of double=%zu, size of object=%zu\n", sizeof(double), sizeof *p);
  return (EXIT_SUCCESS);
}
$ gcc -o size2 -std=c99 -pedantic size2.c
$ ./size2
size of double=8, size of object=8
Very interesting… At compile time, the sizeof operator evaluates to an integer constant that represents the size of its operand (variable-length arrays, covered in section III.9, are the exception). The operand itself is not evaluated: sizeof *p represents the size of the object pointed to by p even though the pointer points to nothing meaningful. Accordingly, the statement int *p = malloc(10*sizeof(int)) can also be written int *p = malloc(10*sizeof *p). The compiler deduces the size from the type of the object the pointer p points to. Why is it interesting? If you change the type referenced by a pointer, you do not need to change it in the malloc() calls: you have to do it only once, at the declaration of the pointer. This will save you time and spare you many errors.
This also works with pointers to pointers, as in the following example (error checks are omitted for brevity):
$ cat size3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double **p = malloc( 2 * sizeof *p );

  p[0] = malloc( 3 * sizeof **p);
  p[1] = malloc( 3 * sizeof **p);

  free(p[0]);
  free(p[1]);
  free(p);
  return (EXIT_SUCCESS);
}
In this example, p is a pointer to memory area holding two pointers to type double (p is a pointer to type double *, p is a pointer to pointer to double), and then *p is a pointer to type double. This implies, p = malloc( 2*sizeof(double *) ) can be replaced by p = malloc(2 * sizeof *p). In the same way, p[0] = malloc(3 * sizeof **p) is equivalent to p[0] = malloc( 3 * sizeof(double) ).
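As a complement, here is a sketch of the same allocation scheme with the checks we recommend after every malloc() call. The function names make_table() and free_table() are hypothetical; the point is that no type name appears in the malloc() calls, only sizeof *t and sizeof **t.

```c
#include <stdlib.h>  /* malloc, free, NULL, size_t */

/* Hypothetical helper: releases a table created by make_table(). */
void free_table(double **t, size_t rows) {
  size_t i;

  if (t == NULL) {
    return;
  }
  for (i = 0; i < rows; i++) {
    free(t[i]);              /* free(NULL) does nothing */
  }
  free(t);
}

/* Hypothetical helper: allocates rows pointers, each referencing cols
   items of type double. Returns NULL if any allocation fails. */
double **make_table(size_t rows, size_t cols) {
  double **t = malloc(rows * sizeof *t);   /* sizeof *t is sizeof(double *) */
  size_t i;

  if (t == NULL) {
    return NULL;
  }
  for (i = 0; i < rows; i++) {
    t[i] = NULL;             /* lets free_table() run safely on failure */
  }
  for (i = 0; i < rows; i++) {
    t[i] = malloc(cols * sizeof **t);      /* sizeof **t is sizeof(double) */
    if (t[i] == NULL) {
      free_table(t, rows);
      return NULL;
    }
  }
  return t;
}
```

If the table later had to hold items of type float, only the declarations double **t would change: the malloc() calls would stay as they are.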
III.3.9 Const and pointers
In Chapter II, we introduced the const qualifier that makes a variable read-only. A const variable must not be modified by indirect means either; otherwise, the behavior is undefined. The following example modifies the value of a const variable through a pointer (it does not conform to the C standard):
$ cat pointer_const1a.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  const int v = 10;
  int *p = (int *)&v;

  printf("v=%d\n", v);
  *p = 20;
  printf("v=%d\n", v);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_const1a -std=c99 -pedantic pointer_const1a.c
$ ./pointer_const1a
v=10
v=20
&v is a pointer to const int. Therefore, the statement int *p = (int *)&v makes an explicit cast to int *. We can see that although the variable v was qualified as const, it could be altered through the pointer p. The program shows that the const qualifier may not protect against writes. The program pointer_const1a.c worked on our computer, but you should never do something like this: the behavior is classified as undefined by the C standard, which means its result is unpredictable and therefore not portable. Our program compiled with no error message because we used an explicit cast. If you remove the explicit cast and write int *p = &v (implicit conversion), you get a warning message:
$ cat pointer_const1b.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  const int v = 10;
  int *p = &v;

  printf("v=%d\n", v);
  *p = 20;
  printf("v=%d\n", v);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_const1b -std=c99 -pedantic pointer_const1b.c
pointer_const1b.c: In function ‘main’:
pointer_const1b.c:6:12: warning: initialization discards qualifiers from pointer target type
The const qualifier can also be used with a pointer, either to make the referenced object read-only or to make the pointer itself read-only. To make a pointer read-only, place the qualifier const after the asterisk *. For example, the declaration int *const p makes the pointer p read-only, while const int *p or int const *p means p is a pointer to const int. The following example makes the pointer p read-only. That is, the pointer p cannot be modified:
$ cat pointer_const2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int * const p = malloc(10 * sizeof(int) );
  int v = 10;

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p=&v;
  printf("%d\n", *p);

  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_const2 -std=c99 -pedantic pointer_const2.c
pointer_const2.c: In function ‘main’:
pointer_const2.c:13:3: error: assignment of read-only variable ‘p’
The compilation failed because we attempted to modify the pointer p that was declared as a constant pointer. The following example makes the object pointed to by the pointer q read-only (q points to elements of type const int):
$ cat pointer_const3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc(2*sizeof(int) );
  const int *q = p; /* q points to const int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[1] = 20;
  printf("q[1]=%d\n", q[1]);

  p[1] = 40;
  printf("q[1]=%d\n", q[1]);

  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_const3 -std=c99 -pedantic pointer_const3.c
$ ./pointer_const3
q[1]=20
q[1]=40
It works fine as long as we make modifications through the pointer p, but if we try to make modifications through the pointer q, we get a compilation error:
$ cat pointer_const4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc(2*sizeof(int) );
  const int *q = p;

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return EXIT_FAILURE;
  }

  /* q points to const int: the next statement is rejected */
  q[1] = 20;
  printf("q[1]=%d\n", q[1]);

  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_const4 -std=c99 -pedantic pointer_const4.c
pointer_const4.c: In function ‘main’:
pointer_const4.c:14:3: error: assignment of read-only location ‘*(q + 4u)’
The example shows that the same object can be modified through the pointer p but not through the pointer q. Generally, the const qualifier is used in function declarations to tell the programmer that the function will not modify the object pointed to by the pointer passed to it. For example, the declaration int myfunc(char *s2, const char *s1) indicates that the string pointed to by s1 will not be modified by the function myfunc().
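To illustrate such a declaration, here is a minimal sketch of a function that only reads the string it receives. The name count_char() is hypothetical; what matters is the parameter of type const char *.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical function: counts how many times c appears in the string s.
   The const qualifier promises that the function only reads the string. */
size_t count_char(const char *s, char c) {
  size_t count = 0;

  while (*s != '\0') {
    if (*s == c) {
      count++;
    }
    s++;   /* allowed: the pointer itself is not const */
  }
  return count;
}
```

Inside the function, a statement such as *s = 'x' would be rejected by the compiler, exactly like q[1] = 20 in pointer_const4.c.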
III.3.10 Arrays and pointers
You have guessed that, in C, pointers and arrays are closely connected. The rationale is that the compiler converts an array appearing in an expression to a pointer to its first element, except in the following cases:
o The array is the operand of the sizeof operator. If the array arr contains n elements of type obj_type, sizeof arr evaluates to n * sizeof(obj_type). In contrast, if p is a pointer, sizeof p evaluates to the size of the pointer, whatever the type it points to.
o The array is the operand of the address-of operator &: &arr is a pointer to the whole array.
o The identifier appears on the left side of the assignment operator (=), as in p = something. This is permitted for pointers but not for arrays.
In other expressions, the identifier of an array is converted to a pointer to its first element:
int arr[10];
int *p;
p = arr;     /* arr converted to &arr[0] */
p = arr + 1; /* arr converted to &arr[0] and p points to the second element */
This is equivalent to:
int arr[10];
int *p;
p = &arr[0];
p = &arr[0] + 1;
An array is also converted to a pointer when it is passed as an argument to a function. In the following example, the array is converted to a pointer to its first element:
char arr[10];
strcpy(arr, "copy this");
The example above is then equivalent to:
char arr[10];
strcpy(&arr[0], "copy this");
and equivalent to:
char arr[10];
char *p = arr;
strcpy(p, "copy this");
As already mentioned, an element denoted by s[i] is translated to *(s+i) whether s is a pointer or an array.
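The difference between an array and the pointer it is converted to can be observed with sizeof. The sketch below uses hypothetical names (the macro ARRAY_LEN and the functions size_seen_by_callee() and same_element()) to show what the calling code and the called function each see.

```c
#include <stddef.h>  /* size_t */

/* Number of elements of a real array (must not be used with a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical function: the caller passes an array, but the function
   receives a pointer to its first element. */
size_t size_seen_by_callee(int *p) {
  return sizeof p;   /* size of a pointer, not of the array */
}

/* Hypothetical function: returns 1 since s[i] and *(s+i) denote
   the same element. */
int same_element(int *s, size_t i) {
  return &s[i] == s + i && s[i] == *(s + i);
}
```

With the declaration int arr[10], ARRAY_LEN(arr) evaluates to 10 in the code that declares the array, whereas size_seen_by_callee(arr) returns sizeof(int *): the number of elements is lost as soon as the array is passed to a function.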
III.4 Strings
III.4.1 Definition
Now, let us talk about an important concept related to arrays and pointers: strings. A string is a sequence of characters terminated by the null character. What is a null character? In computing, a character is in fact represented by a code fitting in one or more bytes. The null character has the character code 0 and is denoted by the character literal \0: all its bits are set to 0. It is important to note that, in C, the length of a string is the number of characters preceding the null character. For example, the string “hello” has a length of five characters. A string literal is a sequence of characters enclosed within double quotes (") such as “C Programming”.
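The definition of the length of a string translates directly into code. The following sketch, named my_strlen() for the occasion, is a hypothetical re-implementation written for illustration only; in real programs, use the standard function strlen() presented later in this chapter.

```c
#include <stddef.h>  /* size_t */

/* Illustrative re-implementation of strlen(): counts the characters
   preceding the null character. */
size_t my_strlen(const char *s) {
  size_t len = 0;

  while (s[len] != '\0') {
    len++;
  }
  return len;
}
```

The loop stops only when it meets the null character: called with a sequence of characters that is not terminated by \0, the function would read memory it must not access.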
III.4.2 Strings and arrays
We have already talked about strings in Chapter II. We said a string could be declared as char *. This is true, but it can also be declared as an array of characters. There is no basic string type in C: a string is a sequence of char. Let us start with a string as an array of char. When you work with strings, always remember that they end with the string terminator, called the null character, denoted by \0. You have two methods to initialize an array of char: by enclosing character literals between braces or by using a string literal. The following example initializes the array msg with the string “hello”.
$ cat string1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[6] = {'h', 'e', 'l', 'l', 'o', '\0' };

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string1 -std=c99 -pedantic string1.c
$ ./string1
msg=hello
In the example string1.c, we declared an array of six elements of type char. The array msg is large enough to hold the string “hello”. The following example is not correct because the array msg is too small:
$ cat string2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[5] = {'h', 'e', 'l', 'l', 'o', '\0' };

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string2 -std=c99 -pedantic string2.c
string2.c: In function ‘main’:
string2.c:5:4: warning: excess elements in array initializer
string2.c:5:4: warning: (near initialization for ‘msg’)
The compiler generated the executable but with warnings: the array is too small, so the last character (\0) is ignored. The code above is the same as the following one:
$ cat string3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[5] = {'h', 'e', 'l', 'l', 'o'};

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
The example string3.c is not correct. There is no warning, but the code contains a bug: we used the msg array as a string although it is not terminated by the null character. If you run it, you may see strange characters on your screen because the printf() function keeps displaying characters past the end of the array until it meets a null character. Instead of specifying the size of our array, we could let the compiler compute it for us:
$ cat string4.c
 1 #include <stdio.h>
 2 #include <string.h>
 3 #include <stdlib.h>
 4 int main(void) {
 5   char msg[] = {'h', 'e', 'l', 'l', 'o', '\0' };
 6   size_t msg_nb_elt = sizeof msg;
 7   size_t string_len = strlen(msg);
 8
 9   printf("Array msg holds %s\n", msg);
10   printf("Size of array msg=%zu\n", msg_nb_elt);
11   printf("Length of string %s=%zu\n", msg, string_len);
12
13   return EXIT_SUCCESS;
14 }
$ gcc -o string4 -std=c99 -pedantic string4.c
$ ./string4
Array msg holds hello
Size of array msg=6
Length of string hello=5
Explanation:
o Line 1: we include the header file stdio.h that declares the function printf().
o Line 2: we include the header file string.h that declares the function strlen().
o Line 5: we define msg as an array of char holding six character literals. Its size is computed by the compiler since it is fully initialized.
o Line 6: we get the number of characters in the msg array. You have noticed we did not write msg_nb_elt = sizeof msg/sizeof(char) but msg_nb_elt = sizeof msg because sizeof(char) is always 1. Thus, the size of an array of char (in bytes) is the number of characters it contains: the size is 6.
o Line 7: the strlen() function counts the number of characters (preceding the null character) of the given array. It returns 5.
Figure III‑12 Initialization of an array with a string literal
The C language also lets you initialize an array with a string literal:
$ cat string5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[6] = "hello";

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string5 -std=c99 -pedantic string5.c
$ ./string5
msg=hello
This method is more convenient but, as explained earlier, your array must be large enough to contain all the characters of the string, including the null character. The following example is not correct because the array is too small to hold the null character: the compiler accepts it, but msg does not contain a string:
$ cat string6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[5] = "hello";

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
You can let the compiler compute the size of the array itself:
$ cat string7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[] = "hello";

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string7 -std=c99 -pedantic string7.c
$ ./string7
msg=hello
The statements char msg[] = "hello" and char msg[] = {'h', 'e', 'l', 'l', 'o', '\0' } are equivalent: they copy the character literals into the array (see Figure III‑12). The example string7.c is also equivalent to the following:
$ cat string8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[6];

  msg[0] = 'h';
  msg[1] = 'e';
  msg[2] = 'l';
  msg[3] = 'l';
  msg[4] = 'o';
  msg[5] = '\0';

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string8 -std=c99 -pedantic string8.c
$ ./string8
msg=hello
In this example, we copied the character literals into the array ourselves.
III.4.3 Strings and pointers
Since a string is a sequence of characters terminated by the null character, it can also be accessed through a pointer to char. We just need to allocate enough memory to store the characters as shown below:
$ cat string9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = malloc(6*sizeof(char));

  if ( msg == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  msg[0] = 'h';
  msg[1] = 'e';
  msg[2] = 'l';
  msg[3] = 'l';
  msg[4] = 'o';
  msg[5] = '\0';

  printf("msg=%s\n", msg);

  free(msg);
  return EXIT_SUCCESS;
}
$ gcc -o string9 -std=c99 -pedantic string9.c
$ ./string9
msg=hello
Since sizeof(char) is always 1, the code string9.c could have been written as follows:
$ cat string10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = malloc(6);

  if ( msg == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  msg[0] = 'h';
  msg[1] = 'e';
  msg[2] = 'l';
  msg[3] = 'l';
  msg[4] = 'o';
  msg[5] = '\0';

  printf("msg=%s\n", msg);

  free(msg);
  return EXIT_SUCCESS;
}
$ gcc -o string10 -std=c99 -pedantic string10.c
$ ./string10
msg=hello
You now understand what pointers are and how to work with them. Do you think the following example is equivalent to the examples string9.c and string10.c?
$ cat string11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = "hello";

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string11 -std=c99 -pedantic string11.c
$ ./string11
msg=hello
Figure III‑13 Initialization of a pointer with a string literal
We got the same output and yet they are completely different! Why? A pointer is a reference to an object. It is a variable holding an address pointing to an object. Remember that a pointer can be initialized with the address of an existing object or with malloc(). In the example above, we initialized the pointer with a string literal. The compiler stores a string literal as an unnamed array of char and, like any array appearing in an expression, it is converted to a pointer to its first character. This means the compiler assigns the address of the string literal to the pointer (see Figure III‑13).
Since the pointer msg was not initialized with malloc(), it must not be freed. Since it has been initialized with a string literal, the object it references must not be modified either. In other words, you have to avoid doing something like this:
$ cat string12.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = "hello";

  msg[0]= 'H';
  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string12 -std=c99 -pedantic string12.c
$ ./string12
Segmentation Fault (core dumped)
On our computer, the program crashed. Attempting to modify a string literal has undefined behavior: depending on the implementation, the program may crash, appear to work, or misbehave. In C, you must not attempt to modify a literal even if pointers let you think you can. Certainly, the C language saves you time by letting you initialize a pointer with a string literal, but it assumes you understand what you can and cannot do with it.
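When the characters of a string literal must be modified, the usual approach is to work on a copy stored in memory that belongs to the program. The sketch below assumes a hypothetical helper named copy_of(); it relies only on malloc() and strlen(), which we have already met.

```c
#include <stdlib.h>  /* malloc, NULL, size_t */
#include <string.h>  /* strlen */

/* Hypothetical helper: returns a modifiable copy of the string s,
   or NULL if memory cannot be allocated. The caller must free() it. */
char *copy_of(const char *s) {
  size_t len = strlen(s);
  char *copy = malloc(len + 1);   /* +1 for the null character */
  size_t i;

  if (copy == NULL) {
    return NULL;
  }
  for (i = 0; i <= len; i++) {    /* <= so that '\0' is copied too */
    copy[i] = s[i];
  }
  return copy;
}
```

After char *msg = copy_of("hello"), the statement msg[0] = 'H' is legal because msg references allocated memory and not the literal itself. Declaring an array, as in char msg[] = "hello", is another way to get a modifiable copy.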
III.4.4 Manipulating strings
III.4.4.1 Introduction
The C language itself does not provide facilities to work with strings: this task is performed by libraries. A library can be viewed as a set of objects and functions, provided externally, that perform specific actions. When you install a compiler on your system, a number of libraries come bundled with it. However, only the C standard library is actually required. Programmers often create their own libraries. As far as we are concerned, for now, we will just use the C standard library. Later, we will learn how to build libraries and how to use external libraries. The C standard library is actually made of several modules (we will talk about them later in the book): there is a module for manipulating strings, another one for managing errors… For each module, there is a header file declaring the functions and objects that are implemented by the module. In this section, we will work with some functions declared in the header file string.h.
III.4.4.2 strcpy()
The C standard function strcpy(), declared in the standard header file string.h, copies the string pointed to by src into the memory block pointed to by the pointer dest, and returns dest: char *strcpy(char *dest, const char *src);
The prototype of the function above is easy to understand: the src pointer points to const char, which tells the programmer that the string pointed to by src will not be altered by the function [28]. You can safely pass pointers or arrays to the function. The following example copies the characters of the array s1 into the array s2:
$ cat strcpy1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "hello";
  char s2[8];

  strcpy(s2, s1);

  printf("s1 holds %s and s2 holds %s\n", s1, s2);
  printf("size of s1=%zu, size of s2=%zu\n", sizeof s1, sizeof s2);
  printf("Length of string held s1=%zu, length of string held s2=%zu\n", strlen(s1), strlen(s2));

  return EXIT_SUCCESS;
}
$ gcc -o strcpy1 -std=c99 -pedantic strcpy1.c
$ ./strcpy1
s1 holds hello and s2 holds hello
size of s1=100, size of s2=8
Length of string held s1=5, length of string held s2=5
The example declared two arrays of char. Both were large enough to hold the string "hello", which requires at least six bytes (do not forget the null character). As you can see, the strcpy() function copied the contents of the array s1 into the array s2. Of course, you could also work with pointers in place of arrays as shown below:
$ cat strcpy2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "hello";
  char *s2 = malloc(8);

  if ( s2 == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  strcpy(s2, s1);
  printf("s1 holds %s and s2 holds %s\n", s1, s2);
  printf("size of s1=%zu, size of s2=%zu\n", sizeof s1, sizeof s2);
  printf("Length of string held s1=%zu, length of string held s2=%zu\n", strlen(s1), strlen(s2));
  free(s2);
  return EXIT_SUCCESS;
}
$ gcc -o strcpy2 -std=c99 -pedantic strcpy2.c
$ ./strcpy2
s1 holds hello and s2 holds hello
size of s1=100, size of s2=4
Length of string held s1=5, length of string held s2=5
We got the same output with the exception of the size of s2. As we explained in the previous sections, sizeof s2 yields the size of a pointer (4 bytes on our system, typically 8 bytes on 64-bit systems), not the size of the allocated memory block. What happens if the target array is not large enough?
$ cat strcpy3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "hello";
  char s2[2];

  strcpy(s2, s1);
  printf("s1 holds %s and s2 holds %s\n", s1, s2);
  return EXIT_SUCCESS;
}
$ gcc -o strcpy3 -std=c99 -pedantic strcpy3.c
$ ./strcpy3
s1 holds llo and s2 holds hello
The example strcpy3.c showed that strcpy() performs the copy even if the target array is too small to hold the string: the function does no boundary check. The reason is that it only receives a pointer to the first element, whether you pass an array or a pointer. Therefore, it cannot know the size of the memory area pointed to. This means that if you pass an array (or a pointer to a memory block) that is not large enough, strcpy() writes beyond the end of the target, into memory that it should not access. Writing outside an object causes undefined behavior: the program may crash, corrupt other data or appear to work. In our example, you can notice that the array s1 was corrupted by the strcpy() function: it held the string llo.
Before passing an array to the strcpy() function, check the target array is large enough for the copy.
The strcpy() function is designed to deal with strings, so the source array must contain a null character. Otherwise, strcpy() will read and copy all the characters it finds until it meets a null character, possibly far beyond the end of the array. The following example contains an error causing undefined behavior:
$ cat strcpy4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100];
  char s2[8];

  strcpy(s1, "hello");
  s1[5] = '!';
  strcpy(s2, s1);
  printf("s1 holds %s and s2 holds %s\n", s1, s2);
  return EXIT_SUCCESS;
}
Have you guessed where the error is located? Yes, the statement s1[5]='!' replaces the null character with the exclamation mark. The program compiles with no error, yet it contains a bug. Here is another error that you must avoid: passing two overlapping pointers:
$ cat strcpy5.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "hello";

  strcpy(s1+1, s1);
  printf("s1 holds %s\n", s1);
  return EXIT_SUCCESS;
}
$ gcc -o strcpy5 -std=c99 -pedantic strcpy5.c
$ ./strcpy5
s1 holds hhelll
The target and source memory areas must not overlap. That is why C99 introduced a new qualifier known as restrict. As of C99, the prototype of strcpy() has been updated:
char *strcpy(char *restrict dest, const char *restrict src);
This function prototype is valid only as of the C99 standard. Compilers that do not implement the C99 standard use the previous function prototype. What does the keyword restrict mean? The C99 standard introduced it to qualify pointers only. It means that, within the function, the memory area the pointer points to is accessed only through that pointer: no other pointer will attempt to access it. A declaration with the restrict qualifier warns programmers: if the requirement is not met, the function may not work properly (the behavior is undefined). The compiler does not check whether the requirement is met; it is the responsibility of the programmer to ensure it before calling the function. For efficiency reasons, some functions require that the passed pointers have exclusive access to the memory blocks they point to. Of course, it is possible to implement a function that does the same job as strcpy() without such a requirement. However, such a function would be less efficient. We will explain how to implement it in Chapter VII.
III.4.4.3 strncpy()
Another interesting function that copies strings is strncpy(). It does the same job as strcpy() except it copies at most n characters.
Until C95:
char *strncpy(char *dest, const char *src, size_t n);
As of C99: char *strncpy(char *restrict dest, const char *restrict src, size_t n);
If the string pointed to by src has a length less than n, strncpy() copies the whole string, including the null character, to the memory block pointed to by dest. Characters following the null character are not copied. Moreover, extra null characters are appended to the target until the total number of characters written reaches the value n. If the source string has a length greater than or equal to n, exactly n characters are copied and the memory area pointed to by dest is not terminated by the null character. The following example copies the string "hello world" entirely because the null character is met before 19 characters have been written.
$ cat strcpy6.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "hello world";
  char s2[20];
  size_t n = 19; /* maximum number of characters to copy */

  strncpy(s2, s1, n);
  printf("s1 holds %s and s2 holds %s\n", s1, s2);
  return EXIT_SUCCESS;
}
$ gcc -o strcpy6 -std=c99 -pedantic strcpy6.c
$ ./strcpy6
s1 holds hello world and s2 holds hello world
The following example copies a part of the string "hello world": five characters. It seems to be correct, yet it contains an error. Find it:
$ cat strcpy7.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "hello world";
  char s2[20];
  size_t n = 5; /* number of characters to copy */

  strncpy(s2, s1, n);
  printf("s1 holds %s and s2 holds %s\n", s1, s2);
  return EXIT_SUCCESS;
}
Its behavior is undefined because the array s2 does not contain the null character: strncpy() copied five characters only, and the remaining elements of s2 are uninitialized. We have to add the null character ourselves. So, the previous example should be rewritten like this:
$ cat strcpy8.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "hello world";
  char s2[20];
  size_t n = 5; /* number of characters to copy */

  strncpy(s2, s1, n);
  s2[n] = '\0';
  printf("s1 holds %s and s2 holds %s\n", s1, s2);
  return EXIT_SUCCESS;
}
$ gcc -o strcpy8 -std=c99 -pedantic strcpy8.c
$ ./strcpy8
s1 holds hello world and s2 holds hello
What we said about strcpy() holds true for strncpy():
o Ensure your character strings are terminated with the null character
o Do not use overlapping pointers
o The target array must be large enough to store the characters that will be copied
III.4.4.4 strcat() and strncat()
The functions strcat() and strncat() concatenate two strings. For example, if we have an array storing the string "some" and another one storing the string "thing", we can concatenate them to get the string "something". Let us start with strcat():
Until C95:
char *strcat(char *dest, const char *src);
As of C99: char *strcat(char *restrict dest, const char *restrict src);
It copies the string (including the null character) pointed to by src to the end of the string pointed to by dest, overwriting the null character of the string pointed to by dest. The resulting concatenated string (terminated with the null character) is stored in the memory block pointed to by dest. The contents of src are left untouched. Of course, the memory block pointed to by dest must be large enough to hold the concatenated string. The following example appends the string held in the array s2 to the string held in the array s1:
$ cat strcat1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "some";
  char s2[20] = "thing good";

  strcat(s1, s2);
  printf("s1: %s and s2: %s\n", s1, s2);
  return EXIT_SUCCESS;
}
$ gcc -o strcat1 -std=c99 -pedantic strcat1.c
$ ./strcat1
s1: something good and s2: thing good
The strncat() function has a prototype that looks like this:
char *strncat(char *dest, const char *src, size_t n);
The function strncat() also concatenates two strings. It copies at most n characters of the string pointed to by src to the end of the string pointed to by dest, overwriting the null character of the string pointed to by dest. If n is greater than the length of the string pointed to by src, all the characters of the string are copied. The resulting concatenated string is always terminated with the null character (unlike strncpy()) and stored in the memory block pointed to by dest, which must therefore have room for up to n+1 additional characters. The contents of src are left untouched. The following example appends the first five characters of the string held in the array s2 to the string held in the array s1:
$ cat strcat2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[100] = "some";
  char s2[20] = "thing good";

  strncat(s1, s2, 5);
  printf("s1: %s and s2: %s\n", s1, s2);
  return EXIT_SUCCESS;
}
$ gcc -o strcat2 -std=c99 -pedantic strcat2.c
$ ./strcat2
s1: something and s2: thing good
What we said about strcpy() and strncpy() holds true for strcat() and strncat(). To avoid undefined behavior in your programs:
o Ensure the character strings pointed to by src and dest are terminated with the null character
o Do not use pointers that overlap
o The target array must be large enough to store the characters that will be copied
As of C99, strcat() and strncat() have the following prototypes:
char *strcat(char *restrict dest, const char *restrict src);
char *strncat(char *restrict dest, const char *restrict src, size_t n);
The restrict qualifier does not change the behavior of the functions.
III.4.4.5 strcmp() and strncmp()
In the C language, the operator that compares two objects and tells if they are equal is denoted by two equals signs ==. Do not confuse it with the assignment operator, which is represented by one equals sign =. The expression x == y evaluates to 1 (true) if x equals y, and to 0 (false) otherwise. This will be detailed in the next chapter; we give here a brief overview so that you can understand why the function strcmp() should be invoked to compare strings. The following example compares two variables x and y:
$ cat strcmp1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  int x;
  int y;
  int z;

  x = 10;
  y = 20;
  z = x == y;
  printf("x=%d, y=%d. z=%d\n", x, y, z); /* x and y are not equal => z is 0 */

  x = 10;
  y = 10;
  z = x == y;
  printf("x=%d, y=%d. z=%d\n", x, y, z); /* x and y are equal => z is 1 */
  return EXIT_SUCCESS;
}
$ gcc -o strcmp1 -std=c99 -pedantic strcmp1.c
$ ./strcmp1
x=10, y=20. z=0
x=10, y=10. z=1
The expression z = x == y seems to be quite strange but it is valid. The == operator takes precedence over the assignment operator =: it is evaluated first. In the example above, if x holds the value 10 and y holds the value 20, the expression x == y evaluates to 0, which is then assigned to the variable z. Let us now compare two strings:
$ cat strcmp2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[] = "hello";
  char s2[] = "hello";
  int z;

  z = s1 == s2;
  printf("s1=%s, s2=%s. z=%d\n", s1, s2, z);
  return EXIT_SUCCESS;
}
$ gcc -o strcmp2 -std=c99 -pedantic strcmp2.c
$ ./strcmp2
s1=hello, s2=hello. z=0
The arrays s1 and s2 contain the same string, yet they are evaluated to be different. If you remember what we said earlier, an array name appearing in an expression without the array symbol [] is converted to the address of its first element (i.e. a pointer to its first element). This implies the expression s1 == s2 compares two addresses, which are, of course, different. We would have the same problem with pointers:
$ cat strcmp3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char *s1 = malloc(6);
  char s2[] = "hello";
  int z;

  if ( s1 == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  strcpy(s1, s2);
  z = s1 == s2;
  printf("s1=%s, s2=%s. z=%d\n", s1, s2, z);
  free(s1);
  return EXIT_SUCCESS;
}
$ gcc -o strcmp3 -std=c99 -pedantic strcmp3.c
$ ./strcmp3
s1=hello, s2=hello. z=0
The functions strcmp() and strncmp() compare the strings pointed to by the pointers s1 and s2 and return 0 if they hold the same characters. Here is the prototype of strcmp():
int strcmp(const char *s1, const char *s2);
It is very important to remember that strcmp() returns the value of 0 if the strings pointed to by the passed pointers contain the same characters. Consider the function strcmp() as a comparison function; it should not be viewed as an equal-to operator for strings. The function reads the first character of s1 (let c1s1 be this character) and the first character of s2 (let c1s2 be this character): if c1s1 is greater than c1s2, it returns a positive integer; if c1s1 is less than c1s2, it returns a negative integer. Otherwise, it continues the comparison with the following characters according to the same process (if the second character c2s1 is greater than c2s2, it returns a positive integer…). If the strings contain the same characters, the value of 0 is returned. Now, we can correct our example strcmp2.c as follows:
$ cat strcmp4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[] = "hello";
  char s2[] = "hello";
  int z;

  z = strcmp(s1, s2);
  printf("s1=%s, s2=%s. z=%d\n", s1, s2, z);
  return EXIT_SUCCESS;
}
$ gcc -o strcmp4 -std=c99 -pedantic strcmp4.c
$ ./strcmp4
s1=hello, s2=hello. z=0
In the following example, the strcmp() function returns a negative integer because the character 'H' is less than the character 'h'.
$ cat strcmp5.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[] = "Hello";
  char s2[] = "hello";
  int z;

  z = strcmp(s1, s2);
  printf("H=%d, h=%d\n", 'H', 'h');
  printf("s1=%s, s2=%s. z=%d\n", s1, s2, z);
  return EXIT_SUCCESS;
}
$ gcc -o strcmp5 -std=c99 -pedantic strcmp5.c
$ ./strcmp5
H=72, h=104
s1=Hello, s2=hello. z=-32
Only the sign of the returned value is specified by the C standard: the value -32 is specific to our implementation.
Generally, the function is used to determine whether two strings are equal. The strncmp() function does the same job as strcmp() except it compares at most n characters:
int strncmp(const char *s1, const char *s2, size_t n);
For example:
$ cat strcmp6.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char s1[] = "hello!";
  char s2[] = "hello";
  int z1, z2;

  z1 = strcmp(s1, s2);
  z2 = strncmp(s1, s2, 5);
  printf("s1=%s, s2=%s. z1=%d and z2=%d\n", s1, s2, z1, z2);
  return EXIT_SUCCESS;
}
$ gcc -o strcmp6 -std=c99 -pedantic strcmp6.c
$ ./strcmp6
s1=hello!, s2=hello. z1=33 and z2=0
In our example strcmp6.c, the strcmp() function compares the strings up to the null character of s2 (the character '!' of s1 is compared with the null character of s2, hence the positive value) while strncmp() compares only the first five characters.
III.4.4.6 atoi()
The atoi() function converts a string s to the integer number it contains:
int atoi(const char *s);
For example:
$ cat atoi1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("atoi(\"10\")=%d\n", atoi("10"));
  printf("atoi(\"V10\")=%d\n", atoi("V10"));
  printf("atoi(\"10.7\")=%d\n", atoi("10.7"));
  return EXIT_SUCCESS;
}
$ gcc -o atoi1 -std=c99 -pedantic atoi1.c
$ ./atoi1
atoi("10")=10
atoi("V10")=0
atoi("10.7")=10
In the example, we used the escape character \ before the double quotation marks " to prevent the compiler from interpreting them as the end of the string literal, which allowed us to print them. We can notice two things:
o If the argument of the atoi() function does not start with a number (leading white space aside), it returns 0
o The conversion stops at the first character that cannot be part of an integer: if the argument contains a floating-point value with a fractional part, only the integral part is returned.
III.4.4.7 atof()
The atof() function converts a string s to the floating-point number it contains:
double atof(const char *s);
For example:
$ cat atof1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("atof(\"10\")=%f\n", atof("10"));
  printf("atof(\"V10\")=%f\n", atof("V10"));
  printf("atof(\"10.7\")=%f\n", atof("10.7"));
  return EXIT_SUCCESS;
}
$ gcc -o atof1 -std=c99 -pedantic atof1.c
$ ./atof1
atof("10")=10.000000
atof("V10")=0.000000
atof("10.7")=10.700000
The example shows that if the argument of the atof() function does not start with a number, it returns 0.
III.5 Arrays are not pointers
One question arises: is a string an array or a pointer? A string can be stored in an array or accessed through a pointer, and both can often be used interchangeably. However, a pointer is an object holding the address of an object while an array is an object holding other objects (see Figure III‑14).
Figure III‑14 Representation of an array and a pointer
Figure III‑14 represents an array and a pointer. An array is an object holding other objects; its size is the sum of the sizes of its elements. A pointer just points to the beginning of the memory area it references. That is, from the pointer's perspective, the number of elements contained in the referenced memory area cannot be known, unlike with an array. In other words, an array can be viewed as a set of objects grouped into the same named box. From the perspective of a pointer, a memory area allocated by malloc() is a set of independent contiguous objects, of which only the first is referenced and actually known by the pointer.
The following example shows that the array a_msg and the pointer p_msg can be used in the same way:
$ cat array_vs_pointer1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char a_msg[3];
  char *p_msg = malloc(3);

  if ( p_msg == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p_msg[0] = a_msg[0] = 'O';
  p_msg[1] = a_msg[1] = 'K';
  p_msg[2] = a_msg[2] = '\0';

  size_t a_string_len = strlen(a_msg);
  size_t p_string_len = strlen(p_msg);

  printf("Array a_msg holds %s and pointer p_msg holds %s\n", a_msg, p_msg);
  printf("Length of string in a_msg %s=%zu\n", a_msg, a_string_len);
  printf("Length of string in p_msg %s=%zu\n", p_msg, p_string_len);
  free(p_msg);
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer1 -std=c99 -pedantic array_vs_pointer1.c
$ ./array_vs_pointer1
Array a_msg holds OK and pointer p_msg holds OK
Length of string in a_msg OK=2
Length of string in p_msg OK=2
We can see the only difference between the array a_msg and the pointer p_msg is their declaration: a_msg was declared as an array of three elements of type char and p_msg was declared as a pointer to char pointing to a memory area (allocated by malloc()) that can hold three elements. Therefore, you can store your strings in arrays or in memory blocks referenced by pointers. If you work with pointers, do not forget to allocate memory and then free it… However, their behavior is completely different if you use a string literal to initialize them. Initializing an array with a string literal copies the characters composing the string literal into the array. Initializing a pointer with a string literal just copies the address of the string literal into the pointer. Why such a different behavior? Because when you declare an array, memory space is reserved for its elements: int a[5] allocates a chunk of memory that can hold five elements of type int. When you declare a pointer, only memory space for storing an address is reserved, not for the object itself: for example, the declaration int *p allocates a piece of memory called p that can hold an address only. This point is very important to understand. When you write something like this:
int v = 10;
int *p = &v;
A piece of memory is reserved for the pointer p, in which the address of the object v is stored; the object v was created beforehand by the declaration int v = 10. When you write char *p_msg = malloc(3), a memory block, whose size is three bytes, is allocated and its address is stored in p_msg. That is, the statement allocates two pieces of memory: one holding the address of the object and one holding the object itself (of three bytes). Now you can guess an array is not a pointer. An array is a named memory area. A pointer is a reference to a memory area that may or may not exist; if it does not exist, the pointer points to nothing that can be used. Let us examine through examples the differences between an array and a pointer.
o Difference one: an array cannot be assigned to
$ cat array_vs_pointer2.c
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <string.h>
 4
 5 int main(void) {
 6   char a_msg[] = "hello";
 7   char *p_msg = "hello";
 8
 9   printf("a_msg=%s and p_msg=%s\n", a_msg, p_msg);
10
11   p_msg = "OK";
12   a_msg = "OK";
13   printf("a_msg=%s and p_msg=%s\n", a_msg, p_msg);
14   return EXIT_SUCCESS;
15 }
$ gcc -o array_vs_pointer2 -std=c99 -pedantic array_vs_pointer2.c
array_vs_pointer2.c: In function ‘main’:
array_vs_pointer2.c:12:10: error: incompatible types when assigning to type ‘char[6]’ from type ‘char *’
Explanation:
▪ Line 6-7: we initialize both the array and the pointer with the string literal "hello".
▪ Line 9: we display the contents of the array and the string pointed to by the pointer
▪ Line 11: we set the pointer to a new string
▪ Line 12: we set the array to a new string
This code failed at compilation time at line 12! The reason is we cannot assign to an array as a whole; we can only modify its contents. An array is not a reference to a memory block, it is a named memory block. Line 11 compiled successfully: a pointer can be modified. An array is not a pointer.
o Difference two: pointers and arrays have different sizes:
$ cat array_vs_pointer3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char a_msg[100];
  char *p_msg = malloc(100);

  if ( p_msg == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return EXIT_FAILURE;
  }

  printf("sizeof a_msg=%zu and sizeof p_msg=%zu\n", sizeof a_msg, sizeof p_msg);
  free(p_msg);
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer3 -std=c99 -pedantic array_vs_pointer3.c
$ ./array_vs_pointer3
sizeof a_msg=100 and sizeof p_msg=4
In our example, our array is 100 bytes (100 elements of type char) and our pointer is 4 bytes (the size of a pointer depends on the system; it is typically 8 bytes on 64-bit systems). The size of the array comprises all its elements. Now, let us list their similarities:
o Case one: both can use the operator [] to access elements
$ cat array_vs_pointer4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *p = "hello";
  char a[] = "hello";

  printf("Second char in array=%c\n", a[1]);
  printf("Second char in string pointed to by pointer=%c\n", p[1]);
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer4 -std=c99 -pedantic array_vs_pointer4.c
$ ./array_vs_pointer4
Second char in array=e
Second char in string pointed to by pointer=e
The compiler converts the array notation X[i] to the pointer notation *(X+i).
o Case two: both can use the dereference operator * to access elements
$ cat array_vs_pointer5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *p = "hello";
  char a[] = "hello";

  printf("Fifth char in array=%c\n", *(a+4));
  printf("Fifth char in string pointed to by pointer=%c\n", *(p+4));
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer5 -std=c99 -pedantic array_vs_pointer5.c
$ ./array_vs_pointer5
Fifth char in array=o
Fifth char in string pointed to by pointer=o
o Case three: the address of the first element is also the address of the memory area holding the elements
$ cat array_vs_pointer6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *p = "hello";
  char a[] = "hello";

  /* %p expects a pointer to void, hence the casts */
  printf("ARRAY: addr a=%p, addr first element=%p\n", (void *)a, (void *)&a[0]);
  printf("POINTER: addr p=%p, addr first element=%p\n", (void *)p, (void *)&p[0]);
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer6 -std=c99 -pedantic array_vs_pointer6.c
$ ./array_vs_pointer6
ARRAY: addr a=feffea66, addr first element=feffea66
POINTER: addr p=8050d8c, addr first element=8050d8c
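In most expressions, the C compiler converts the array name to the address of its first element. The expression &a is the address of the whole array: it has the same value but a different type (pointer to an array of six char instead of pointer to char). The following example shows that both hold the same address:
$ cat array_vs_pointer7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char a[] = "hello";

  printf("a=%p, and &a=%p\n", (void *)a, (void *)&a);
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer7 -std=c99 -pedantic array_vs_pointer7.c
$ ./array_vs_pointer7
a=feffea6a, and &a=feffea6a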
A pointer can simulate an array, but the reverse is not true. You can assign an array to a pointer and then work with it as you would with the array itself. Thus, the pointer can modify the contents of the array as shown below:
$ cat array_vs_pointer8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[] = "hello";
  char *p = msg;

  p[0] = 'W';
  p[1] = 'O';
  p[2] = 'R';
  p[3] = 'L';
  p[4] = 'D';
  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer8 -std=c99 -pedantic array_vs_pointer8.c
$ ./array_vs_pointer8
msg=WORLD
The statement char *p = msg assigns the address of the first element of the array msg to the pointer p. Of course, the assignment is allowed because the array msg contains elements of type char. However, be aware that the statement p = msg does not mean that the pointer p and the array msg are the same: p contains a reference to the array msg but is not an array. If you use the array msg, you access directly the memory block that holds the characters, but if you use the pointer, you do not access it directly: the computer first reads the address stored in the pointer and then accesses the referenced memory block holding the characters. That means that, in principle, accessing data through a pointer requires one more step than through an array. Often, programmers use the pointer p as if it were an array and conversely. That is fine if you keep in mind the differences. Here is another example:
$ cat array_vs_pointer9.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char msg[] = "hello"; /* contains 6 characters including \0 */
  char *p = "hello";    /* points to 6 characters including \0 */
  size_t len_msg = strlen( msg );
  size_t len_p = strlen( p );

  printf("Array msg. Nb of char preceding the null character=%zu\n", len_msg);
  printf("Pointer p. Nb of char preceding the null character=%zu\n", len_p);
  printf("Array msg. sizeof msg=%zu\n", sizeof msg);
  printf("Pointer. sizeof p=%zu\n", sizeof p);
  return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer9 -std=c99 -pedantic array_vs_pointer9.c
$ ./array_vs_pointer9
Array msg. Nb of char preceding the null character=5
Pointer p. Nb of char preceding the null character=5
Array msg. sizeof msg=6
Pointer. sizeof p=4
We can notice that, since sizeof(char) always yields 1, sizeof msg yields the number of characters in the array, including the null character. So, from now on, never consider that an array is a pointer, though they behave similarly in some cases.
III.6 malloc(), realloc() and calloc()
As previously said, the malloc() function does not initialize the allocated memory block as shown below:
$ cat malloc1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  int nb_elt = 3;
  int *p = malloc( nb_elt * sizeof(int) );

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  printf("p[0]=%d, p[1]=%d, p[2]=%d\n", p[0], p[1], p[2]);
  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o malloc1 -std=c99 -pedantic malloc1.c
$ ./malloc1
p[0]=134615120, p[1]=0, p[2]=0
The objects in the memory space pointed to by p had indeterminate values: on your computer, you may get values different from ours. Instead of setting each element to the value of 0 yourself, you can invoke the calloc() function that performs the same job as malloc() and also initializes the allocated memory to 0, as in the following example:
$ cat calloc1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  int nb_elt = 3;
  int *p = calloc( nb_elt, sizeof(int) );

  if ( p == NULL ) { /* memory allocation failed */
    printf("calloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  printf("p[0]=%d, p[1]=%d, p[2]=%d\n", p[0], p[1], p[2]);
  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o calloc1 -std=c99 -pedantic calloc1.c
$ ./calloc1
p[0]=0, p[1]=0, p[2]=0
The prototype of the function calloc() is given below: void *calloc(size_t nb_elt, size_t obj_size);
Where nb_elt is the number of items whose size is obj_size. The calloc() function allocates a memory space having the size nb_elt*obj_size, sets each byte to 0, and returns a pointer to the allocated memory area. If the function cannot allocate memory, a null pointer is returned. Assume we allocated ten bytes for our pointer p with malloc() or calloc() and then wished to grow the memory block so that it could store more objects. How could we do it? Calling malloc() again does not help by itself: it just allocates a new, bigger piece of memory, and we lose our data if we overwrite p. So, we could call the malloc() function to allocate a bigger memory space, then copy our data into it, and free the original memory space. This works but it is tedious: the best solution is to invoke realloc(). The realloc() function allocates a bigger memory area and copies data if required: if it can just enlarge the existing memory area, it keeps the original address, but if it cannot, it allocates a new memory area, copies the objects from the old memory space into the new one, and releases the old memory space. The function returns a pointer to the new memory area. Generally, the realloc() function is used to obtain more space in order to store additional objects but it can also be used to release memory by requesting a smaller memory space. Even in this case, it works in the same way: it returns a pointer to a memory block having the requested size, and frees the old memory space if the block was moved. If realloc() cannot allocate a memory space having the requested size, it returns a null pointer, leaving the original memory block untouched. The prototype of the function looks like this:
void *realloc(void *p_orig, size_t s);
If the pointer p_orig is a null pointer, the function is equivalent to malloc(). That is, if s is a size in bytes, realloc(NULL, s) and malloc(s) have the same behavior. If the function cannot allocate memory, it returns a null pointer, leaving the memory area pointed to by p_orig unchanged. Otherwise, it allocates a memory space having the size s, copies the data pointed to by p_orig into it if needed, releases the memory space pointed to by the pointer p_orig, and returns a pointer to the new memory block. Of course, the passed pointer p_orig must have been previously returned by malloc(), calloc() or realloc(). The following example is not correct (find out the reason); it is supposed to grow the memory block pointed to by p by adding ten elements of type int:
$ cat realloc1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  int nb_elt = 2;
  int nb_elt_new = 12;
  int *p = calloc( nb_elt, sizeof(int) );

  if ( p == NULL ) { /* memory allocation failed */
    printf("calloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 10;
  p[1] = 20;
  printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);

  p = realloc( p, nb_elt_new * sizeof(int) );
  p[2] = 30;
  p[3] = 40;
  printf("\nAfter realloc():\n");
  printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);
  printf("p[2]=%d, p[3]=%d \n", p[2], p[3]);
  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o realloc1 -std=c99 -pedantic realloc1.c
$ ./realloc1
p[0]=10, p[1]=20

After realloc():
p[0]=10, p[1]=20
p[2]=30, p[3]=40
The example realloc1.c shows how to call the realloc() function but contains a programming error. The example works as long as the realloc() function can allocate memory: what happens if realloc() cannot allocate a bigger memory block? In this case, the realloc() function returns a null pointer, which is assigned to the pointer p, and does not release the initial memory block. This means the initial memory block remains allocated but is no longer accessible, since p now holds a null pointer: this is a memory leak, and the following statement p[2] = 30 dereferences a null pointer… Here is a better version of the previous example:
$ cat realloc2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  int nb_elt = 2;
  int nb_elt_new = 12;
  int *p = calloc( nb_elt, sizeof(int) ); /* initial allocation */
  int *new_p;

  if ( p == NULL ) { /* memory allocation failed */
    printf("calloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 10;
  p[1] = 20;
  printf("Original address=%p\n", (void *)p);
  printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);

  /* grow the original allocated memory block pointed to by p */
  new_p = realloc( p, nb_elt_new * sizeof(int) );
  if ( new_p == NULL ) {
    /* memory allocation failed
       We cannot grow our dynamic array */
    printf("realloc() cannot allocate memory\n");
    printf("However the pointer p is still valid and contains:\n");
    printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);
    free(p);
    return (EXIT_FAILURE);
  } else {
    /* Memory successfully allocated.
       The dynamic array has been grown.
       The new memory area is pointed to by new_p.
       The pointer p is no longer valid. */
    /* since new_p is valid, we can make the assignment.
       Pointer new_p becomes useless */
    p = new_p;
  }

  p[2] = 30;
  p[3] = 40;
  printf("\nAfter realloc():\n");
  printf("new address=%p\n", (void *)p);
  printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);
  printf("p[2]=%d, p[3]=%d \n", p[2], p[3]);
  free(p);
  return (EXIT_SUCCESS);
}
$ gcc -o realloc2 -std=c99 -pedantic realloc2.c
$ ./realloc2
Original address=8061268
p[0]=10, p[1]=20

After realloc():
new address=8061C68
p[0]=10, p[1]=20
p[2]=30, p[3]=40
In this code, even if realloc() returns a null pointer (the test if ( new_p == NULL )), we do not lose the reference to the original memory block pointed to by p. Conversely, if realloc() returns a valid pointer (else branch), new_p is copied into p so that both point to the resized block. This way, p always holds a valid pointer that can be used. The following example shrinks the original allocated memory area:
$ cat realloc3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int nb_elt = 12;
  int nb_elt_new = 2;
  int *p = calloc( nb_elt, sizeof(int) ); /* initial allocation */
  int *new_p;

  if ( p == NULL ) { /* memory allocation failed */
    printf("calloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 10;
  p[1] = 20;
  p[2] = 30;
  p[3] = 40;
  printf("Original address=%p\n", (void *)p);
  printf("p[0]=%d, p[1]=%d p[2]=%d\n", p[0], p[1], p[2]);

  new_p = realloc( p, nb_elt_new * sizeof(int) ); /* shrink to 2 elements */

  if ( new_p == NULL ) {
    /* memory allocation failed: we cannot shrink our dynamic array */
    printf("realloc() cannot allocate memory\n");
    printf("However the pointer p is still valid and contains:\n");
    printf("p[0]=%d, p[1]=%d p[2]=%d\n", p[0], p[1], p[2]);
    free(p);
    return (EXIT_FAILURE);
  } else {
    /* Memory successfully allocated: the memory area has been shrunk.
       It can now hold only nb_elt_new elements.
       The old value of p is no longer valid. */
    p = new_p; /* p now points to the newly allocated memory area */
  }

  printf("\nAfter realloc()\n");
  printf("New address=%p\n", (void *)p);
  printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);

  free(p);
  return (EXIT_SUCCESS);
}
$ gcc -o realloc3 -std=c99 -pedantic realloc3.c
$ ./realloc3
Original address=8061268
p[0]=10, p[1]=20 p[2]=30

After realloc()
New address=8061338
p[0]=10, p[1]=20
In the run above, the two addresses differ: realloc() did not keep the original memory block. It allocated a new one, copied the first nb_elt_new * sizeof(int) bytes into it, and freed the old block. This implies that the old value of p became invalid after the invocation of realloc(). An implementation is also free to resize the block in place and return the same address, so a program must not rely on either behavior.
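The pattern used in realloc2.c and realloc3.c (keep the old pointer until realloc() has succeeded) can be packaged in a small function. The following is only a sketch: the name resize_int_array() and its return convention (0 on success, -1 on failure) are our own assumptions, not part of the standard library.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize the int array *pp so that it can hold
   new_count elements. Returns 0 on success and -1 on failure.
   On failure, *pp is left untouched and still owns the old block. */
int resize_int_array(int **pp, size_t new_count)
{
    int *tmp;

    /* reject a zero size and a byte count that would overflow size_t */
    if (new_count == 0 || new_count > SIZE_MAX / sizeof **pp)
        return -1;

    tmp = realloc(*pp, new_count * sizeof **pp);
    if (tmp == NULL)
        return -1;      /* the old block is still valid */

    *pp = tmp;          /* the old pointer value must no longer be used */
    return 0;
}
```

A caller would write if (resize_int_array(&p, 12) != 0) { /* p is still usable */ }, which makes it harder to overwrite p with a null pointer by accident.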
III.7 Emulating multidimensional arrays with pointers
We talked earlier about arrays of arrays but we did not explain how to emulate them with pointers:
o A simple array holding elements of type obj_type is declared as obj_type arr[n]. A one-dimensional dynamic-length array can be implemented by a pointer declared as obj_type *p.
o A two-dimensional array holding elements of type obj_type is declared as obj_type arr[n][p]. A two-dimensional dynamic-length array can be implemented by a pointer declared as obj_type **p.
o A three-dimensional array holding elements of type obj_type is declared as obj_type arr[n][p][q]. A three-dimensional dynamic-length array can be implemented by a pointer declared as obj_type ***p.
o And so on.
Figure III‑15 Pointer to pointer to int: int **p
The following example shows how to work with a pointer to pointer emulating a dynamic two-dimensional array (see Figure III‑15):
$ cat pointer2pointers1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  /* - p is a pointer to pointer to int: p references an object of type int *
     - *p is a pointer to int: it has type int *
     - **p has type int */
  int **p = calloc( 2, sizeof *p );

  /* p[i] is a pointer to 3 elements of type int */
  p[0] = calloc( 3, sizeof **p );
  p[1] = calloc( 3, sizeof **p );

  p[0][0] = 1; p[0][1] = 2; p[0][2] = 3;
  p[1][0] = 11; p[1][1] = 12; p[1][2] = 13;

  printf("p=%p p[0]=%p p[1]=%p\n", (void *)p, (void *)p[0], (void *)p[1]);

  free(p[0]);
  free(p[1]);
  free(p);
  return (EXIT_SUCCESS);
}
$ gcc -o pointer2pointers1 -std=c99 -pedantic pointer2pointers1.c
$ ./pointer2pointers1
p=8061088 p[0]=8061490 p[1]=80614a8
You can do the same with an array:
$ cat pointer2pointers2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int p[2][3];

  p[0][0] = 1; p[0][1] = 2; p[0][2] = 3;
  p[1][0] = 11; p[1][1] = 12; p[1][2] = 13;

  printf("p=%p p[0]=%p p[1]=%p\n", (void *)p, (void *)p[0], (void *)p[1]);
  return (EXIT_SUCCESS);
}
Here are some comments on the example pointer2pointers1.c. The first one is about the invocation of calloc() (or malloc()):
o The statement int **p = calloc(2, sizeof(int *)) can also be written int **p = calloc(2, sizeof *p). Since p has type int **, the expression *p has type int *, so the compiler evaluates sizeof *p as sizeof(int *). Do not be confused by the notation: the statement allocates memory able to hold two pointers to int. Once the memory is allocated, p points to the first object of the memory area (a pointer to int). That is, p is a pointer to type int *: p[0] denotes the first element and p[1] the second. Since p[0] and p[1] are themselves pointers to int, we have to allocate memory for them as well.
o The statement calloc(3, sizeof(int)) can also be written calloc(3, sizeof **p) [29]. Since **p has type int, the compiler evaluates sizeof **p as sizeof(int).
Remember that if p_obj is a pointer to a memory area holding nb objects of type obj_type, declared as obj_type *p_obj, you allocate memory for it in either of the following ways:
o malloc( nb * sizeof(obj_type) ) or calloc( nb, sizeof(obj_type) )
o malloc( nb * sizeof *p_obj ) or calloc( nb, sizeof *p_obj )
Remember that the operand of the sizeof operator is either a type name between parentheses or an expression. In pointer2pointers1.c, p points to the object *p of type int *, and *p points to the object **p of type int.
The second note is that you must allocate memory for both the first indirection p and the second indirection *p. The first indirection p holds the address of a memory area that stores two pointers, each of which (second indirection) must also be initialized with malloc() or calloc().
You can use a pointer to pointer to store a list of dynamic strings as below (Figure III‑16):
$ cat pointer2pointers3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  int nb = 3; /* str holds 3 strings */
  char **str = calloc( nb, sizeof *str );

  /* each string can hold 9 characters plus the terminating null character */
  str[0] = calloc( 10, sizeof **str );
  str[1] = calloc( 10, sizeof **str );
  str[2] = calloc( 10, sizeof **str );

  strcpy(str[0], "string 1");
  strcpy(str[1], "string 2");
  strcpy(str[2], "string 3");

  printf("str[0]=%s, str[1]=%s and str[2]=%s\n", str[0], str[1], str[2]);

  free(str[0]);
  free(str[1]);
  free(str[2]);
  free(str);
  return (EXIT_SUCCESS);
}
$ gcc -o pointer2pointers3 -std=c99 -pedantic pointer2pointers3.c
$ ./pointer2pointers3
str[0]=string 1, str[1]=string 2 and str[2]=string 3
Figure III‑16 Pointer to pointer to strings
As explained earlier, the compiler converts p[i] to *(p+i) whether p is an array or a pointer. OK, it is easy to catch but how do you think p[i][j] and p[i][j][k] are translated by the compiler? According to the same rule: p[i][j] is converted to *( *(p+i) + j ). If we write q = p[i] = *(p+i), then p[i][j] = q[j] = *(q+j) = *(*(p+i)+j). Likewise, p[i][j][k] is converted to *( *( *(p+i) + j ) + k).
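The two ideas of this section (allocate both indirections, and p[i][j] is *( *(p+i) + j )) can be put together in a short sketch. The function names alloc_table(), free_table() and get_cell() are hypothetical; unlike the examples above, the sketch tests every pointer returned by calloc().

```c
#include <stdlib.h>

/* Hypothetical helper: release the first rows rows of t, then t itself. */
void free_table(int **t, size_t rows)
{
    size_t i;

    if (t == NULL)
        return;
    for (i = 0; i < rows; i++)
        free(t[i]);
    free(t);
}

/* Hypothetical helper: allocate rows pointers, each one pointing to
   cols ints set to zero. Returns NULL if any allocation fails. */
int **alloc_table(size_t rows, size_t cols)
{
    size_t i;
    int **t = calloc(rows, sizeof *t);        /* first indirection */

    if (t == NULL)
        return NULL;
    for (i = 0; i < rows; i++) {
        t[i] = calloc(cols, sizeof **t);      /* second indirection */
        if (t[i] == NULL) {
            free_table(t, i);                 /* rows 0..i-1 were allocated */
            return NULL;
        }
    }
    return t;
}

/* Same element as t[i][j], written with explicit pointer arithmetic. */
int get_cell(int **t, size_t i, size_t j)
{
    return *(*(t + i) + j);
}
```

After int **t = alloc_table(2, 3), the expressions t[1][2] and get_cell(t, 1, 2) designate the same int.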
III.8 Array of pointers, pointer to array and pointer to pointer
Figure III‑17 Representation of char arr[2][3]
We have learned that, in C, a multidimensional array is in fact an array of arrays. For example, the array arr[3][10] is an array of 3 arrays of 10 characters. The main constraint on arrays is that we cannot resize them, which leads programmers to resort to pointers.
Suppose we need to store strings of at most 64 characters (terminating null character included). If the maximum number of strings is known, say 100, we could use the array arr[100][64] (see Figure III‑17). Thus, each array arr[i] holds a string of no more than 64 characters.
Suppose now we have to deal with bigger strings whose length is unknown. In this case, we have to use pointers. The object we need to store our strings can be viewed as a 100 x n table: 100 rows and n columns. We can express it as an array of variable-length strings or symbolically (this is our own notation for easing the understanding) by arr[100][?]. We could read it as an array of 100 pointers (see Figure III‑20). In C, we would declare it as char *arr[100].
Suppose now the string size is not more than 64 characters and the maximum number of strings to store is unknown. Here again, we have to use pointers. The object we need to store our strings can be viewed as an n x 64 table: n rows and 64 columns. Using our educational notation, we can express it symbolically as arr[?][64] where ? means dynamic-length. We can read it as arr is a pointer to array[64], that is, a pointer to an array of 64 char (see Figure III‑19). In C, we would declare it as char (*arr)[64]. Why use parentheses around the pointer? Because arrays have precedence over pointers ([] has precedence over *). If you remove the parentheses, *arr[64] means an array of 64 pointers.
The last possibility is that the length of the strings and the maximum number of strings to store are both unknown: the pointer **arr can be used for such a case (see Figure III‑18).
Figure III‑18 Representation of char **arr
Figure III‑19 Representation of char (*arr)[3]
Figure III‑20 Representation of char *arr[2]
In summary, a 3x10 array can be represented by arr[3][10], *arr[3], (*arr)[10] or **arr. Similarly, a 2x3x4 array can be represented by arr[2][3][4], (*arr)[3][4], (*arr[2])[4], *arr[2][3], (**arr)[4], *(*arr)[3], **arr[2] or ***arr. As you have noticed, combining arrays with pointers makes things trickier… Further explanations are required to understand how to read declarations involving arrays and pointers.
First, we have to talk about the precedence of arrays and pointers in declarations. An array has precedence over a pointer. To increase the precedence of the pointer operator, you have to enclose it between parentheses. For example, *arr[2] is an array of two pointers. In contrast, (*arr)[2] means arr is a pointer to an array of 2 objects. Another example: (*arr[2])[4] is an array of 2 pointers to an array of 4 items.
The array symbol [] is always on the right-hand side of the name and the pointer symbol * is always on the left-hand side. Therefore, the successive symbols [] are read from left to right (the first [] to read is the leftmost) and the successive symbols * are read from right to left (the first * to read is the rightmost)!
Here is an informal method for deciphering declarations involving pointers and arrays:
a. Locate the object name. Read name is.
b. Read the next enclosing parentheses (starting with the innermost up to the outermost parentheses) and apply steps c and d. If there is no parenthesis, go to the next step (step c).
c. Read the next [] on the right side. Read array of.
d. Then read the next * on the left side. Read pointer to.
e. Go to step b until you finish reading the declaration.
f. You finish the process by reading the leftmost type.
Let us apply the method to some declarations listed in Table III‑1.
Table III‑1 Declarations mixing arrays and pointers
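As an illustration of the reading method, here are a few declarations with their reading written as comments. This is a sketch of our own: the object names a, b, c, d and the checking function are hypothetical. The function relies on sizeof to confirm each reading, since the size of an array is its number of elements multiplied by the size of one element.

```c
#include <stddef.h>

char *a[2];        /* a is an array of 2 pointers to char                  */
char (*b)[2];      /* b is a pointer to an array of 2 char                 */
char (*c[2])[4];   /* c is an array of 2 pointers to an array of 4 char    */
char *(*d)[3];     /* d is a pointer to an array of 3 pointers to char     */

/* Returns 1 if the sizes agree with the readings above, 0 otherwise.
   sizeof does not evaluate its operand here, so the pointers b, c[0]
   and d are never dereferenced at run time. */
int readings_are_consistent(void)
{
    return sizeof a == 2 * sizeof(char *)          /* 2 pointers          */
        && sizeof *b == 2 * sizeof(char)           /* array of 2 char     */
        && sizeof c == 2 * sizeof(char (*)[4])     /* 2 pointers to array */
        && sizeof *c[0] == 4 * sizeof(char)        /* array of 4 char     */
        && sizeof *d == 3 * sizeof(char *);        /* array of 3 pointers */
}
```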
Conversely, how do we declare a pointer to an array of 3 pointers to char? We apply the reverse method, taking care to enclose pointers between parentheses. Here is an example. A pointer to an array of 3 pointers to char:
o A pointer to: (*arr)
o array of 3: (*arr)[3]
o pointers to: *(*arr)[3]
o char: char *(*arr)[3]
Another example: arr is an array of 2 arrays of 3 pointers to char. Here are the steps dissected:
o arr is an array of 2: arr[2]
o arrays of 3: arr[2][3]
o pointers to: *arr[2][3]
o char: char *arr[2][3]
The last example, arr is an array of 2 pointers to an array of 4 char:
o arr is an array of 2: arr[2]
o pointers to: (*arr[2])
o an array of 4: (*arr[2])[4]
o char: char (*arr[2])[4]
Now that we know how to read declarations relating to arrays and pointers, we can easily find out how to declare dynamic multidimensional arrays by using pointers. Let us consider a program that stores items in the array arr[2][3][4]. If the maximum number of items to be stored in it is known and unchanged over time, we can choose an array. Now, imagine that the first dimension varies over time because our needs have changed. The best way to proceed is to use a pointer representing the first dimension. To ease our discussion, let us adopt the following notation: we write ? for a varying dimension that will be denoted by a pointer. In our example, according to our convention, arr[?][3][4] is an array whose first dimension may be resized over time: a varying number of arrays of 3 arrays of 4 items. The variable dimension can be implemented as a pointer. Therefore, our variable array arr can be represented by a pointer to an array of 3 arrays of 4:
o arr is a pointer to: (*arr)
o array of 3: (*arr)[3]
o array of 4: (*arr)[3][4]
Table III‑2 shows the different ways to implement the array arr[2][3][4] depending on the dimension you wish to be dynamic (changeable at run time).
Table III‑2 Examples of implementation of a dynamic three-dimensional array
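To make the case arr[?][3][4] concrete, here is a minimal sketch in which only the first dimension is dynamic. The type name slice_t and the function alloc_slices() are hypothetical names of our own; the typedef is used only to keep the return type readable.

```c
#include <stdlib.h>

/* One element of the dynamic dimension: an array of 3 arrays of 4 int. */
typedef int slice_t[3][4];

/* Hypothetical helper: allocate n slices set to zero.
   The result has type int (*)[3][4] and is used as arr[i][j][k]
   with i < n, j < 3 and k < 4. Returns NULL on failure. */
slice_t *alloc_slices(size_t n)
{
    int (*arr)[3][4] = calloc(n, sizeof *arr);

    return arr;
}
```

A single call to free() releases the whole block, because the three dimensions live in one contiguous memory area, unlike the pointer to pointer scheme of Section III.7.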
In the following example, we declare the object p as int (*p)[3] (pointer to array of 3 ints) and we allocate a memory area that can hold two arrays of 3 ints (see Figure III‑21):
$ cat pointer2array1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int (*p)[3];                 /* pointer to array[3] */

  p = malloc( 2 * sizeof *p ); /* allocate memory for 2 arrays of 3 ints */

  p[0][0] = 0;  p[0][1] = 1;  p[0][2] = 2;  /* first array in p[0]: 3 items */
  p[1][0] = 10; p[1][1] = 11; p[1][2] = 12; /* second array in p[1]: 3 items */

  printf("int (*p)[3]:\n");
  printf("sizeof p=%zu (pointer)\n", sizeof p);
  printf(" sizeof p[0]=%zu (=sizeof(int)*%d)\n", sizeof p[0], 3);
  printf(" sizeof p[0][0]=%zu (=sizeof(int))\n", sizeof p[0][0]);

  /* *(*p + 1) is p[0][1]; writing *(*p)+1 would add 1 to p[0][0] instead */
  printf("\nFirst array: first item=%d second item=%d\n", *(*p), *(*p + 1));
  printf("First array: first item=%d second item=%d\n", p[0][0], p[0][1]);

  printf("\nSecond array: first item=%d second item=%d\n", *(*(p+1)), *(*(p+1) + 1));
  printf("Second array: first item=%d second item=%d\n", p[1][0], p[1][1]);

  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o pointer2array1 -std=c99 -pedantic pointer2array1.c
$ ./pointer2array1
int (*p)[3]:
sizeof p=4 (pointer)
 sizeof p[0]=12 (=sizeof(int)*3)
 sizeof p[0][0]=4 (=sizeof(int))

First array: first item=0 second item=1
First array: first item=0 second item=1

Second array: first item=10 second item=11
Second array: first item=10 second item=11
Figure III‑21 Pointer to array and pointer to int
Have a look at Figure III‑21. The pointer p1 points to an int. It is initialized by an array of ints. However, p1 is not a pointer to an array. Why? Because p1 = s is equivalent to p1 = &s[0]. That is, p1 does not point to an array but to s[0] that is an object of type int (the first element of the array s). In the following example, we declare an array of three pointers:
$ cat pointer2array2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p[3]; /* array of 3 pointers to int */
  int i;

  i=0; /* p[0] is the first pointer */
  p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
  p[i][0] = i*10;
  p[i][1] = i*10+1;

  i=1; /* second pointer */
  p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
  p[i][0] = i*10;
  p[i][1] = i*10+1;

  i=2; /* third pointer */
  p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
  p[i][0] = i*10;
  p[i][1] = i*10+1;

  printf("int *p[3]: p contains 3 pointers:\n");
  i=0;
  printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);
  i=1;
  printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);
  i=2;
  printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);

  free(p[0]);
  free(p[1]);
  free(p[2]);
  return EXIT_SUCCESS;
}
$ gcc -o pointer2array2 -std=c99 -pedantic pointer2array2.c
$ ./pointer2array2
int *p[3]: p contains 3 pointers:
pointer 0: first item=0 second item=1
pointer 1: first item=10 second item=11
pointer 2: first item=20 second item=21
To keep the examples pointer2array1.c and pointer2array2.c easy to follow, we did not test the pointer returned by malloc(). The program can be simplified with the for loop studied in Chapter V:
$ cat pointer2array2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p[3]; /* array of 3 pointers to int */
  int i;

  for (i=0; i < 3; i++) {
    p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
    p[i][0] = i*10;
    p[i][1] = i*10+1;
  }

  printf("int *p[3]: p contains 3 pointers:\n");
  for (i=0; i < 3; i++)
    printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);

  for (i=0; i < 3; i++)
    free(p[i]);
  return EXIT_SUCCESS;
}
We learned that if s1 is an array, in the expression p = s1, the array is converted to a pointer to its first element. How is the array s2, declared as int s2[10][5], converted? The C language is coherent: such an array is also converted to a pointer to its first element, that is, &s2[0]. Now, consider the statement p = s2. Can you guess the declaration of the pointer p? The element s2[0] (the first element) being an array of 5 int, &s2[0] is a pointer to an array of 5 int. Consequently, our pointer would be declared as int (*p)[5].
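The same conversion takes place when a two-dimensional array is passed to a function: the parameter receives a pointer to the first row. The sketch below, with a hypothetical function sum_rows(), shows a parameter declared exactly like the pointer p discussed above.

```c
#include <stddef.h>

/* rows points to the first of nrows arrays of 5 int.
   An argument declared as int s2[10][5] is converted to &s2[0],
   whose type is int (*)[5]. */
long sum_rows(int (*rows)[5], size_t nrows)
{
    long total = 0;
    size_t i, j;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < 5; j++)
            total += rows[i][j];   /* same as *(*(rows + i) + j) */
    return total;
}
```

With int s2[10][5], the call sum_rows(s2, 10) adds up all 50 elements, and sum_rows(s2 + 1, 1) adds up the second row only.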
III.9 Variable-length arrays and variably modified types
So far, we have learned that the size of an array must be known at compile time. To work with an array whose size is unknown at compile time, we have to use a pointer. In the following example, we store the strings passed to the program in a memory area allocated by malloc() and pointed to by the pointer list_string:
$ cat vla1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 255

int main(int argc, char **argv) {
  /* pointer to array of MAX_STRING_LEN chars */
  char (*list_string)[MAX_STRING_LEN];
  size_t i;
  size_t list_string_len;

  if (argc < 2) {
    printf("USAGE: %s string1 string2...\n", argv[0]);
    return EXIT_FAILURE;
  }

  /* number of strings */
  list_string_len = argc-1;
  list_string = malloc(list_string_len * sizeof *list_string);

  /* copy strings (each one is assumed to be shorter than MAX_STRING_LEN) */
  for (i=0; i < list_string_len; i++)
    /* argv[0]: program name. argv[1]: first string... */
    strcpy(list_string[i], argv[i+1]);

  /* display strings, numbered from 1 */
  for (i=0; i < list_string_len; i++)
    printf("String %zu: %s\n", i+1, list_string[i]);

  free(list_string);
  return EXIT_SUCCESS;
}
$ gcc -o vla1 -std=c99 -pedantic vla1.c
$ ./vla1 "hello" "how are you?"
String 1: hello
String 2: how are you?
The C99 standard introduced a new type of array called variable-length array, or VLA for short. It differs from the fixed-size arrays we have studied in that its length is known only at run time. The length of a VLA does not have to be a constant expression (see Chapter IV Section IV.14) but an expression that evaluates to a positive integer (known at run time). A VLA works as a fixed-size array and is declared in the same way. The previous example can be written using a VLA:
$ cat vla2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 255

int main(int argc, char **argv) {
  if (argc < 2) {
    printf("USAGE: %s string1 string2...\n", argv[0]);
    return EXIT_FAILURE;
  }

  size_t list_string_len = argc - 1;
  char list_string[list_string_len][MAX_STRING_LEN];
  size_t i;

  /* copy strings (each one is assumed to be shorter than MAX_STRING_LEN) */
  for (i=0; i < list_string_len; i++)
    /* argv[0]: program name. argv[1]: first string... */
    strcpy(list_string[i], argv[i+1]);

  /* display strings */
  for (i=0; i < list_string_len; i++)
    printf("String %zu: %s\n", i, list_string[i]);

  return EXIT_SUCCESS;
}
$ ./vla2 "hello" "how are you?"
String 0: hello
String 1: how are you?
However, the size of a VLA does not vary over time. Once the value of its length is known, the VLA keeps the same size during its lifetime: unlike a memory block managed through a pointer, it cannot be resized. In the following example, we declare a VLA whose size is an expression (composed of a variable) evaluating to a positive integer:
$ cat vla3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int array_size = 5;
  int age[ array_size ];

  return EXIT_SUCCESS;
}
The size of a VLA can be known only at run time as in the following example:
$ cat vla4.c
#include <stdio.h>
#include <stdlib.h>

int main(int c, char **argv) {
  if (c < 2) { /* no argument given */
    printf("USAGE: %s array_size\n", argv[0]);
    return EXIT_FAILURE;
  }

  int array_size = atoi(argv[1]);
  if (array_size <= 0) /* the size of a VLA must be positive */
    return EXIT_FAILURE;

  int age[ array_size ];
  printf("Array size is %d\n", array_size);
  return EXIT_SUCCESS;
}
$ gcc -o vla4 -std=c99 -pedantic vla4.c
$ ./vla4 10
Array size is 10
Such an array is called a variable-length array. We will not fully describe this example now. Briefly:
o The atoi() function converts a string containing digits into a number. For example, if the string is "123", atoi() turns it into the number 123.
o The parameter c of the main() function holds the number of arguments in the command line used to launch the program. Here, c holds 2 because the command line is composed of the name of the program and the argument 10.
o The second parameter argv of the main() function holds the name of the program and its arguments. Here, the program name is stored in argv[0] and the argument 10 is held in argv[1].
o The statement int array_size = atoi(argv[1]) stores the value you have passed to the program into the variable array_size, which is then used as the size of the array age.
We have not talked about the initialization of a VLA: since the size of a VLA is not known at compile time, you cannot initialize it as you would a fixed-size array. A type deriving from (i.e. constructed from) a VLA is known as a variably modified type (VM type). For example, the pointer p declared below points to a VLA and therefore has a VM type:
int n = 10;
long long (*p)[n]; /* pointer to a VLA of n long long */
VLAs and objects having VM types are subject to some constraints described in Chapter VII Section VII.17.
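As a preview, here is a hedged sketch of the most common use of a VM type: a function parameter whose array dimension is given by another parameter. The function names sum_matrix() and multiplication_table_sum() are hypothetical. The sketch assumes a C99 compiler; VLAs became an optional feature in C11, so some later compilers may not support them.

```c
#include <stddef.h>

/* m has a variably modified type: it is adjusted to a pointer to an
   array of cols int, where cols is known only at run time. */
long sum_matrix(size_t rows, size_t cols, int m[rows][cols])
{
    long total = 0;
    size_t i, j;

    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            total += m[i][j];
    return total;
}

/* Builds an n x n VLA holding the products i*j and returns their sum.
   n must be greater than zero. A VLA cannot have an initializer,
   so the table is filled with a loop. */
long multiplication_table_sum(size_t n)
{
    int table[n][n];
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            table[i][j] = (int)(i * j);
    return sum_matrix(n, n, table);
}
```

A fixed-size array can be passed as well: with int a[2][3], the call sum_matrix(2, 3, a) is valid because int (*)[3] is compatible with the parameter type when cols equals 3.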
III.10 Creating types from array and pointer types Array and pointer types are constructed from other types: they are known as derived types. Now, we suggest creating new types derived from arrays and pointers. The typedef keyword allows building new type names from existing types. The typedef keyword is used as if you declare an object. Let us find out how it works through examples: o Defining myInteger type as long type: typedef long myInteger;
o Create the string10 type as an array of 10 chars: typedef char string10[10];
For example:
$ cat typedef_ptr_array1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef char string10[10];
  string10 arr;

  printf("Array size is %zu\n", sizeof arr);
  return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array1 -std=c99 -pedantic typedef_ptr_array1.c
$ ./typedef_ptr_array1
Array size is 10
o Create the ptr_double type as a pointer to double:
typedef double *ptr_double;
$ cat typedef_ptr_array2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double f = 10.2;
  typedef double *ptr_double;
  ptr_double ptr_dbl = &f; /* ptr_dbl is an object of type ptr_double */

  printf("%f\n", *ptr_dbl);
  return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array2 -std=c99 -pedantic typedef_ptr_array2.c
$ ./typedef_ptr_array2
10.200000
o Create the array3D_10x20x30 type as an array of 10 arrays of 20 arrays of 30 chars:
typedef char array3D_10x20x30[10][20][30];
$ cat typedef_ptr_array3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef char array3D_10x20x30[10][20][30];
  array3D_10x20x30 arr;

  printf("%zu\n", sizeof arr);
  return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array3 -std=c99 -pedantic typedef_ptr_array3.c
$ ./typedef_ptr_array3
6000
o Create the ptr_arr type as a pointer to array of 3 float and the type arr3 as an array of 3 float: typedef float (*ptr_arr)[3]; typedef float arr3[3];
$ cat typedef_ptr_array4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef float (*ptr_arr)[3];
  typedef float arr3[3];
  arr3 s[2] = { {1.1, 1.2, 1.3}, {2.1, 2.2, 2.3} };
  ptr_arr p_arr = s;

  printf("%f %f\n", p_arr[0][0], p_arr[1][2]);
  return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array4 -std=c99 -pedantic typedef_ptr_array4.c
$ ./typedef_ptr_array4
1.100000 2.300000
III.11 Qualified pointer types
The C standards, until C95, specified two type qualifiers: const and volatile. C99 added a new one known as restrict. An object declared without a type qualifier has an unqualified type. If declared with a type qualifier, its type is qualified. For example, float is an unqualified type while const float is a qualified type (const-qualified type). Qualifiers change neither the representation of the type nor its alignment. There can be several qualifiers, in any order, in a declaration. The types const volatile int, volatile const int, const int volatile… represent the same type. Keep in mind that a qualified type is different from the corresponding unqualified type: they represent different types even though they have the same representation and alignment.
The qualifier applies to a type. It can be placed after or before the type it qualifies but, when applied to a pointer, it must be placed after the asterisk *. For example, the pointer type char * const is qualified: a pointer of that type is made read-only. Compare the following declarations:
o char * const p declares p as a read-only pointer. The pointer p has a const-qualified type.
o char const * p declares p as a pointer to an object of type const char. The pointer p has an unqualified type while the object it points to has a const-qualified type.
o const char * p is identical to the previous declaration.
In summary, a pointer type does not inherit the qualifiers of the types from which it is built. That is, the pointer type char const * derives from the qualified type char const but is not qualified itself.
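The difference between a pointer to a const-qualified type and a const-qualified pointer shows up clearly in function parameters. The two functions below are a sketch of our own (the names count_char() and fill_char() are hypothetical); the statements that would be rejected by the compiler are left as comments.

```c
#include <stddef.h>

/* s is a pointer to const char: the characters cannot be modified
   through s, but s itself is unqualified and may be changed. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)      /* changing the pointer is allowed */
        if (*s == c)
            n++;
    /* *s = 'x';  would not compile: *s has a const-qualified type */
    return n;
}

/* p is a const pointer to char: the pointer is read-only, but the
   characters it points to may be modified. */
void fill_char(char * const p, size_t len, char c)
{
    size_t i;

    for (i = 0; i < len; i++)
        p[i] = c;                /* changing the pointed-to objects is allowed */
    /* p++;  would not compile: p has a const-qualified type */
}
```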
III.12 Compatible types
In Chapter II Section II.10, we said two types are compatible if they are the same. Two qualified types are compatible if they are qualified versions of compatible types and have the same qualifiers, whatever their order. Thus, const float and float are not compatible while const volatile int and volatile const int are compatible. Two array types are compatible if they have the same size and their elements have compatible types. Two pointer types are compatible if they have the same type qualifiers and they point to compatible types.
The following pointer types are compatible:
o short int * and short *
o unsigned * and unsigned int *
o int *const and signed int *const
o const long *const and signed long const *const
The following pointer types are not compatible:
o short int * and const short int *
o unsigned * and unsigned *const
III.13 Data alignment
We learned that, depending on the data type, the amount of storage allocated is a byte or a group of bytes. For example, an object of type int may be stored in 4 bytes. The group of bytes is located at a certain address in memory. The issue is that most computers (even computers with byte-addressable memory [30]) require each data type to be placed at certain addresses: this is known as data alignment. That is, not all addresses can be used to place any piece of data. The constraints vary from processor to processor. The allowed addresses are multiples of some specific sizes. In older computers, data had to be placed at addresses that were a multiple of a word size (varying with the processor architecture). On modern computers, pieces of data have to be put at addresses that are multiples of their type size (known as natural alignment). For example, if a short is 16 bits wide, an integer of that type will be placed at an address that is a multiple of 2 bytes: it is aligned on 16-bit boundaries. If an int has a size of 32 bits, an integer of that type will be placed at an address that is a multiple of 4 bytes: it is aligned on 32-bit boundaries. Fortunately, you generally do not have to worry about data alignment since the compiler does the job. On modern computers whose memory is byte-addressable, an object fitting in a byte can be put at any address. [31]
However, when dealing with object pointers (that is, pointers to data) and performing conversions between pointers (described in Chapter III Section III.14), you have to care about data alignment constraints. In C, you can convert, through an explicit cast, any data pointer to any other data pointer type, which can lead to misalignment. Not all processors can handle misalignment. To highlight the problem, let us consider two kinds of processors: SPARC® and Intel®. The following example works on an Intel®-based computer:
$ cat pointer_align1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char s[5] = { 0,0,0,0,0 };
  int *p = (int *)&s[0];

  printf("sizeof int=%zu\n", sizeof(int));
  /* the addresses are converted to an integer type to be displayed in decimal */
  printf("p=%lu s=%lu\n", (unsigned long)p, (unsigned long)s);
  printf("*p=%d\n", *p);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_align1 -std=c99 -pedantic pointer_align1.c
$ ./pointer_align1
sizeof int=4
p=2147482768 s=2147482768
*p=0
Both Intel® and SPARC® processors require a 32-bit int to be aligned on 32-bit boundaries, but SPARC® processors cannot handle data misalignment while Intel® processors can. If the program pointer_align1.c is executed on SPARC® systems, it may crash or work depending on the address of s[0]. To show it clearly, consider the following example:
$ cat pointer_align2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char s[5] = { 0,0,0,0,0 };
  int *p = (int *)&s[0];
  int *q = (int *)&s[1]; /* one byte after p: p and q cannot both be aligned */

  printf("p=%lu q=%lu s=%lu\n", (unsigned long)p, (unsigned long)q, (unsigned long)s);
  printf("*p=%d\n", *p);
  printf("*q=%d\n", *q);
  return EXIT_SUCCESS;
}
On an Intel® platform, it works fine though the object pointed to by pointer p may not be strictly aligned on a 32-bit boundary: p=4278184563 q=4278184564 s=4278184563 *p=0 *q=0
On a SPARC® computer, it crashes: p=2147482768 q=2147482769 s=2147482768 *p=0 Bus Error (core dumped)
In the above example, the object pointed to by the pointer q (whose address is 2147482769 = 536870692*4 + 1, which is not a multiple of 4 bytes) was misaligned, causing the program to halt abnormally. As long as we do not access a misaligned object, there is no problem, but if we attempt to access it on SPARC® processors, the program crashes with a Bus Error. In our example, the object (of type 32-bit int) pointed to by the pointer p was safely accessed because it was aligned on its natural boundary (2147482768 = 536870692*4) while the object pointed to by q was misaligned. There are two kinds of alignment with pointers: the alignment of the pointer itself and the alignment of the object it points to. On most modern computers, all object pointers are represented as an integer and have the same size, so converting an object pointer to another data pointer type raises no issue regarding the pointer itself. However, the C standard has no such requirement, and there might be computers on which object pointer types have different sizes. That is, if you convert an object pointer of type P1 to type P2, and the two pointer types have different sizes, the conversion of the pointer itself might lead to an issue on some computers imposing data alignment constraints. In our example, pointer_align2.c, the alignment restrictions concerned only the objects pointed to by the pointers since all data pointers have the same representation on SPARC® processors.
There is no misalignment if you assign a variable org of type T1 to a variable tgt of type T2 because, the value of the variable org is converted and then copied into the variable tgt: int tgt = org. The variables tgt and org are automatically aligned at their inception: their address will not change until their destruction.
In the C standard, a pointer to void has the same alignment and representation as a pointer to a character type. Pointers to qualified and unqualified compatible types have the same representation and alignment.
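When an int really has to be read from, or written to, an address that may not be suitably aligned (a byte buffer, for example), a common and portable technique is to copy the bytes with memcpy() instead of dereferencing a converted pointer. The sketch below is our own illustration; the function names load_int() and store_int() are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: read an int stored at any byte address.
   memcpy() copies byte by byte, so bytes does not need to be aligned
   on an int boundary; value itself is aligned by the compiler. */
int load_int(const unsigned char *bytes)
{
    int value;

    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Hypothetical helper: write an int at any byte address. */
void store_int(unsigned char *bytes, int value)
{
    memcpy(bytes, &value, sizeof value);
}
```

With unsigned char buf[sizeof(int) + 1], the calls store_int(buf + 1, 123456) and load_int(buf + 1) work whatever the alignment of buf + 1, where a conversion such as *(int *)(buf + 1) could raise a Bus Error on processors that do not tolerate misaligned accesses.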
III.14 Conversions
As explained in Chapter II Section II.11, in C, there are two kinds of conversions, also known as casts: implicit conversions and explicit conversions. A conversion occurs when the type of a value (resulting from an expression) is changed to another type. Implicit conversions may be performed by some operators such as the arithmetic operators (+, -, *, /…) and the assignment operator =, while explicit conversions are under the control of the programmer. An implicit cast is a conversion that the compiler is allowed to do silently if it meets the implicit conversion rules of the operator concerned. There are specific rules for implicit and explicit conversions. When an operator requires a conversion that the compiler is not allowed to perform silently, the compiler may print a warning message and force the conversion according to the explicit conversion rules.
III.14.1 Pointer conversions For pointers, two kinds of conversions (casts) may occur: implicit conversions performed by the assignment operation and explicit conversions through the cast operator. The C standard specifies specific rules for both of them. If obj is an object, the explicit cast (tgt_type)obj converts obj to type tgt_type. The assignment operation is composed of one operator = and two operands: one operand before the equals sign and the other after: lvalue=rvalue
Since expressions are described later, we can consider the left operand lvalue is a pointer and the right operand rvalue is a value we want to assign to the pointer.
III.14.1.1 Conversion between pointers and integers
A pointer may be explicitly converted to an integer type, but the result depends on the implementation. A pointer may have the same size and the same representation as an integer type, but this is not a requirement: a pointer may not be representable by any integer type. On many computers, a pointer has the same representation as an integer type and can then be converted to that integer type and back while keeping its original value. On our computer (where a pointer and an unsigned int are both four bytes wide, as the output shows), a pointer can be converted to type unsigned int as shown below:
$ cat pointer2int1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   double v = 10.2;
   double *p = &v;
   unsigned int u = (unsigned int)p;

   /* sizeof yields a size_t: print it with %zu */
   printf("sizeof p=%zu sizeof unsigned int=%zu\n", sizeof p, sizeof u);
   /* %u expects an unsigned int, so the pointer is converted explicitly */
   printf("p=%u u=%u\n", (unsigned int)p, u);
   return EXIT_SUCCESS;
}
$ gcc -o pointer2int1 -std=c99 -pedantic pointer2int1.c
$ ./pointer2int1
sizeof p=4 sizeof unsigned int=4
p=4278184560 u=4278184560
In some implementations allowing conversion between pointers and integers, two special types may be defined (in stdint.h): intptr_t and uintptr_t. They are large enough to store a pointer. If you use them, keep in mind that your program will not compile on systems that do not define them. On our computer, they are defined. Our previous example can be rewritten as:
$ cat pointer2int2.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
   double v = 10.2;
   double *p = &v;
   uintptr_t u = (uintptr_t)(void *)p;

   printf("sizeof p=%zu sizeof uintptr_t=%zu\n", sizeof p, sizeof u);
   /* PRIuPTR (from inttypes.h) is the printf format for uintptr_t */
   printf("p=%" PRIuPTR " u=%" PRIuPTR "\n", (uintptr_t)(void *)p, u);
   return EXIT_SUCCESS;
}
$ gcc -o pointer2int2 -std=c99 -pedantic pointer2int2.c
$ ./pointer2int2
sizeof p=4 sizeof uintptr_t=4
p=4278184560 u=4278184560
Conversely, if the implementation allows it, you can explicitly convert an integer to a pointer type. Every implementation, however, permits the conversion of 0 to a pointer type. An integer constant expression evaluating to 0, or such an expression cast to void *, is called a null pointer constant; the macro NULL expands to a null pointer constant. When you convert a null pointer constant to a pointer type, you obtain a null pointer: (char *)0, (int *)0 and (double *)0 are examples of null pointers. Although the representations of two null pointers may differ, they always compare equal: for instance, a null pointer to char compares equal to a null pointer to float, even if their representations are different. Apart from null pointer constants, there is no implicit conversion between pointers and integers.
III.14.1.2 Conversion between pointers and void *
Let us start with the implicit conversions performed by the simple assignment operation. Say the left operand of the assignment operator, p_left, is an object pointer to type LT and the right operand, p_right, is an object pointer to type RT. In an assignment operation LT *p_left = RT *p_right, an automatic conversion occurs if the following conditions are met:
o The type RT or LT is a qualified or unqualified version of the type void.
o The type pointed to by the left pointer p_left has at least the qualifiers of the type pointed to by the right pointer p_right.
Otherwise, the compiler generates a warning message unless an explicit cast is used. In the following example, the second initialization produces a warning message:
$ cat pointer_conv_void1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   const void *m;
   const int *p = m; /* OK */
   int *q = m; /* Line 7: missing const, generate warning. Be cautious */
   return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_void1 -std=c99 -pedantic pointer_conv_void1.c
pointer_conv_void1.c: In function ‘main’:
pointer_conv_void1.c:7:13: warning: initialization discards qualifiers from pointer target type
The compiler gcc complains but forces the cast. If we use an explicit cast, the warning disappears:
$ cat pointer_conv_void2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   const void *m;
   const int *p = m; /* OK */
   int *q = (int *)m; /* No warning. Be cautious: do not attempt to alter the object pointed to by q */
   return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_void2 -std=c99 -pedantic pointer_conv_void2.c
An explicit cast allows converting a pointer to a qualified or unqualified version of the type void to any object pointer type, and conversely. In the following example, the pointer to void is on the left side of the assignment operator:
$ cat pointer_conv_void3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   const int *m;
   const void *p = m; /* OK */
   void *q = m; /* Line 7: generate warning, missing const */
   return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_void3 -std=c99 -pedantic pointer_conv_void3.c
pointer_conv_void3.c: In function ‘main’:
pointer_conv_void3.c:7:14: warning: initialization discards qualifiers from pointer target type
We also got a warning: the implicit conversion could not be done. The compiler generated a warning but forced the cast. An explicit cast removes the warning:
$ cat pointer_conv_void4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   const int *m;
   const void *p = m; /* OK */
   void *q = (void *)m; /* OK. Be cautious */
   return EXIT_SUCCESS;
}
If the right pointer points to an unqualified type, the implicit conversion occurs whether the left pointer points to a qualified or an unqualified type, as shown below:
$ cat pointer_conv_void5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int *m1;
   const void *p1 = m1; /* OK */
   void *q1 = m1; /* OK */

   void *m2;
   const int *p2 = m2; /* OK */
   int *q2 = m2; /* OK */
   return EXIT_SUCCESS;
}
III.14.1.3 Conversion between pointers
Let us call LTver a qualified or unqualified version of the type LT and RTver a qualified or unqualified version of the type RT (for example, the type const int is a qualified version of the type int). In the assignment operation LTver *p_left = RTver *p_right, an implicit conversion occurs if the following conditions are met:
o The types LT and RT are compatible. This means that the unqualified versions of the types of the pointed-to objects are compatible.
o The type LTver has at least the qualifiers of the type RTver. This means that the type of the left pointed-to object has at least the qualifiers of the type of the right pointed-to object.
Otherwise, the compiler produces a warning message unless an explicit cast is used. The rule simply requires that both pointers refer to objects that are interpreted in the same way (same alignment, same representation) and that the constraints enforced by the qualifiers are respected. For example:
$ cat pointer_conv_assign3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   signed int m = 17;
   const signed int c = 19;
   float f = 10;

   const int *p2c;
   int *p2m;

   const int **pp2c;
   int **pp2m;

   p2c = &m; /* OK */
   p2c = &c; /* OK */
   p2m = &m; /* OK */
   p2m = &c; /* Line 18. KO: const missing in left type */

   p2m = &f; /* Line 20. KO: int and float not compatible */

   pp2m = pp2c; /* Line 22. KO: const int * and int * not compatible */
   return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_assign3 -std=c99 -pedantic pointer_conv_assign3.c
pointer_conv_assign3.c: In function ‘main’:
pointer_conv_assign3.c:18:8: warning: assignment discards qualifiers from pointer target type
pointer_conv_assign3.c:20:8: warning: assignment from incompatible pointer type
pointer_conv_assign3.c:22:9: warning: assignment from incompatible pointer type
The example is quite simple and it is easy to understand why the warnings are generated, except for the statement in line 22: pp2m = pp2c. Symbolically, we can write it like this: int ** = const int **. If int * is called LTver and const int * is called RTver, then we have LTver * = RTver *. Written like this, we can deduce their unqualified versions: LT is int * and RT is const int *, which are clearly not compatible, hence the warning. You might ask why RT is const int * and not int *. Take note that RT is a pointer to an object of type const int: the qualifier const applies to the object pointed to by the pointer and does not qualify the pointer itself. If RT were int *const, we could have said that its unqualified version was int *. Now, if we apply explicit casts to the previous example, we get no warnings:
$ cat pointer_conv_assign4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   signed int m = 17;
   const signed int c = 19;
   float f = 10;

   const int *p2c;
   int *p2m;

   const int **pp2c;
   int **pp2m;

   p2c = &m; /* OK */
   p2c = &c; /* OK */
   p2m = &m; /* OK */
   p2m = (int *)&c; /* no warning but be cautious */

   p2m = (int *)&f; /* no warning but bad idea */

   pp2m = (int **)pp2c; /* no warning but be cautious */
   return EXIT_SUCCESS;
}
The explicit cast rules allow converting an object pointer to any other object pointer type. Explicit casts seem to be the cure for the warnings issued by the compiler. Do not assume that the goal of the compiler is to annoy you: it gives valuable information. Always check your explicit casts carefully. Explicit casts get rid of the warnings, but this does not mean there will be no unexpected consequences. As an example, let us consider a read-only variable modified through a pointer:
$ cat pointer_conv_assign4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   const int v = 12;
   int *p = (int *)&v;

   *p = 20; /* undefined behavior: v is const-qualified */
   printf("v=%d\n", v);
   return EXIT_SUCCESS;
}
This code fragment seems to be correct and may work on many computers. Yet it is not compliant. The statement *p = 20 has undefined behavior. Modifying an object of const-qualified type through a pointer is not portable and should be avoided (see Chapter III). The same rule applies to the volatile qualifier. There are always good reasons why a conversion is not done automatically; you have to watch out for the warning messages of the compiler. The C standard lets you use explicit casts, which are less restrictive, but this does not mean you can do anything. Using an explicit cast supposes that you know the consequences of what you are doing. An explicit cast lets you convert an object pointer type to any other object pointer type, as in the following example:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float *q;
   long long *p = (long long *)q;
   return EXIT_SUCCESS;
}
This kind of conversion is not portable and may even crash your program on some systems, as described in section III.13, if you attempt to access the object pointed to by p, because the types float and long long may not have the same alignment. More generally, an explicit cast (TTG *)p_obj converting a pointer p_obj to an object of type TORG into a pointer to type TTG may lead to misalignment. If the alignment constraints for the type TTG are stricter than for the type TORG, there may be data misalignment causing undefined behavior. That is, if the type TORG is aligned on mod_org boundaries and the type TTG is aligned on mod_tgt boundaries, there may be misalignment if mod_tgt > mod_org. Conversely, if mod_tgt ≤ mod_org and mod_org is a multiple of mod_tgt, the data will be correctly aligned and the cast is safe as far as alignment is concerned. Converting any object pointer type to void * or to a pointer to a character type and back is always safe. The rationale is that the character types (fitting in a byte) have the least strict alignment constraints (no constraint at all on computers having byte-addressable memory) and that the pointer void * has the same representation and alignment as a pointer to a character type.
III.14.2 Pointer and arithmetic conversion rules
We summarize in the following two sections what we have learned so far about conversions.
III.14.2.1 Explicit cast
Table III‑3 lists the allowed explicit conversions applied to arithmetic and pointer types.
Table III‑3 Explicit conversions on pointer and arithmetic types
III.14.3 Assignment conversions
Table III‑4 lists the allowed assignment conversions applied to arithmetic and pointer types.
Table III‑4 Assignment conversions on pointer and arithmetic types
A conversion not listed in Table III‑4 requires an explicit cast.
III.15 Exercises
Exercise 1. What are the differences between the types char s[10][64] and char *s[64]?
Exercise 2. Let s be an array of char (i.e. declared as char s[]). Explain why the expression sizeof s yields the same value as strlen(s) + 1 if s contains a string.
Exercise 3. Let s be a pointer to char (i.e. declared as char *s). Explain why the expression sizeof s does not yield the same value as strlen(s) + 1 if s contains a string.
Exercise 4. Let s be an array. Is the expression s++ valid? Explain why.
Exercise 5. The following program is wrong. Correct it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
   char msg[] = "Hello";
   char *p;

   strcpy(p, msg);
   return EXIT_SUCCESS;
}
Exercise 6. The following program contains an error. Correct it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
   char msg[] = "Hello";
   int len = strlen(msg);
   char *p = malloc(len);

   strcpy(p, msg);
   return EXIT_SUCCESS;
}
Exercise 7. In the following example, is p a pointer to an array?
int *p;
int s[10];
p = s;
Exercise 7. In the following example, p is a pointer to an array of 2 int. Why are the following assignments not valid?
int (*p)[2];
int s1[2];
int s2[2];
p[0] = s1;
p[1] = s2;
Exercise 8. List the different ways to declare an object p emulating a 5x7 table.
Exercise 9. Explain why the following program is not correct:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   long a[2][2];
   long **p;

   p = a;
   a[0][0] = 0;
   a[0][1] = 1;
   a[1][0] = 10;
   a[1][1] = 11;
   printf("%ld\n", p[1][0]);
   return (EXIT_SUCCESS);
}
Exercise 10. How would you declare a dynamic array that can hold objects of different types?
CHAPTER IV OPERATORS
IV.1 Introduction
An operator is a symbol invoked with one or more arguments, known as operands, that performs a specific calculation and returns a value. A C operator can take one operand (unary operator), two operands (binary operator) or three operands (ternary operator). The number of operands is called the arity of the operator. An operator does not work with just any operands: operands of specific types are expected. In this chapter, we will describe five types of operators:
o Arithmetic operators
o Relational operators
o Logical operators
o Bitwise operators
o Assignment operators
Operators can be combined to form expressions. An expression can be as simple as a literal (such as the integer literal 10 or the string literal "hello") or a variable (such as msg); it can also be an assignment, an operation or a combination of all of those. More generally, an expression is built from operations, variables, literals and function calls. Here are some examples of expressions:
o msg
o 12
o msg="hello"
o x=12
o 12+x*8/1.1
o i=atoi(argv[1])
o v=6.2*x
IV.2 Arithmetic operators
Operation    Meaning
+E1          Unary plus
-E1          Unary minus
E1 + E2      Addition operator
E1 - E2      Subtraction operator
E1 * E2      Multiplication operator
E1 / E2      Division operator
E1 % E2      Modulo operator
Table IV‑1 Arithmetic operators
[32] Arithmetic operators take operands of arithmetic types. An arithmetic type is an integer type (char, unsigned char, short, unsigned short, int, unsigned int, long …), a real floating type (float, double, long double) or a complex type (float _Complex, double _Complex, long double _Complex). The operands of these operators are expressions that evaluate to a numeric value. The expressions E1 and E2 can be:
o A numeric literal such as 1 (integer literal) or 2.8 (floating literal)
o A variable of arithmetic type, for example x, where x is a numeric variable (integer, float, double…)
o An operation such as 8*x
o A combination of numeric literals, variables and operations such as 1*v+y-9.
IV.2.1 Unary plus
The unary plus denotes the positive sign of a number. It can be omitted since it has no effect on the value to which it is applied. For example:
$ cat unary_plus.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int j = +10;
   int i = 10;

   printf("i=%d and j=%d\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o unary_plus -std=c99 -pedantic unary_plus.c
$ ./unary_plus
i=10 and j=10
The general syntax of the unary plus is given below:
+E
The operand E can be a numeric literal, a variable or, more generally, an expression. For example, 1+v*y is an expression composed of two operations: addition and multiplication. Since the unary plus does nothing, it is generally omitted. It has been specified for the consistency of the C language: since the unary minus exists (and does something), the unary plus has been specified as well.
IV.2.2 Unary minus
The unary minus denotes the negative sign of a number: it negates its operand. For example:
$ cat unary_minus1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int i = -10;
   int j = -i;

   printf("i=%d j=%d\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o unary_minus1 -std=c99 -pedantic unary_minus1.c
$ ./unary_minus1
i=-10 j=10
The general syntax of the unary minus is given below:
-E
The operand E is an expression. The following example negates the expression 2*i (a multiplication):
$ cat unary_minus2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int i = 10;
   int j = -(2*i);

   printf("i=%d j=%d\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o unary_minus2 -std=c99 -pedantic unary_minus2.c
$ ./unary_minus2
i=10 j=-20
IV.2.3 Addition
IV.2.3.1 Numeric operands
The addition operator, denoted by the plus sign + (binary +), takes two arithmetic operands and returns the numeric value resulting from the addition of its operands. The operands can be integer or floating numbers. The following example adds integer values:
$ cat addition1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 int main(void) {
4    int i;
5    int j;
6
7    i = 2 + 2;
8    j = 1 + i;
9
10   printf("i=%d and j=%d\n", i, j);
11   return EXIT_SUCCESS;
12 }
$ gcc -o addition1 -std=c99 -pedantic addition1.c
$ ./addition1
i=4 and j=5
Explanation:
o Line 4: declaration of the i variable as type int.
o Line 5: declaration of the j variable as type int.
o Line 7: first, the addition 2+2 evaluates to the value 4, which is then assigned to the variable i.
o Line 8: the variable i holds the value 4. The resulting value of the addition 1+i (i.e. 5) is stored in the variable j.
Since operations can be used at declaration time (initialization), the previous example can also be written as follows:
$ cat addition2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int i = 2 + 2;
   int j = 1 + i;

   printf("i=%d and j=%d\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o addition2 -std=c99 -pedantic addition2.c
$ ./addition2
i=4 and j=5
The operands of the addition operator can be any numeric values (i.e. of integer or floating type). In the following example, each addition has one operand of floating type and one operand of type int (note that the literal 2.1 has type double; the result is converted to float when it is stored in i):
$ cat addition3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float i = 2.1 + 2;
   float j = 1 + i;

   printf("i=%f and j=%f\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o addition3 -std=c99 -pedantic addition3.c
$ ./addition3
i=4.100000 and j=5.100000
Both operands can be of floating types:
$ cat addition4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   double i = 2.1;
   float j = 1.20 + i;

   printf("i=%f and j=%f\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o addition4 -std=c99 -pedantic addition4.c
$ ./addition4
i=2.100000 and j=3.300000
IV.2.3.2 Pointer operands Whether the addition operator takes two numeric operands is not very surprising but what is unusual is it also works with pointers in a particular way. It allows a single operand to be of type pointer, while the second one is an integer operand. An addition involving a pointer looks like this: p + E
Where: o p is a pointer o E is an expression evaluating to an integer number n If E is an expression evaluating to an integer number n and p is pointer to an object obj of type obj_type storing the address addr, the expression p + E evaluates to a pointer holding the address addr + n * sizeof(obj_type). Remember the expression p + E has a pointer type. Let us consider a simple example. Let assume that: o The pointer p was declared as int *p o In our computer the type int is represented by four bytes (i.e. sizeof(int) would return 4) o The address in the pointer p is 8061028.
In such a case, the expression p + 1 would return a pointer of the same type holding the address 8061028 + 1*4=806102C as shown in the following example: $ cat addition5.c #include #include int main(void) { int *p = malloc(3 * sizeof *p); p[0] = 1; p[1] = 2; p[2] = 3; printf(“address in p=%p, holds %d\n”, p, *p); printf(“address in p+1=%p, holds %d\n”, p+1, *(p+1)); printf(“address in p+2=%p, holds %d\n”, p+2, *(p+2)); return 0; } $ gcc -o addition5 -std=c99 -pedantic addition5.c $ ./addition5 address in p=8061078, holds 1 address in p+1=806107c, holds 2 address in p+2=8061080, holds 3
It is worth noting that the operation p+n does not return a numeric value but a pointer of the same type as p. The following example, in which q is wrongly declared as an int, shows what happens otherwise:
$ cat addition6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int *p = malloc(3 * sizeof *p);
   int q;

   p[0] = 1;
   p[1] = 2;
   p[2] = 3;

   q = p + 1;
   printf("address in q=%p, holds %d\n", q, *q);
   q = p + 2;
   printf("address in q=%p, holds %d\n", q, *q);
   return EXIT_SUCCESS;
}
$ gcc -o addition6 -std=c99 -pedantic addition6.c
addition6.c: In function ‘main’:
addition6.c:13:6: warning: assignment makes integer from pointer without a cast
addition6.c:14:6: warning: assignment makes integer from pointer without a cast
addition6.c:14:56: error: invalid type argument of unary ‘*’ (have ‘int’)
The compilation failed because q must be a pointer, as in the following example:
$ cat addition7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int *p = malloc(3 * sizeof *p);
   int *q;

   p[0] = 1;
   p[1] = 2;
   p[2] = 3;

   q = p;
   printf("address in p=%p, address in q=%p holds %d\n", (void *)p, (void *)q, *q);
   q = p + 1;
   printf("address in p=%p, address in q=p+1=%p holds %d\n", (void *)p, (void *)q, *q);
   q = p + 2;
   printf("address in p=%p, address in q=p+2=%p holds %d\n", (void *)p, (void *)q, *q);
   return EXIT_SUCCESS;
}
$ gcc -o addition7 -std=c99 -pedantic addition7.c
$ ./addition7
address in p=80610d8, address in q=80610d8 holds 1
address in p=80610d8, address in q=p+1=80610dc holds 2
address in p=80610d8, address in q=p+2=80610e0 holds 3
IV.2.4 Subtraction
IV.2.4.1 Arithmetic operands
The subtraction operator, denoted by the symbol - (binary minus), works the same way as the addition operator. It subtracts one numeric expression from another and returns the resulting numeric value. The following example subtracts integer values:
$ cat subtract1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int i;
   int j;

   i = 2 - 3;
   j = 4 + i;

   printf("i=%d and j=%d\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o subtract1 -std=c99 -pedantic subtract1.c
$ ./subtract1
i=-1 and j=3
Since operations can be used at declaration time, the previous example can also be written as follows:
$ cat subtract2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int i = 2 - 3;
   int j = 4 + i;

   printf("i=%d and j=%d\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o subtract2 -std=c99 -pedantic subtract2.c
$ ./subtract2
i=-1 and j=3
The subtraction operator works with arithmetic values. In the following example, each subtraction has one operand of floating type and one of type int:
$ cat subtract3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float i = 2.1 - 2;
   float j = 1 - i;

   printf("i=%f and j=%f\n", i, j);
   return EXIT_SUCCESS;
}
$ gcc -o subtract3 -std=c99 -pedantic subtract3.c
$ ./subtract3
i=0.100000 and j=0.900000
IV.2.4.2 Pointer operands
With pointers, the subtraction operator works in the same way as the addition operator. In the form described here, the left operand is a pointer and the right operand is an integer:
p - E
Where:
o p is a pointer
o E is an expression evaluating to an integer number n.
If E is an expression evaluating to an integer number n and p is a pointer (holding the address addr), the expression p - E returns a pointer holding the address addr - n * sizeof *p. For example:
$ cat subtract4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int *p = malloc(3 * sizeof *p);
   int *q;

   p[0] = 1;
   p[1] = 2;
   p[2] = 3;

   q = &p[2];
   /* %p expects a void *, hence the casts */
   printf("address in q=%p, holds %d\n", (void *)q, *q);
   printf("address in q-1=%p, holds %d\n", (void *)(q-1), *(q-1));
   printf("address in q-2=%p, holds %d\n", (void *)(q-2), *(q-2));
   return 0;
}
$ gcc -o subtract4 -std=c99 -pedantic subtract4.c
$ ./subtract4
address in q=8061090, holds 3
address in q-1=806108c, holds 2
address in q-2=8061088, holds 1
The operation returns a pointer, as shown below:
$ cat subtract5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int *p = malloc(3 * sizeof *p);
   int *last_element, *q;

   p[0] = 1;
   p[1] = 2;
   p[2] = 3;

   last_element = &p[2];

   q = last_element;
   printf("*q=%d\n", *q);
   q = last_element - 1;
   printf("*q=%d\n", *q);
   q = last_element - 2;
   printf("*q=%d\n", *q);
   return 0;
}
$ gcc -o subtract5 -std=c99 -pedantic subtract5.c
$ ./subtract5
*q=3
*q=2
*q=1
IV.2.5 Multiplication
The multiplication operator, denoted by the symbol *, multiplies two arithmetic operands and returns the resulting numeric value. The following example multiplies two integer literals and stores the resulting value in the variable v:
$ cat mult1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int v = 2*8;

   printf("v=%d\n", v);
   return EXIT_SUCCESS;
}
$ gcc -o mult1 -std=c99 -pedantic mult1.c
$ ./mult1
v=16
The following example multiplies two arithmetic literals and stores the resulting value in the variable v:
$ cat mult2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float v = 2 * 7.23;

   printf("v=%f\n", v);
   return EXIT_SUCCESS;
}
$ gcc -o mult2 -std=c99 -pedantic mult2.c
$ ./mult2
v=14.460000
The following example multiplies an arithmetic literal by a variable and stores the resulting value in the variable w:
$ cat mult3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float v = 7.23;
   float w = 2.1 * v;

   printf("w=%f\n", w);
   return EXIT_SUCCESS;
}
$ gcc -o mult3 -std=c99 -pedantic mult3.c
$ ./mult3
w=15.183000
IV.2.6 Division
The division operator, denoted by the symbol /, divides one arithmetic operand by another and returns the resulting numeric value. The division operation works as you learned it in mathematics. However, we have to warn you that this operation produces a result that may appear surprising if both operands are of integer type. We will explain why in detail when we talk about the rule called usual arithmetic conversions. If the operands of an operation expecting arithmetic types (including division) are of integer types, the resulting value is also of integer type, as shown below:
$ cat div_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int x = 1;
   int y = 3;
   float z = x/y;

   /* %f expects a double: x and y are converted explicitly */
   printf("x/y=%f/%f=%f\n", (double)x, (double)y, z);
   return EXIT_SUCCESS;
}
Explanation:
o int x = 1 declares the x variable as int type and sets it to 1.
o int y = 3 declares the y variable as int type and sets it to 3.
o float z = x/y declares the z variable as float and assigns it the result of the division x/y (i.e. 1/3).
o The printf() statement displays the operands and the result of the operation x/y held in the variable z. Since the specifier %f expects a floating value, x and y are cast to double for display only; this does not change the division itself.
Intuitively, we would expect to obtain something like 0.333333. Let us run it:
$ gcc -o div_op1 -std=c99 -pedantic div_op1.c
$ ./div_op1
x/y=1.000000/3.000000=0.000000
We got the value of 0! Is it a bug? No. The reason is that neither of the operands of the expression x/y was of floating type: both were of type int. Everything happened as if we had written something like this:
$ cat div_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float z = 1/3;

   printf("1/3=%f\n", z);
   return EXIT_SUCCESS;
}
$ gcc -o div_op2 -std=c99 -pedantic div_op2.c
$ ./div_op2
1/3=0.000000
The operation 1/3 divides the integer 1 by the integer 3: the type of the expression 1/3 is then also an integer type (both operands are of type int). If we used 1.0 (a floating literal, of type double) instead of 1 (int type), we would get this:
$ cat div_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float z = 1.0/3;

   printf("1/3=%f\n", z);
   return EXIT_SUCCESS;
}
$ gcc -o div_op3 -std=c99 -pedantic div_op3.c
$ ./div_op3
1/3=0.333333
The same result would have been produced if we had used the operand 3.0 instead of 3. What happened? The type of the operation 1.0/3 is now a floating type because the literal 1.0 has a floating type (double, to be precise). Symbolically, we could write this: type of expression 1.0/3 = double/int = double. The result is then converted to float when it is stored in z. You have two methods to tell the compiler you want to work with floating types: either using floating literals or explicitly casting (explicit conversion) at least one of the two operands to a floating type. The following example forces the division to return a floating number by writing the literals as floating literals:
$ cat div_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float v = 3.0/2;
   float w = 3/2.0;
   float x = 3.0/2.0;

   printf("v=%f, w=%f, x=%f\n", v, w, x);
   return EXIT_SUCCESS;
}
$ gcc -o div_op4 -std=c99 -pedantic div_op4.c
$ ./div_op4
v=1.500000, w=1.500000, x=1.500000
It worked as expected just by adding the fractional part 0! While in mathematics 3.0 is the same as 3, in C there is a big difference: 3.0 has a real floating type while 3 is of integer type. In the second method (explicit conversion), we force the division to return a floating number by casting the literals to type float:
$ cat div_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float v = (float)3/2;
   float w = 3/(float)2;
   float x = (float)3/(float)2;

   printf("v=%f, w=%f, x=%f\n", v, w, x);
   return EXIT_SUCCESS;
}
$ gcc -o div_op5 -std=c99 -pedantic div_op5.c
$ ./div_op5
v=1.500000, w=1.500000, x=1.500000
In the following example, we divide two variables of type float:
$ cat div_op6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float v = 3;
   float w = 2;
   float x = v / w;

   printf("x=%f\n", x);
   return EXIT_SUCCESS;
}
$ gcc -o div_op6 -std=c99 -pedantic div_op6.c
$ ./div_op6
x=1.500000
You may think the example div_op2.c is the same as div_op6.c, yet they are different. In example div_op2.c, we divided an integer number by another integer number. In example div_op6.c, we divided a floating number by another floating number. We assigned the integer literal 3 to the floating variable v: the statement float v = 3 means the integer literal 3 is converted to the target type float. The same happens for the statement float w = 2. That is, the variables v and w hold floating values, so the division v/w returns a floating value. We would get the same result with the following code, in which only one operand is of floating type:
$ cat div_op7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float v = 3;
   int w = 2;
   float x = v / w;

   printf("x=%f\n", x);
   return EXIT_SUCCESS;
}
$ gcc -o div_op7 -std=c99 -pedantic div_op7.c
$ ./div_op7
x=1.500000
Now, can you guess why the following example displays an incorrect value?
$ cat div_op8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   printf("1/3=%f\n", 1/3);
   return EXIT_SUCCESS;
}
$ gcc -o div_op8 -std=c99 -pedantic div_op8.c
$ ./div_op8
1/3=-547185123929…
The answer was given previously: the operation 1/3 yields a number of integer type, which means the value returned by the division 1/3 does not have the floating type expected by the printf() specifier %f. Passing an int where %f expects a double has undefined behavior, hence the meaningless value. A correct code would be:
$ cat div_op9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   printf("1/3=%d\n", 1/3);
   return EXIT_SUCCESS;
}
$ gcc -o div_op9 -std=c99 -pedantic div_op9.c
$ ./div_op9
1/3=0
In summary, remember that a division returns a value of integer type if all of its operands have integer types.
IV.2.7 Modulo operator
The modulo operator (also known as modulus operator or remainder operator), denoted by the symbol %, takes two integer operands and returns an integer value that is the remainder of the integer division. A division involving two integer numbers i and j can be mathematically expressed like this: i = j*n + r, where n is the integer quotient i/j. The remainder r is returned by the modulo operator %. For example:
o 3 = 2*1 + 1. Dividing 3 by 2 gives the integral part n=1 and the remainder r=1.
o 7 = 3*2 + 1. Dividing 7 by 3 gives the integral part n=2 and the remainder r=1.
Here is a program coding this:
$ cat modulo_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   int i = 3;
   int j = 2;
   int n = i / j;
   int r = i % j;

   printf("%d=%d*%d+%d\n", i, j, n, r);
   return EXIT_SUCCESS;
}
$ gcc -o modulo_op1 -std=c99 -pedantic modulo_op1.c
$ ./modulo_op1
3=2*1+1
The modulus operator seems to be of little interest… Can you imagine a simple method to determine if a number is odd or even? With the modulus operator, it is very easy: an even number p can be expressed as p=2*n where n is an integer number, which means that if p%2 evaluates to 0, the number is even. Conversely, an odd number p can be expressed as p=2*n+1, which means that if p%2 does not evaluate to 0, the number is odd. More generally, an integer number p is a multiple of an integer number q if p%q evaluates to 0. The example below reads the argument you have typed, translates it into a number and tells if it is even or odd:
$ cat modulo_op2.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(int argc, char **argv) {
5    int n;
6
7    if (argc == 1) {
8       printf("Please provide an argument\n");
9       printf("USAGE: %s n\n", argv[0]);
10      return (EXIT_FAILURE);
11   }
12
13   n = atoi(argv[1]);
14
15   if ( n%2 == 0 ) {
16      printf( "%d is even\n", n );
17   } else {
18      printf( "%d is odd\n", n );
19   }
20   return (EXIT_SUCCESS);
21 }
$ gcc -o modulo_op2 -std=c99 -pedantic modulo_op2.c
$ ./modulo_op2 10
10 is even
Explanation:
o Line 1: the header file stdio.h is included because we use the printf() function.
o Line 2: the header file stdlib.h is included because we use the function atoi() and the values EXIT_SUCCESS and EXIT_FAILURE.
o Line 4: the function main() is declared with two arguments argc and argv. The integer number argc holds the number of arguments including the program name, and argv stores the arguments themselves. If you run the program with no argument, argc holds the value 1 (there is only the program name). If you pass one argument, argc stores the value 2 (the program name and the argument you pass)… The parameter argv is a pointer to pointers to char: it behaves like an array of strings. The string argv[0] holds the name of the program, argv[1] holds the first argument…
o Line 5: the variable n is declared as type int. It will hold the value that the user passes to the program.
o Line 7-Line 11: we test if an argument has been passed to the program. If the user has not given an argument, argc holds the value 1. In this case, we print a little help explaining how to run the program: argv[0] contains the name of the program.
o Line 13: we convert the passed argument (stored as a string in argv[1]) into a number.
o Line 15-16: we test if the number n is even: n%2 evaluates to 0.
o Line 17-18: this code is executed if n%2 does not evaluate to 0.
IV.3 Relational operators
A relational operator takes two operands of real types [33], compares them and evaluates to a value of type int: 1 if the comparison is true, 0 if it is false. In C, 0 means false, while any other value (negative or positive) means true.
Table IV‑2 Relational Operators
Both operands can also be pointers to qualified or unqualified versions of compatible object types. Here are some examples. Below, we compare integer literals:
$ cat relop1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int r1 = 3 > 2;
  int r2 = 2 > 3;
  int r5 = 2 >= 2;
  int r6 = 6 != 2;

  printf("3>2 evaluates to %d\n", r1 );
  printf("2>3 evaluates to %d\n", r2 );
  printf("2>=2 evaluates to %d\n", r5 );
  printf("6!=2 evaluates to %d\n", r6 );
  return EXIT_SUCCESS;
}
$ gcc -o relop1 -std=c99 -pedantic relop1.c
$ ./relop1
3>2 evaluates to 1
2>3 evaluates to 0
2>=2 evaluates to 1
6!=2 evaluates to 1
We can notice that the relational operations are evaluated first, then the resulting numeric value is assigned to the variable: relational operators take precedence over the assignment operator (=). The following example compares numeric values of different types: $ cat relop2.c #include <stdio.h> #include <stdlib.h> int main(void) { printf("3.2 > 2.9 evaluates to %d\n", 3.2 > 2.9 ); printf("2.1 > 2 evaluates to %d\n", 2.1 > 2 ); printf(“8.7 2 evaluates to 1 8.7 %d evaluates to %d\n”, j, 5, j > 5 );
printf(“%f 2+7/3 ); printf(“%f*1.2-2 4 is performed. Relational operators are generally used in control flow constructs (for loop, while loop, if statement…). The following example prints the first six digits: $ cat relop5.c #include #include int main(void) { int max = 5;
int i = 0; while ( i 4 is equivalent to 160/2^4=10. Here is an example showing what we have said so far:
$ cat bitwise_right_shift1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  unsigned char b = 160;
  int n;

  n = 1; printf("%u >> %u = %u\n", b, n, b >> n);
  n = 2; printf("%u >> %u = %u\n", b, n, b >> n);
  n = 3; printf("%u >> %u = %u\n", b, n, b >> n);
  n = 4; printf("%u >> %u = %u\n", b, n, b >> n);
  return EXIT_SUCCESS;
}
$ gcc -o bitwise_right_shift1 -std=c99 -pedantic bitwise_right_shift1.c
$ ./bitwise_right_shift1
160 >> 1 = 80
160 >> 2 = 40
160 >> 3 = 20
160 >> 4 = 10
Of course, if we continue shifting the number, we will get 0:
$ cat bitwise_right_shift2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  unsigned char b = 160;
  int n;

  n = 6; printf("%u >> %u = %u\n", b, n, b >> n);
  n = 7; printf("%u >> %u = %u\n", b, n, b >> n);
  n = 8; printf("%u >> %u = %u\n", b, n, b >> n);
  n = 9; printf("%u >> %u = %u\n", b, n, b >> n);
  return EXIT_SUCCESS;
}
$ gcc -o bitwise_right_shift2 -std=c99 -pedantic bitwise_right_shift2.c
$ ./bitwise_right_shift2
160 >> 6 = 2
160 >> 7 = 1
160 >> 8 = 0
160 >> 9 = 0
If the right operand n of the operation b >> n is negative, or is greater than or equal to the width of the (promoted) left operand, the behavior is undefined: the implementation may generate an error, silently produce an unpredictable value, or document a specific behavior. In the example above, b of type unsigned char is promoted to int before the shift, which is why shifting by 8 or 9 bits is still valid.
IV.6.4 Bitwise AND A & B
Where A and B are expressions evaluating to an integer value. The bitwise AND denoted by the ampersand symbol & is similar to the logical AND. It takes two integer numbers and applies the bitwise AND at bit-level according to the truth Table IV‑8.
Table IV‑8 Bitwise AND
Let us consider the decimal numbers 160 and 116. The bitwise AND operation 160 & 116 yields 32. You cannot guess the result if you work with the decimal representation because the bitwise operation works at bit level. To understand how the operation works, you have to use the binary representation of the numbers. Let the numbers 160 and 116 be two integers of type unsigned char (fitting in eight bits). Since in our convention the most significant bit is on the left side, their binary representations are respectively 10100000₂ and 01110100₂. In this case, the bitwise AND operation 160₁₀ & 116₁₀ = 10100000₂ & 01110100₂ produces 00100000₂, which represents the decimal number 32 as depicted in Figure IV‑4.
Figure IV‑4 Bitwise AND
More generally, let A be an integer number represented by the binary number aₙ₋₁aₙ₋₂…a₁a₀ and B an integer number represented by the binary number bₙ₋₁bₙ₋₂…b₁b₀. Both numbers fit in n bits. The operation A&B yields the binary number cₙ₋₁cₙ₋₂…c₁c₀, where cₙ₋₁ = aₙ₋₁&bₙ₋₁, cₙ₋₂ = aₙ₋₂&bₙ₋₂, …, c₀ = a₀&b₀ according to the truth Table IV‑8. The following code gives some examples of bitwise AND operations:
$ cat bitwise_AND.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  unsigned char a;
  unsigned char b;

  a = 160; b = 116; printf("%u & %u = %u\n", a, b, a & b);
  a = 0; b = 1; printf("%u & %u = %u\n", a, b, a & b);
  a = 1; b = 1; printf("%u & %u = %u\n", a, b, a & b);
  return EXIT_SUCCESS;
}
$ gcc -o bitwise_AND -std=c99 -pedantic bitwise_AND.c
$ ./bitwise_AND
160 & 116 = 32
0 & 1 = 0
1 & 1 = 1
IV.6.5 Bitwise inclusive OR A | B
Where A and B are expressions evaluating to an integer value.
Figure IV‑5 Bitwise OR
The bitwise OR, denoted by the symbol |, takes two integer numbers and operates on the bits of each operand according to Table IV‑9. If A and B are two integer numbers fitting in n bits, represented respectively by the binary numbers aₙ₋₁aₙ₋₂…a₁a₀ and bₙ₋₁bₙ₋₂…b₁b₀, the operation A|B yields the binary number cₙ₋₁cₙ₋₂…c₁c₀, where cₙ₋₁ = aₙ₋₁|bₙ₋₁, cₙ₋₂ = aₙ₋₂|bₙ₋₂, …, c₀ = a₀|b₀ according to the truth Table IV‑9.
Table IV‑9 Bitwise OR
For example, the OR operation 160 | 116 produces the value 244 as depicted in Figure IV‑5. The following code gives some examples of bitwise OR operations:
$ cat bitwise_OR.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  unsigned char a;
  unsigned char b;

  a = 160; b = 116; printf("%u | %u = %u\n", a, b, a | b);
  a = 0; b = 1; printf("%u | %u = %u\n", a, b, a | b);
  a = 1; b = 1; printf("%u | %u = %u\n", a, b, a | b);
  return EXIT_SUCCESS;
}
$ gcc -o bitwise_OR -std=c99 -pedantic bitwise_OR.c
$ ./bitwise_OR
160 | 116 = 244
0 | 1 = 1
1 | 1 = 1
IV.6.6 Bitwise exclusive OR (XOR) A ^ B
Where A and B are expressions evaluating to an integer value. The bitwise operator XOR, denoted by the symbol ^, takes two integer numbers and operates on the bits of its operands according to Table IV‑10. If A and B are two integer numbers fitting in n bits, represented respectively by the binary numbers aₙ₋₁aₙ₋₂…a₁a₀ and bₙ₋₁bₙ₋₂…b₁b₀, the operation A^B yields the binary number cₙ₋₁cₙ₋₂…c₁c₀, where cₙ₋₁ = aₙ₋₁^bₙ₋₁, cₙ₋₂ = aₙ₋₂^bₙ₋₂, …, c₀ = a₀^b₀ according to the truth Table IV‑10.
Table IV‑10 Bitwise XOR
Figure IV‑6 depicts the operation 160 ^ 116 that produces the value 212.
Figure IV‑6 Bitwise XOR
The following code gives some examples of bitwise XOR operations:
$ cat bitwise_XOR.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  unsigned char a;
  unsigned char b;

  a = 160; b = 116; printf("%u ^ %u = %u\n", a, b, a ^ b);
  a = 0; b = 1; printf("%u ^ %u = %u\n", a, b, a ^ b);
  a = 1; b = 1; printf("%u ^ %u = %u\n", a, b, a ^ b);
  return EXIT_SUCCESS;
}
$ gcc -o bitwise_XOR -std=c99 -pedantic bitwise_XOR.c
$ ./bitwise_XOR
160 ^ 116 = 212
0 ^ 1 = 1
1 ^ 1 = 0
IV.7 Address and dereferencing operators
The operators * and & allow programmers to deal with pointers and arrays. If p is a pointer, p is a variable holding the memory address of a storage area. This implies that reading p gives you the address of the object pointed to, not the object itself. The object is accessed indirectly through the unary operator *: *p represents the object itself through the pointer p. The address of the object is first read, then the object is accessed. Dereferencing the pointer p means accessing the object *p.
You may have noticed the symbol * is used in three different ways that might lead to confusion:
o It is used as a multiplication operator (binary operator) taking two operands. This operator has nothing to do with pointers.
o It is used to declare a pointer such as int *p. The symbol * indicates the name following it is the identifier of a pointer. This has nothing to do with dereferencing.
o It is used to dereference a pointer such as in the statement obj = *p. The unary operator * is used to access the object the pointer points to.
The second operator related to pointers is the address-of operator denoted by a single ampersand &. Here again, the C language uses the same symbol for different meanings: it denotes both the bitwise AND (binary operator) that takes two integer operands and the address-of operator that takes a single operand. When used as a unary operator, it evaluates to the address of its operand. That is, it yields a pointer to the object: if obj is an object of type obj_type, &obj evaluates to a pointer of type obj_type *. Of course, *(&obj) designates obj itself…
Here is an example:
$ cat pointer_op.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  long u = 100L;
  long *p = &u;
  long v = *p;

  /* the specifier %p expects an argument of type void * */
  printf("address p=%p, address &u=%p, v=%ld\n", (void *)p, (void *)&u, v);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_op -std=c99 -pedantic pointer_op.c
$ ./pointer_op
address p=feffeaa4, address &u=feffeaa4, v=100
IV.8 Increment and decrement operators
IV.8.1 Prefix increment operator
The prefix increment operator, denoted by ++, is a unary operator placed before an operand of real or pointer type [37][38]. It has the following form:
++var
If var is a variable, the operator increments it and evaluates to the resulting value. For example, if v=5, the expression ++v evaluates to 6 and v is set to this value as shown below:
$ cat prefix_inc1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 5;
  int w = ++v;

  printf("v=%d and w=%d\n", v, w);
  return EXIT_SUCCESS;
}
$ gcc -o prefix_inc1 -std=c99 -pedantic prefix_inc1.c
$ ./prefix_inc1
v=6 and w=6
The operand can be a real floating number:
$ cat prefix_inc2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float v = 5.2;
  float w = ++v;

  printf("v=%f and w=%f\n", v, w);
  return EXIT_SUCCESS;
}
$ gcc -o prefix_inc2 -std=c99 -pedantic prefix_inc2.c
$ ./prefix_inc2
v=6.200000 and w=6.200000
If the operand is a pointer, the meaning is nearly the same. The unary operator ++ makes the pointer point to the next object and evaluates to that new pointer. Another way to put it: if p is a pointer, the expression ++p is identical to p=p+1. If p holds the address addr, the pointer p is set to the address addr + sizeof *p (counted in bytes) and the expression evaluates to that new pointer, as depicted below:
$ cat prefix_inc3.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5   int n = 3;
6   int *var = malloc(n * sizeof *var) ;
7   int *p;
8
9   var[0] = 10;
10  var[1] = 11;
11  var[2] = 17;
12
13  printf("sizeof int=%zu\n", sizeof *var);
14  p=var; printf("p=%p and var=%p. *p=%d and *v=%d\n", (void *)p, (void *)var, *p, *var);
15  p=++var; printf("p=%p and var=%p. *p=%d and *v=%d\n", (void *)p, (void *)var, *p, *var);
16  p=++var; printf("p=%p and var=%p. *p=%d and *v=%d\n", (void *)p, (void *)var, *p, *var);
17
18  return EXIT_SUCCESS;
19 }
$ gcc -o prefix_inc3 -std=c99 -pedantic prefix_inc3.c
$ ./prefix_inc3
sizeof int=4
p=80610d0 and var=80610d0. *p=10 and *v=10
p=80610d4 and var=80610d4. *p=11 and *v=11
p=80610d8 and var=80610d8. *p=17 and *v=17
Explanation:
o Line 5: the variable n is the number of elements in the memory area we allocate in the next line.
o Line 6: we declare var as a pointer to int and we initialize it with the address of the memory space allocated by the malloc() function. The allocated memory area can store n (set to 3) values of type int.
o Line 7: we declare p as a pointer to int. It will be used to get the value returned by the expression ++var.
o Line 9-11: we initialize the elements in the memory area allocated by malloc().
o Line 13: the size of the objects (int) pointed to by the pointer var is displayed: on our computer, a value of type int fits in 4 bytes (32 bits).
o Line 14: the pointer p is assigned the value held in the pointer var. We display the addresses held in both pointers through the printf() specifier %p along with the values they point to. On our computer, the pointer var stored the address 80610d0.
o Line 15: the prefix expression ++var advances the pointer var by the size of the type it points to (int) and returns the newly computed address: it is the same as var = var + 1. On our computer, the operation produced the address 80610d0+4=80610d4, which is then held by both pointers p and var. The printf() function displays the addresses and the values the pointers var and p point to.
IV.8.2 Prefix decrement operator
The prefix decrement operator, denoted by --, is a unary operator placed before an operand of real or pointer type [39]. It has the following form:
--var
It decrements the value of the operand and evaluates to the resulting value. For example, if v=5, the expression --v evaluates to 4 and v is set to this value as shown below:
$ cat prefix_dec1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 5;
  int w = --v;

  printf("v=%d and w=%d\n", v, w);
  return EXIT_SUCCESS;
}
$ gcc -o prefix_dec1 -std=c99 -pedantic prefix_dec1.c
$ ./prefix_dec1
v=4 and w=4
The operand can be a real floating number:
$ cat prefix_dec2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float v = 5.2;
  float w = --v;

  printf("v=%f and w=%f\n", v, w);
  return EXIT_SUCCESS;
}
$ gcc -o prefix_dec2 -std=c99 -pedantic prefix_dec2.c
$ ./prefix_dec2
v=4.200000 and w=4.200000
If the operand is a pointer, the prefix decrement operation makes it point to the previous object and evaluates to a pointer holding that address: the expression --var is the same as the expression var=var-1. It moves the pointer var back by sizeof *var bytes and returns a pointer holding the new address, as depicted below:
$ cat prefix_dec3.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5   int n = 3;
6   int *var = malloc(n * sizeof *var) ;
7   int *p_elt, *p;
8
9   var[0] = 10;
10  var[1] = 11;
11  var[2] = 17;
12  p_elt = &var[2];
13
14  printf("sizeof int=%zu\n", sizeof *var);
15  p=p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
16  p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
17  p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
  return EXIT_SUCCESS;
}
$ gcc -o prefix_dec3 -std=c99 -pedantic prefix_dec3.c
$ ./prefix_dec3
sizeof int=4
p=80610d0 and p_elt=80610d0. *p=17 and *p_elt=17
p=80610cc and p_elt=80610cc. *p=11 and *p_elt=11
p=80610c8 and p_elt=80610c8. *p=10 and *p_elt=10
Explanation:
o Line 5: the variable n is the number of elements in the memory area we allocate in the next line.
o Line 6: we declare var as a pointer to type int and we initialize it with the address of the memory space allocated by the malloc() function. The allocated memory area can store n (set to 3) values of type int.
o Line 7: we declare p and p_elt as pointers to int.
o Line 9-11: we initialize the elements in the memory area allocated by malloc().
o Line 12: the pointer p_elt is initialized to the address of the last element var[2].
o Line 14: the size of the object (of type int) pointed to by the pointer var is displayed: on our computer, a value of type int fits in 4 bytes (32 bits).
o Line 15: the pointer p is assigned the value stored in p_elt. We display the addresses held in both pointers p and p_elt. On our computer, the pointer p_elt stored the value 80610d0.
o Line 16: the prefix expression --p_elt moves the pointer p_elt back by the size of the type it points to (int) and evaluates to the resulting pointer: it is equivalent to the expression p_elt = p_elt - 1. On our computer, the operation produced the address 80610d0-4=80610cc, which is then also assigned to the pointer p. The printf() function displays the addresses and the values the pointers p_elt and p point to.
Obviously, do not use invalid pointers. The following example contains an error: after the last decrement, the pointers p and p_elt point before the start of the allocated area. They are invalid, and dereferencing them has undefined behavior (the value 0 displayed below is only what we happened to get):
$ cat prefix_dec4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int nb_element = 2;
  int *var = malloc(nb_element * sizeof *var) ;
  int *p_elt, *p;

  var[0] = 10;
  var[1] = 11;
  p_elt = &var[1];

  printf("sizeof int=%zu\n", sizeof *var);
  p=p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
  p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
  /* the following pointers p and p_elt are invalid */
  p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
  return EXIT_SUCCESS;
}
$ gcc -o prefix_dec4 -std=c99 -pedantic prefix_dec4.c
$ ./prefix_dec4
sizeof int=4
p=80610cc and p_elt=80610cc. *p=11 and *p_elt=11
p=80610c8 and p_elt=80610c8. *p=10 and *p_elt=10
p=80610c4 and p_elt=80610c4. *p=0 and *p_elt=0
IV.8.3 Postfix increment operator
The postfix increment operator is a unary operator taking one operand having real or pointer type [40]. It follows its operand as shown below:
var++
The expression var++ evaluates to the value stored in the operand var and then increments the value of var. For instance, if v=5, the expression v++ evaluates to the value 5 and then alters the variable v to 6 as shown below:
$ cat postfix_inc1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 5;
  int w = v++;

  printf("v=%d and w=%d\n", v, w);
  return EXIT_SUCCESS;
}
$ gcc -o postfix_inc1 -std=c99 -pedantic postfix_inc1.c
$ ./postfix_inc1
v=6 and w=5
If the operand is a pointer, the operation evaluates to the value of its operand and then changes it to the address of the next object. That is, if var is a pointer, the expression var++ evaluates to the pointer var and then sets var to var + 1 (the address advanced by sizeof *var bytes) as shown below:
$ cat postfix_inc2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int nb_element = 3;
  int *var = malloc(nb_element * sizeof *var) ;
  int *p;

  var[0] = 10;
  var[1] = 11;
  var[2] = 17;

  printf("sizeof int=%zu\n", sizeof *var);
  printf("var[0]=%d at address %p\n", var[0], (void *)&var[0]);
  printf("var[1]=%d at address %p\n", var[1], (void *)&var[1]);
  printf("var[2]=%d at address %p\n", var[2], (void *)&var[2]);

  printf("\nBefore postfix expression. var=%p. *v=%d\n", (void *)var, *var);
  p=var++; printf("After p=var++. p=%p and var=%p. *p=%d and *v=%d\n", (void *)p, (void *)var, *p, *var);
  p=var++; printf("After p=var++. p=%p and var=%p. *p=%d and *v=%d\n", (void *)p, (void *)var, *p, *var);
  /* var now points one past the last element: it must not be dereferenced */
  p=var++; printf("After p=var++. p=%p and var=%p. *p=%d\n", (void *)p, (void *)var, *p);
  return EXIT_SUCCESS;
}
$ gcc -o postfix_inc2 -std=c99 -pedantic postfix_inc2.c
$ ./postfix_inc2
sizeof int=4
var[0]=10 at address 8061200
var[1]=11 at address 8061204
var[2]=17 at address 8061208

Before postfix expression. var=8061200. *v=10
After p=var++. p=8061200 and var=8061204. *p=10 and *v=11
After p=var++. p=8061204 and var=8061208. *p=11 and *v=17
After p=var++. p=8061208 and var=806120c. *p=17
IV.8.4 Postfix decrement operator
The postfix decrement operator works in the same way as the postfix increment operator but, instead of incrementing the value of its operand, it decrements it. It has the following form:
var--
The expression var-- evaluates to the value of var and then decrements the value of var. For instance, if v=5, the expression v-- evaluates to 5 and v then contains 4 as shown below:
$ cat postfix_dec1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 5;
  int w = v--;

  printf("v=%d and w=%d\n", v, w);
  return EXIT_SUCCESS;
}
$ gcc -o postfix_dec1 -std=c99 -pedantic postfix_dec1.c
$ ./postfix_dec1
v=4 and w=5
If the operand is a pointer, the operation evaluates to the pointer and then changes it to the address of the previous object. That is, if var is a pointer, the expression var-- evaluates to the pointer var and then sets it to var - 1 (the address moved back by sizeof *var bytes) as shown below:
$ cat postfix_dec2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int nb_element = 3;
  int *var = malloc(nb_element * sizeof *var) ;
  int *p, *p_elt;

  var[0] = 10;
  var[1] = 11;
  var[2] = 17;
  p_elt = &var[2];

  printf("sizeof referenced objects=%zu Bytes\n", sizeof *var);
  printf("var[0]=%d at address %p\n", var[0], (void *)&var[0]);
  printf("var[1]=%d at address %p\n", var[1], (void *)&var[1]);
  printf("var[2]=%d at address %p\n", var[2], (void *)&var[2]);

  printf("\nBefore postfix expression. Last element p_elt=%p. *p_elt=%d\n", (void *)p_elt, *p_elt);
  p=p_elt--; printf("After p=p_elt--. p=%p and p_elt=%p. *p=%d and * p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
  p=p_elt--; printf("After p=p_elt--. p=%p and p_elt=%p. *p=%d and * p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
  return EXIT_SUCCESS;
}
$ gcc -o postfix_dec2 -std=c99 -pedantic postfix_dec2.c
$ ./postfix_dec2
sizeof referenced objects=4 Bytes
var[0]=10 at address 80611d8
var[1]=11 at address 80611dc
var[2]=17 at address 80611e0

Before postfix expression. Last element p_elt=80611e0. *p_elt=17
After p=p_elt--. p=80611e0 and p_elt=80611dc. *p=17 and * p_elt=11
After p=p_elt--. p=80611dc and p_elt=80611d8. *p=11 and * p_elt=10
IV.8.5 Subscript operator
When we talked about arrays and pointers, we said there were two methods to access an object stored in an array or in a memory area pointed to by a pointer: by using the operator [] or *. The operator denoted by [], known as the subscript operator, takes two operands: the operand preceding the left square bracket is the name of a pointer or an array, and the operand between the square brackets is an expression that evaluates to an integer number. It evaluates to an element of an array. The general form is given below:
arr[E]
Where:
o arr is the name of an array or a pointer
o E is an expression that evaluates to an integer value.
If the expression E evaluates to the integer number n, arr[n] denotes the object located at index n of the array arr, that is, its (n+1)th element since indexes start at 0. The expression arr[n] is equivalent to *(arr + n). Here is an example:
$ cat subscript1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int nb_element = 3;
  int *iList = malloc(nb_element * sizeof *iList) ;

  iList[0] = 10;
  iList[1] = 11;
  iList[2] = 17;

  printf("iList[0]=%d\n", iList[0]);
  printf("iList[1]=%d\n", iList[1]);
  printf("iList[2]=%d\n", iList[2]);
  return EXIT_SUCCESS;
}
$ gcc -o subscript1 -std=c99 -pedantic subscript1.c
$ ./subscript1
iList[0]=10
iList[1]=11
iList[2]=17
We can use the postfix increment operator to produce a program that is equivalent:
$ cat subscript2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int nb_element = 3;
  int *iList = malloc(nb_element * sizeof *iList) ;
  int i = 0;

  iList[i] = 10;
  i++; iList[i] = 11;
  i++; iList[i] = 17;

  i=0; printf("iList[0]=%d\n", iList[i]);
  i++; printf("iList[1]=%d\n", iList[i]);
  i++; printf("iList[2]=%d\n", iList[i]);
  return EXIT_SUCCESS;
}
$ gcc -o subscript2 -std=c99 -pedantic subscript2.c
$ ./subscript2
iList[0]=10
iList[1]=11
iList[2]=17
IV.8.6 sizeof sizeof E sizeof(obj_type)
Where:
o E is an expression. Parentheses around the expression can be omitted, but if E contains several operators you may have to use parentheses to prevent the sizeof operator from taking precedence over the operators of the expression.
o obj_type is a type name.
The sizeof operator takes a single operand and returns its size in bytes. The type of the value returned by the sizeof operator is size_t, which is an unsigned integer type defined by the implementation.
The operand can be a type or an expression. If the operand is a type, it must be surrounded by parentheses. If the operand is an expression, sizeof returns the size of the type of the expression. Take note that you may have to use parentheses around the expression if it is composed of operators: the sizeof operator may have precedence over other operators. Here is an example (a value of type size_t is displayed with the printf() specifier %zu):
$ cat sizeof_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x = 10;
  double f = 1.2;

  printf("sizeof(int)=%zu\n", sizeof(int));
  printf("sizeof(double)=%zu\n", sizeof(double));
  printf("sizeof x=%zu\n", sizeof x);
  printf("sizeof f=%zu\n", sizeof f);
  printf("sizeof(x + 1)=%zu\n", sizeof(x + 1) );
  printf("sizeof(f + 1)=%zu\n", sizeof(f + 1) );
  return EXIT_SUCCESS;
}
$ gcc -o sizeof_op1 -std=c99 -pedantic sizeof_op1.c
$ ./sizeof_op1
sizeof(int)=4
sizeof(double)=8
sizeof x=4
sizeof f=8
sizeof(x + 1)=4
sizeof(f + 1)=8
In the example above, we surrounded the expressions x+1 and f+1 with parentheses to prevent the sizeof operator from taking precedence over the addition: the expression sizeof x + 1 would compute the size of the variable x and then add 1 to it, as shown below:
$ cat sizeof_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x = 10;

  printf("sizeof(x + 1)=%zu\n", sizeof(x + 1) );
  printf("sizeof x + 1=%zu\n", sizeof x + 1 );
  return EXIT_SUCCESS;
}
$ gcc -o sizeof_op2 -std=c99 -pedantic sizeof_op2.c
$ ./sizeof_op2
sizeof(x + 1)=4
sizeof x + 1=5
It is interesting to note that the operand of sizeof is evaluated only if it is a VLA (variable-length array). Otherwise, the operand is not evaluated and the value of the sizeof expression is an integer constant [41]. Try this:
$ cat sizeof_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x = 10;
  int y = sizeof(++x);

  printf("x=%d\ny=%d\n", x, y );
  return EXIT_SUCCESS;
}
$ gcc -o sizeof_op3 -std=c99 -pedantic sizeof_op3.c
$ ./sizeof_op3
x=10
y=4
As shown above, the expression ++x is not evaluated within the sizeof operator.
IV.9 lvalue
We talked about lvalues in Chapter II Section II.9. Here, we refine our definition. Usually, in programming, the word lvalue refers to a modifiable variable that can appear on the left side of the assignment operator =. An rvalue is any expression that appears on the right side of the assignment operator: lvalue=rvalue. This implies an lvalue can be altered. In C, such a definition is insufficient: an expression more complex than a variable name can be an lvalue, and an lvalue may not be alterable! An lvalue is an expression that refers to an object. That is, it refers to a storage region identified by an address that can hold a piece of data. Practically, if you can get the address of the resulting value of an expression that represents an object, it is an lvalue. For example:
o a variable is an lvalue
o a pointer is an lvalue
o if p is a pointer, *p is an lvalue
o an array is an lvalue
o if p is a pointer, the expression *(p+1) is an lvalue since *(p+1) refers to an object.
The following items are not lvalues:
o The constant 12 is not an lvalue
o If v is a variable, the expression v+1 is not an lvalue: v+1 does not refer to an object but to the value of an expression. If you try to do something like &(v+1), you will get an error.
o If f is a function, f is not an lvalue: it does not refer to an object but to a piece of code.
o If v is an lvalue, &v is not an lvalue but the value of an expression that is the address of the lvalue.
o If v is an lvalue, sizeof v is not an lvalue but the value of an expression that is the size of the lvalue.
The following example fails to compile:
$ cat lvalue1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v;

  v+1=10; /* fails: not an lvalue */
  12 = 1; /* fails: not an lvalue */
  &v=10; /* fails: not an lvalue */
  return EXIT_SUCCESS;
}
$ gcc -o lvalue1 -std=c99 -pedantic lvalue1.c
lvalue1.c: In function ‘main’:
lvalue1.c:7:3: error: lvalue required as left operand of assignment
lvalue1.c:8:3: error: lvalue required as left operand of assignment
lvalue1.c:9:3: error: lvalue required as left operand of assignment
In C, some lvalues are not alterable:
o Arrays cannot be altered
o Constant variables and pointers (declared with the type qualifier const)
o Structures and unions having members declared with the type qualifier const are not modifiable (see Chapter VI)
o lvalues that have incomplete type other than void (see Chapter VIII Section VIII.6.3.2)
The following example attempts to modify lvalues that are not modifiable:
$ cat lvalue2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int const v; /* constant variable: read-only lvalue */

  /* structure my_int containing a read-only member called i */
  struct my_int {
    int const i;
  } str;

  v=10; /* fails: not modifiable lvalue */
  str.i = 10 ; /* fails: not modifiable lvalue */
  return EXIT_SUCCESS;
}
$ gcc -o lvalue2 -std=c99 -pedantic lvalue2.c
lvalue2.c: In function ‘main’:
lvalue2.c:12:3: error: assignment of read-only variable ‘v’
lvalue2.c:13:3: error: assignment of read-only member ‘i’
There is an important rule that you have to keep in mind in order to understand the underlying logic of conversions: qualifiers are discarded from the type of the value of an lvalue. An lvalue has a type and evaluates to a value. If the lvalue has a qualified type, its value has the unqualified version of that type. Otherwise, if the lvalue does not have a qualified type, both the lvalue and its value have the same type. For example:
int x = 10;
int y = x ; /* x is an lvalue, its value 10 has the same type int */

const int v = 10;
int w = v ; /* v is an lvalue, it has the const-qualified type const int, but its value is of type int */

int *const p = &x;
int *q = p ; /* p is an lvalue, it has the const-qualified type int *const, but its value is of type int * */
IV.10 Assignment operators The C language specifies several ways to assign a value resulting from the evaluation of expressions to a variable. We first start with the simple assignment that we have already studied.
IV.10.1 Simple assignment Assigning a value of an expression to an lvalue takes the following form: var=expr
Where:
o var is an lvalue such as the name of a variable, an element of an array or a pointer… Any modifiable lvalue can be put on the left side of the assignment operator.
o expr is an expression
The simple assignment is composed of three elements: the operator =, an lvalue located on the left side of the operator and an rvalue on the right side of the operator. Keep in mind that the simple assignment operation performs two tasks:
o It evaluates the rvalue and assigns its value to the lvalue.
o It evaluates to the value assigned. This means that the assignment expression itself has a value: the value of expr after conversion to the type of the lvalue.
As a consequence, since c=1 also evaluates to the value 1, we could write something like a=b=c=1. The assignment operator associates from right to left, so the expression a=b=c=d=10 is read as a=(b=(c=(d=10))) as shown below:
$ cat assign_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int a,b,c,d;

  a=b=c=d=10;
  printf("a=%d, b=%d, c=%d, d=%d\n", a, b, c, d);
  return EXIT_SUCCESS;
}
$ gcc -o assign_op1 -std=c99 -pedantic assign_op1.c
$ ./assign_op1
a=10, b=10, c=10, d=10
The rvalue can be an expression much more sophisticated than a simple variable or literal: it can be composed of several operations.
$ cat assign_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float f;
  float v = 1.9;

  f=10*2.7/v-2;
  printf("f=%f\n", f);
  return EXIT_SUCCESS;
}
$ gcc -o assign_op2 -std=c99 -pedantic assign_op2.c
$ ./assign_op2
f=12.210526
While assigning a value to an lvalue, an implicit cast may occur. The assignment operation evaluates the rvalue, casts its value (if it can) according to the type of the lvalue, then assigns the value to the lvalue and returns it. In the following example, the value of the expression v+1.2 is converted to type int, which is the type of the variable i:
$ cat assign_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float f;
    float v = 1.3;
    int i;

    i = f = v + 1.2;
    printf("f=%f and i=%d\n", f, i);

    f = i = v + 1.2;
    printf("f=%f and i=%d\n", f, i);

    return EXIT_SUCCESS;
}
$ gcc -o assign_op3 -std=c99 -pedantic assign_op3.c
$ ./assign_op3
f=2.500000 and i=2
f=2.000000 and i=2
Can you see the difference between the two simple assignment operations?
o Let us consider the first expression i = f = v + 1.2. First, the expression v + 1.2 evaluated to the floating-point value 2.5 (of type double, since 1.2 is a double literal). In the second step, that value was converted to float and assigned to the variable f. The simple assignment itself evaluates to the value 2.5. Then, that value was converted to type int to yield the integer 2, which was finally assigned to the variable i of type int.
o The same process occurred for the second expression f = i = v + 1.2. First, the expression v + 1.2 evaluated to the floating-point value 2.5. In the second step, that value was converted to type int to yield the integer 2 before being assigned to the variable i of type int (implicit conversion). That assignment evaluated to the integer 2, which was finally assigned to the variable f.
In the following program, we assign a variable and we test the value of another variable in the same relational expression:
$ cat assign_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int const val = 4;
    int x;
    int y = 8;

    (x=val) < y ? printf("y=%d and x = %d. y > x\n", y, x)
                : printf("y=%d and x = %d. y <= x\n", y, x);

    return EXIT_SUCCESS;
}
$ gcc -o assign_op4 -std=c99 -pedantic assign_op4.c
$ ./assign_op4
y=8 and x = 4. y > x
The simple assignment operator can work with types other than arithmetic ones, such as pointers or the user-defined types we will describe later. The = sign also appears in initializations. In the following example, an array is initialized with a string literal at the time of its declaration:
$ cat assign_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char a[20] = "Wonderful";

    printf("a=%s\n", a);
    return EXIT_SUCCESS;
}
$ gcc -o assign_op5 -std=c99 -pedantic assign_op5.c
$ ./assign_op5
a=Wonderful
As we explained in detail, you can assign a string literal to an array only at the time of declaration. The following example is not equivalent to the previous one. It is erroneous and cannot be compiled:
$ cat assign_op6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char a[20];

    a = "Wonderful";
    printf("a=%s\n", a);
    return EXIT_SUCCESS;
}
$ gcc -o assign_op6 -std=c99 -pedantic assign_op6.c
assign_op6.c: In function ‘main’:
assign_op6.c:7:5: error: incompatible types when assigning to type ‘char[20]’ from type ‘char *’
 a = "Wonderful";
   ^
After the declaration of an array, you can no longer assign it a value as a whole: you can only assign its elements individually or invoke a copy function such as strcpy() to copy data into it. Pointers, on the other hand, behave like ordinary variables in assignment operations. The following example involves a pointer:
$ cat assign_op7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *p = "Wonderful";

    printf("p=%s\n", p);
    return EXIT_SUCCESS;
}
In the example, the pointer p points to the string literal "Wonderful". That is, the address of the string literal was assigned to the pointer p. This should not be confused with the previous example in which the string literal "Wonderful" was copied into the array a. You may be tempted to write cryptic programs as you master the C language. Remember, it is always better to have a program that is easy to read… The C language allows you to perform several tasks in a very condensed way, and if you overuse this facility, debugging your programs will become a problem.
IV.10.2 Compound assignments The C language specifies several compound assignments that are just handy shortcuts. They take the following form: var op= expr
Where:
o op is one of the following arithmetic or bitwise operators: +, -, *, /, %, <<, >>, &, ^, |.
o expr is an expression.
o var is an lvalue that can be a variable, an element of an array or a pointer…
The syntax is equivalent to var = var op (expr), except that var is evaluated only once. For example, x += 1 is the same as x = x + 1: it increments the value of the variable x and places the result in it, and that result is also the value of the expression. In the examples given in Table IV‑11, the x variable holds the value 2 before the assignments.
Table IV‑11 Compound assignments
Here is an example:
$ cat compound_assign_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x;

    x = 2;
    x += 5;
    printf("x = 2; x += 5; x=%d\n", x);

    x = 2;
    x *= 2;
    printf("x = 2; x *= 2; x=%d\n", x);

    x = 2;
    x %= 2;
    printf("x = 2; x %%= 2; x=%d\n", x);

    return EXIT_SUCCESS;
}
$ gcc -o compound_assign_op1 -std=c99 -pedantic compound_assign_op1.c
$ ./compound_assign_op1
x = 2; x += 5; x=7
x = 2; x *= 2; x=4
x = 2; x %= 2; x=0
IV.11 Ternary conditional operator The ternary conditional operation takes three operands and returns the value of an operand. It has the following syntax: condition ? expr:alternate_expr
Where:
o The first operand condition is an expression that evaluates to true (nonzero value) or false (zero). However, be aware that the expression cannot contain assignment operators unless they lie in parentheses (see section IV.13).
o expr is an expression.
o alternate_expr is an expression, but unlike the second operand, not any expression. It cannot contain assignment operators unless they are between parentheses because the ternary operator has precedence over assignment operators, as we will find out in section IV.13.
o The value of the ternary expression is either the value of expr or that of alternate_expr, depending on the expression condition.
o Blanks around ? and : are permitted.
o Newlines after ? and after : are permitted.
Thus, if the expression condition is true (any nonzero value), the expression expr is evaluated and the ternary expression takes its value. Otherwise, the expression alternate_expr is evaluated and its value is taken. Here is a very basic example:
$ cat ternary_cond_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *s;
    int x;

    x=0;
    s = x ? "TRUE" : "FALSE" ;
    printf("if x=%d, s=%s\n", x, s);

    x=10;
    s = x ? "TRUE" : "FALSE" ;
    printf("if x=%d, s=%s\n", x, s);

    x=-1;
    s = x ? "TRUE" : "FALSE" ;
    printf("if x=%d, s=%s\n", x, s);
}
$ gcc -o ternary_cond_op1 -std=c99 -pedantic ternary_cond_op1.c
$ ./ternary_cond_op1
if x=0, s=FALSE
if x=10, s=TRUE
if x=-1, s=TRUE
In the example above, we notice the ternary conditional operator has precedence over the simple assignment operator. That is, it is evaluated before the assignment occurs. In our example, the ternary conditional operator evaluates to a string, but it can evaluate to a value of any type depending on its operands. In the following example, the second operand is an int and the third a double; the usual arithmetic conversions (see section IV.14.5) give the whole expression the type double, whichever operand is selected:
$ cat ternary_cond_op2.c
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <string.h>
 4
 5 int main(int argc, char **argv) {
 6     char *program_name = argv[0];
 7     char *type_pi;
 8     float pi;
 9
10     if (argc < 2) {
11         printf("USAGE: %s {int|float}\n", program_name );
12         printf("argument can be int or float\n");
13         return EXIT_FAILURE;
14     }
15
16     type_pi = argv[1];
17     if ( strcmp(type_pi, "int") && strcmp(type_pi, "float") ) {
18         printf("USAGE: %s {int|float}\n", program_name );
19         printf("Unknown argument %s. Argument must be int or float\n", type_pi);
20         return EXIT_FAILURE;
21     }
22
23     pi = !strcmp(type_pi, "int") ? 3 : 3.14159;
24     printf("pi=%f\n", pi);
25
26     return EXIT_SUCCESS;
27 }
$ gcc -o ternary_cond_op2 -std=c99 -pedantic ternary_cond_op2.c
$ ./ternary_cond_op2 int
pi=3.000000
$ ./ternary_cond_op2 float
pi=3.141590
Explanation:
o Line 5: the main() function is defined with two arguments. The first one, argc, stores the number of arguments of the program including the program name. The second one, argv, is an array of strings that stores the arguments: argv[0] holds the program name, argv[1] the first argument…
o Lines 10-14: since the program expects one argument, we check the user has actually provided one. Otherwise, a little help is displayed explaining how to use the program.
o Line 16: we store the first argument argv[1] in the variable type_pi.
o Lines 17-21: strcmp() returns 0 when the two strings are equal and a nonzero value otherwise. The logical expression strcmp(type_pi, "int") && strcmp(type_pi, "float") is therefore true if the variable type_pi holds a string different from both "int" and "float". In this case, we display a message indicating the expected argument has to be the string float or int.
o Line 23: the ternary operation evaluates to 3 (converted to double) if the passed argument is int. Otherwise, it evaluates to 3.14159. The resulting value is converted to float and assigned to the pi variable.
o Line 24: we display the value of the variable pi.
Keep in mind that the first and the third operand are particular expressions. Assignment operations are part of them only if they are enclosed between parentheses. Let us consider the following example:
$ cat ternary_cond_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x, y=10;
    float f;

    f = x = y ? 3.14159 : 3 ;
    printf("x=%d,y=%d and f=%f\n", x, y, f);
    return EXIT_SUCCESS;
}
$ gcc -o ternary_cond_op3 -std=c99 -pedantic ternary_cond_op3.c
$ ./ternary_cond_op3
x=3,y=10 and f=3.000000
In our example above, the first operand is not x = y as you may think but y. The expression f = x = y ? 3.14159 : 3 is equivalent to f = x = (y ? 3.14159 : 3). Since y is different from zero, the ternary operation evaluates to 3.14159 and, since x has an integer type, an implicit conversion is performed. Thus, the value 3 is stored in x and then in f. Compare with the following code:
$ cat ternary_cond_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x, y=10;
    float f;

    f = (x = y) ? 3.14159 : 3 ;
    printf("x=%d,y=%d and f=%f\n", x, y, f);
    return EXIT_SUCCESS;
}
$ gcc -o ternary_cond_op4 -std=c99 -pedantic ternary_cond_op4.c
$ ./ternary_cond_op4
x=10,y=10 and f=3.141590
In example ternary_cond_op4.c, the first operand of the ternary operator is (x = y). The first operand is evaluated: the variable x is assigned the value of the variable y and the expression evaluates to that value. Since it evaluates to 10, a value different from zero, the ternary operation evaluates to the value of the second operand, 3.14159, which is finally assigned to the variable f. You can use assignment operations in the second operand without resorting to parentheses:
$ cat ternary_cond_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x, y=10;
    float f;

    f = y ? x = 3 : 3.14159;
    printf("x=%d,y=%d and f=%f\n", x, y, f);
    return EXIT_SUCCESS;
}
$ gcc -o ternary_cond_op5 -std=c99 -pedantic ternary_cond_op5.c
$ ./ternary_cond_op5
x=3,y=10 and f=3.000000
IV.12 Comma operator
The comma operator takes the following form: expr1, expr2, …, exprN
Where:
o expr1, expr2, …, exprN are expressions.
The expressions expr1, expr2, …, and exprN are evaluated sequentially, from left to right. The value of the comma expression is the value of the last expression exprN. The comma operator has the lowest precedence (see next section). The comma operator has nothing to do with the comma separator used in declarations. In the following example, we declare three variables as int using a comma that is not a comma operator.
$ cat comma_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x, y=10, z=9;

    return EXIT_SUCCESS;
}
In the following example, we use the comma operator between two expressions evaluated sequentially:
$ cat comma_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i, x, y;

    i = ( x=1+2, y=2*10 ); /* comma operator */
    printf("x=%d, y=%d, i=%d\n", x, y, i);
    return EXIT_SUCCESS;
}
$ gcc -o comma_op2 -std=c99 -pedantic comma_op2.c
$ ./comma_op2
x=3, y=20, i=20
We used the parentheses because the assignment operator has precedence over the comma operator. The comma operator is not often used. It is sometimes used in the for loop described in the next chapter.
IV.13 Operator precedence
The C language allows you to build expressions involving several operators. The question is: in which order will the computer perform the calculations? For example, without any specific rule, the expression 2*6+2 may be evaluated in two ways:
o If the addition is performed first, the expression evaluates to 16: 2*6+2=2*8=16.
o If the multiplication is carried out first, the expression evaluates to 14: 2*6+2=12+2=14.
Accordingly, in the same way as we do in mathematics, we define a precedence for operators. In C, precedence rules determine how operands are grouped with operators. For example, in C, as in mathematics, the multiplication operator has precedence over addition, so 2*6+2 evaluates to 14. Table IV‑12 lists the operators from the highest to lowest precedence.
Table IV‑12 Operator precedence in decreasing order
In Table IV‑12, E1, E2, E are expressions and var is an lvalue (variable, element of an array…). You can notice we introduced two new operators that we will talk about in Chapter VI: the member-access operators . and ->. They allow accessing members of unions and structures. The following example shows the increment operators take precedence over the multiplication operator:
$ cat precedence_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a = 1;
    int b = 2 * a++;
    int c = 1;
    int d = 2 * ++c;

    printf("a=%d and b = %d\n", a, b);
    printf("c=%d and d = %d\n", c, d);
    return EXIT_SUCCESS;
}
$ ./precedence_op1
a=2 and b = 2
c=2 and d = 4
The parentheses allow you to modify the operator precedence. For example, 2 * 6 + 2 evaluates to 14. With parentheses, you can change the precedence by evaluating the addition first. Thus, 2 * (6+2) evaluates to 16.
If you are in doubt about the evaluation order in expressions, use parentheses. Also resort to parentheses to ease reading.
How do expressions evaluate if operators have the same precedence? For certain operators such as addition, this is not a problem: the expression evaluates to the same value whatever the evaluation order may be (for example, 1+2+9). However, the evaluation order is relevant for other operations such as the division: for example 12/3/2/2. To resolve the issue, the associativity specifies the evaluation order: from left to right (left associativity) or from right to left (right associativity). For instance, since the division operator is left-associative, the expression 12/3/2/2 is equivalent to ((12/3)/2)/2, which evaluates to 1 (a right-to-left grouping, 12/(3/(2/2)), would evaluate to 4). Let us consider another example:
$ cat precedence_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a = 1;
    int b;
    int d = 2 * (b=a);

    printf("a = %d, b = %d and d = %d\n", a, b, d);
    return EXIT_SUCCESS;
}
$ gcc -o precedence_op2 -std=c99 -pedantic precedence_op2.c
$ ./precedence_op2
a = 1, b = 1 and d = 2
The expression d = 2 * (b=a) is evaluated in several steps:
o Parentheses take precedence over the multiplication: the expression b=a is evaluated first. The variable b is assigned the value of the variable a. Then, the expression evaluates to the value of the variable a, that is 1. Thus, b holds the value 1 and the expression b=a evaluates to 1.
o The multiplication 2 * (b=a) evaluates to 2 * 1 = 2. Therefore, d holds the value 2.
You could wonder why we have used the parentheses. Try the same example without parentheses:
$ cat precedence_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a = 1;
    int b;
    int d = 2 * b=a;

    printf("a = %d, b = %d and d = %d\n", a, b, d);
    return EXIT_SUCCESS;
}
$ gcc -o precedence_op3 -std=c99 -pedantic precedence_op3.c
precedence_op3.c: In function ‘main’:
precedence_op3.c:7:4: error: lvalue required as left operand of assignment
The compilation failed. Can you see why? The compiler gave an explanation… If you have a look at Table IV‑12, you can notice the assignment operators have a lower precedence than the multiplication operator (only the comma operator ranks lower) and are right-associative, which means the expression d = 2 * b=a is equivalent to d = ( (2 * b) = a ). The problem is the expression 2*b is not an lvalue. Consequently, the expression (2*b)=a is invalid. The error in the example above now appears more obvious. The following example shows the same symptom, yet it is not glaringly obvious:
$ cat precedence_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 6;
    int y = 7;
    int res;

    res = x > y ? x : x = y;
    printf("x=%d y=%d res=%d\n", x, y, res);
    return EXIT_SUCCESS;
}
$ gcc -o precedence_op4 -std=c99 -pedantic precedence_op4.c
precedence_op4.c: In function ‘main’:
precedence_op4.c:9:4: error: lvalue required as left operand of assignment
In the example above, the expression res = x > y ? x : x = y seems to be the same as:
if (x > y) {
    res = x;
} else {
    res = x = y;
}
However, this is not the case. Why? Because the third operand of the ternary operator is not x = y but x! Remember that the = operator is an assignment operator and its precedence is lower than that of the ternary operator, which means that x > y ? x : x = y is equivalent to (x > y ? x : x) = y. As you may have guessed, the ternary operation cannot be an lvalue and thus generates a compilation error. Why is the expression res = x > y ? x : x = y not equivalent to ( res = (x > y ? x : x) ) = y but to res = ( (x > y ? x : x) = y )? Because the simple assignment operator is right-associative… Now, we can write a correct version of the example precedence_op4.c:
$ cat precedence_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 6;
    int y = 7;
    int res;

    res = x > y ? x : (x = y);
    printf("x=%d y=%d res=%d\n", x, y, res);
    return EXIT_SUCCESS;
}
$ gcc -o precedence_op5 -std=c99 -pedantic precedence_op5.c
$ ./precedence_op5
x=7 y=7 res=7
OK, you have got it, but why does the following code work without parentheses?
$ cat precedence_op6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 6;
    int y = 7;
    int res;

    res = x < y ? x = y : x;
    printf("x=%d y=%d res=%d\n", x, y, res);
    return EXIT_SUCCESS;
}
$ gcc -o precedence_op6 -std=c99 -pedantic precedence_op6.c
$ ./precedence_op6
x=7 y=7 res=7
A clue? If you remember what we said about the ternary condition operator, the first and third operands are not any expression: unlike the second operand, they cannot contain assignment operators unless they are between parentheses. The second operand can work with assignment operators without using parentheses.
IV.14 Type conversion
We end the chapter with a very important point: the conversion of types. The subject may appear tricky for beginners, not because it is difficult but mainly because several kinds of type conversions may be involved. Let us start with the integer conversion ranks and integer promotions.
IV.14.1 Integer conversion rank The C language has several integer types: char, signed char, unsigned char, short, unsigned short, int, unsigned int, long, unsigned long, long long, unsigned long long. In some specific conditions, described in the next section, the compiler automatically converts an integer type to another integer type of higher rank according to the conversion rank order depicted in Figure IV‑7.
Figure IV‑7 Integer conversion rank
In Figure IV‑7, we can see the type _Bool has the lowest conversion rank and the types char, signed char and unsigned char have the same conversion rank… If an implementation introduces new types (extended types), they also have a conversion rank, described in the documentation of the implementation.
IV.14.2 Integer promotions [42] [43]
In expressions expecting operands of arithmetic types, integer types of lower rank than that of type int are converted to int if int can represent all the values of the original type, or to unsigned int otherwise: this is known as the integer promotions. In the following example, the operands a and b of type char are first promoted to type int before carrying out the addition:
$ cat integer_promotion1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char a = 120;
    char b = 120;
    int c;

    c = a + b;
    printf("a=%d, b=%d, c=a+b=%d+%d=%d\n", a, b, a, b, c);
    return EXIT_SUCCESS;
}
$ gcc -o integer_promotion1 -std=c99 -pedantic integer_promotion1.c
$ ./integer_promotion1
a=120, b=120, c=a+b=120+120=240
On our computer, the type char is represented by one byte while int is represented by four bytes. The following example shows the addition promotes its operands to int and then evaluates to an int. Since sizeof yields a value of type size_t, we print it with the %zu conversion specifier:
$ cat integer_promotion2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char a = 120;
    char b = 120;

    printf("sizeof a=%zu, sizeof b=%zu, sizeof(a+b)=%zu \n",
           sizeof a, sizeof b, sizeof(a+b));
    return EXIT_SUCCESS;
}
$ gcc -o integer_promotion2 -std=c99 -pedantic integer_promotion2.c
$ ./integer_promotion2
sizeof a=1, sizeof b=1, sizeof(a+b)=4
Of course, the integer promotions are silently performed and you do not have to worry about them. They are only the very first step of a process known as integer conversions. However, you must watch out for the integer conversions described in the next sections because they may lead to unexpected behaviors when you mix unsigned and signed operands in your expressions.
IV.14.3 Conversions and unary operators
Only the integer promotions apply to unary operators since they have a single operand: unary plus +, unary minus -, and unary bitwise not ~ (bitwise complement). If the operand has a type with lower rank than that of int, the integer promotions promote the operand to int or unsigned int as appropriate, which is also the type of the result. Though the bitwise shift operators are binary, only the integer promotions apply to their operands. The resulting value has the type of the left operand after the integer promotions. In the following example, the unary operator - promotes the integer types unsigned short and unsigned char to int before carrying out the operation. In all cases, the type of the expression is the type of the operand after the integer promotions.
$ cat unary_promot1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned short h = 1;
    unsigned int i = 1;
    unsigned char j = 1;
    long long x;

    x = -h;
    printf("x=%lld\n", x); /* h promoted to int, type of -h is int */
    x = -i;
    printf("x=%lld\n", x); /* No conversion, type of -i is unsigned int */
    x = -j;
    printf("x=%lld\n", x); /* j promoted to int, type of -j is int */
    return EXIT_SUCCESS;
}
$ gcc -o unary_promot1 -std=c99 -pedantic unary_promot1.c
$ ./unary_promot1
x=-1
x=4294967295
x=-1
IV.14.4 Conversions and binary operators
Integer conversions, more generally usual arithmetic conversions, occur within expressions composed of binary operators. Consider the following example:
$ cat integer_conversion1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 100;
    signed int b = -1;

    if (b < a) {
        printf("%d < %u\n", b, a);
    } else {
        printf("%d > %u\n", b, a);
    }
    return EXIT_SUCCESS;
}
Could you guess the output? Here it is:
$ gcc -o integer_conversion1 -std=c99 -pedantic integer_conversion1.c
$ ./integer_conversion1
-1 > 100
Incredible, isn't it? Let us explain why… The cause lies in the integer conversions automatically applied by the compiler. As explained earlier, the integer promotions convert an integer type of lower rank than int to int or unsigned int. After the integer promotions, integer conversions may take place: this happens within expressions mixing integer operands of different types. After the integer promotions, the following rules are applied:
o Rule 1: If the operands have the same type, no conversion is done and the resulting value has this type.
o Rule 2: Otherwise, if the operands are all signed or all unsigned, the operand having a type with lower conversion rank is converted to the type of the operand having greater conversion rank, which is also the type of the resulting value.
o Otherwise, if unsigned and signed integer types are mixed:
▪ Rule 3: If the unsigned integer operand has a type with conversion rank greater than or equal to that of the signed integer operand, the signed integer operand is converted to the type of the unsigned integer operand, which is also the type of the resulting value of the operation.
▪ Rule 4: Otherwise, if the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand, and can represent all the values of the type of the unsigned integer operand, the unsigned integer operand is converted to the type of the signed integer operand, which is also the type of the resulting value of the operation.
▪ Rule 5: Otherwise (if the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand, but cannot represent all the values of the type of the unsigned integer operand), both operands are converted to the unsigned version of the type of the signed integer operand.
The integer conversion rules given above are part of a more general rule known as usual arithmetic conversions (described in the next section). As the integer conversions are rather tricky, we have treated them separately to ease the understanding. Once they are understood, the general rule for converting arithmetic operands will appear clearer. Let us give some examples depicting the integer conversions:
o Rule 1: If the operands have the same type after the integer promotions, no conversion is done and the resulting value has this type. In the following example, neither the integer promotions nor the integer conversions occur since both operands have the same type, which has the same rank as int.
$ cat integer_conversion2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 100;
    unsigned int b = 1;

    if (b < a) {
        printf("%u < %u\n", b, a);
    } else {
        printf("%u > %u\n", b, a);
    }
    return EXIT_SUCCESS;
}
o Rule 2: If the operands are all signed or all unsigned, the operand having a type with lower conversion rank is converted to the type of the operand having greater conversion rank, which is also the type of the resulting value.
$ cat integer_conversion3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 100;
    unsigned long long b = 1;

    printf("sizeof a=%zu sizeof b=%zu sizeof(a+b)=%zu\n",
           sizeof a, sizeof b, sizeof(a+b));
    printf("%u + %llu = %llu\n", a, b, a+b);
    return EXIT_SUCCESS;
}
$ gcc -o integer_conversion3 -std=c99 -pedantic integer_conversion3.c
$ ./integer_conversion3
sizeof a=4 sizeof b=8 sizeof(a+b)=8
100 + 1 = 101
The operand a of the expression a + b is converted to unsigned long long, which is also the type of the resulting value.
o If unsigned and signed integer types are mixed:
▪ Rule 3: If the unsigned integer operand has a type with conversion rank greater than or equal to that of the signed integer operand, the signed integer operand is converted to the type of the unsigned integer operand, which is also the type of the resulting value of the operation. In the following example, the operand b (operation a > b) is converted to unsigned int:
$ cat integer_conversion4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 5;
    int b = -3;
    unsigned int c = (unsigned int)b;

    if ( a > b ) { /* b is converted to unsigned int */
        printf("%u > %d\n", a, b);
    } else {
        printf("%u < %d\n", a, b);
    }
    printf("operand b=%d takes the value %u when converted to unsigned int\n", b, c);
    return EXIT_SUCCESS;
}
$ gcc -o integer_conversion4 -std=c99 -pedantic integer_conversion4.c
$ ./integer_conversion4
5 < -3
operand b=-3 takes the value 4294967293 when converted to unsigned int
The operand b is negative: when converted to unsigned int, it takes the value 2^32 - 3 = 4294967293 on our computer [44], which explains why the variable a seems to be less than the variable b. In fact, the evaluated expression is 5 > 4294967293, which is false. Of course, if the value of b were positive, all would be fine as shown below:
$ cat integer_conversion5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 5;
    int b = 3;
    unsigned int c = (unsigned int)b;

    if ( a > b ) { /* b is converted to unsigned int */
        printf("%u > %d\n", a, b);
    } else {
        printf("%u < %d\n", a, b);
    }
    printf("operand b=%d takes the value %u when converted to unsigned int\n", b, c);
    return EXIT_SUCCESS;
}
$ gcc -o integer_conversion5 -std=c99 -pedantic integer_conversion5.c
$ ./integer_conversion5
5 > 3
operand b=3 takes the value 3 when converted to unsigned int
A positive number of a signed integer type can be represented in the corresponding unsigned integer type with no change, but a negative number of a signed integer type is changed to a positive integer number when converted to an unsigned integer type. Here is another example showing an unexpected behavior when mixing signed and unsigned integer types in a C expression:
$ cat integer_conversion6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 1;
    int b = -2;
    unsigned int c = (unsigned int)b;
    long long int d = a + b; /* b converted to unsigned int */
    long long int e = a + c; /* a and c have same type unsigned int */

    printf("d=a+b=%u+%d=%lld\n", a, b, d);
    printf("e=a+c=%u+%u=%lld\n", a, c, e);
    return EXIT_SUCCESS;
}
$ gcc -o integer_conversion6 -std=c99 -pedantic integer_conversion6.c
$ ./integer_conversion6
d=a+b=1+-2=4294967295
e=a+c=1+4294967294=4294967295
In the expression d = a + b, the compiler performs two different conversions:
− The integer conversions (rule 3) convert the operand b to unsigned int (the value of b becomes 4294967294 on our computer), then the expression a + b evaluates to 1 + 4294967294 = 4294967295, which is of type unsigned int.
− The resulting value (of type unsigned int) is implicitly converted to the type of the lvalue d (long long int) that will store it.
▪ Rule 4: If the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand, and can represent all the values of the type of the unsigned integer operand, the unsigned integer operand is converted to the type of the signed integer operand, which is also the type of the resulting value of the operation.
Unlike example integer_conversion4.c, the following example yields the expected result:
$ cat integer_conversion7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 5;
    long long int b = -1;

    if ( a > b ) { /* a is converted to long long int */
        printf("%u > %lld\n", a, b);
    } else {
        printf("%u < %lld\n", a, b);
    }
    return EXIT_SUCCESS;
}
$ gcc -o integer_conversion7 -std=c99 -pedantic integer_conversion7.c
$ ./integer_conversion7
5 > -1
It works as expected because the unsigned integer variable a is converted to type long long int. The conversion rank of long long int is greater than that of unsigned int. Moreover, on our computer, it is represented by eight bytes, which is enough to store the values of the type unsigned int (fitting in four bytes on our computer). As a consequence, the value of the variable b (a negative number) remains unchanged while the operation a > b is evaluated.
▪ Rule 5: Otherwise (if the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand, but cannot represent all the values of the type of the unsigned integer operand), both operands are converted to the unsigned version of the type of the signed integer operand. In the following example, we meet the same problem as revealed by example integer_conversion4.c. The output implies that, on our computer, long int is four bytes wide like unsigned int; on a platform where long int is eight bytes wide, rule 4 applies instead and the program prints 5 > -3.
$ cat integer_conversion8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 5;
    long int b = -3;

    if ( a > b ) {
        printf("%u > %ld\n", a, b);
    } else {
        printf("%u < %ld\n", a, b);
    }
    return EXIT_SUCCESS;
}
$ gcc -o integer_conversion8 -std=c99 -pedantic integer_conversion8.c
$ ./integer_conversion8
5 < -3
Take note, only the integer promotions apply to operands of the bitwise shift operators. The type of the result is the type of the left operand after the integer promotions.
In summary, we can conclude that we may get unexpected behaviors when we mix signed and unsigned types and the signed operands have negative values. This means that you should avoid mixing signed and unsigned values unless you actually know what you are doing.
IV.14.5 Usual arithmetic conversions
Now that you have understood the integer conversions, the general arithmetic conversion rule, known as usual arithmetic conversions, will be very easy to grasp. In C, an expression may involve several arithmetic operands of different types. For example, an addition can have one operand of type int and another one of type float as in the following example:
$ cat arithmetic_conv1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a = 120;
    float b = 12.23;

    printf("a+b=%d+%f=%f\n", a, b, a+b);
    return EXIT_SUCCESS;
}
In such a case, we could wonder what the type of the value resulting from the addition of an integer value and a floating-point value is. The C standard gives specific rules known as usual arithmetic conversions. The process consists in converting all the arithmetic operands to a common type. This common type is also the type of the evaluated value of the expression [45], with the exception of the relational and equality operations (operators <, >, <=, >=, == and !=) that evaluate to type int. The usual arithmetic conversions affect arithmetic operations, relational and equality operations, bitwise operations (except shifts) and the ternary operation. When such operations involve operands having different arithmetic types, the following rules apply:
o If an operand has type long double, the common type is long double.
o Otherwise, if an operand has type double, the common type is double.
o Otherwise, if an operand has type float, the common type is float.
o Otherwise (operands have integer types), the integer promotions take place followed by the integer conversions.
In the following example, the operand a is converted to type double:
$ cat usual_conv1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int a = 5;
    double b = -3;

    if ( a > b ) { /* a is converted to double */
        printf("%u > %f\n", a, b);
    } else {
        printf("%u < %f\n", a, b);
    }
    return EXIT_SUCCESS;
}
Both the operands a and b have the common type double before evaluating the expression a > b.
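The effect of the common type can be checked with a small sketch. The two helper functions below use hypothetical names introduced only for illustration. The first compares an unsigned int with a double, so both operands are converted to double. The second compares an unsigned int with an int, so the int operand is converted to unsigned int before the comparison.

```c
/* Hypothetical helper: a is converted to double before the comparison. */
int uint_gt_double(unsigned int a, double b) {
    return a > b;
}

/* Hypothetical helper: b is converted to unsigned int before the
   comparison, so a negative b becomes a very large value. */
int uint_gt_int(unsigned int a, int b) {
    return a > b;
}
```

With a equal to 5 and b equal to -3, uint_gt_double() returns 1 while uint_gt_int() returns 0, because -3 converted to unsigned int is larger than 5.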
Now, let us check that you have understood the usual arithmetic conversions. Assume we have declared two variables a and b with integer types: a as short and b as char. Can you guess the type of the resulting value of the following operations?
o Type of a + b? The resulting value has type int as shown below:
$ cat usual_conv2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  short a = 120;
  char b = 120;

  printf("%d + %d = %d\n", a, b, a + b);
  printf("sizeof(int)=%zu, sizeof(char)=%zu, sizeof(short)=%zu, sizeof(a+b)=%zu\n", sizeof(int), sizeof(char), sizeof(short), sizeof(a+b));
  return EXIT_SUCCESS;
}
$ gcc -o usual_conv2 -std=c99 -pedantic usual_conv2.c
$ ./usual_conv2
120 + 120 = 240
sizeof(int)=4, sizeof(char)=1, sizeof(short)=2, sizeof(a+b)=4
o Type of a * b? The resulting value has type int as shown below:
$ cat usual_conv3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  short a = 120;
  char b = 12;

  printf("%d * %d = %d\n", a, b, a * b);
  printf("sizeof(int)=%zu, sizeof(char)=%zu, sizeof(short)=%zu, sizeof(a*b)=%zu\n", sizeof(int), sizeof(char), sizeof(short), sizeof(a*b));
  return EXIT_SUCCESS;
}
$ gcc -o usual_conv3 -std=c99 -pedantic usual_conv3.c
$ ./usual_conv3
120 * 12 = 1440
sizeof(int)=4, sizeof(char)=1, sizeof(short)=2, sizeof(a*b)=4
o What is the type of a / b? The resulting value has type int as shown below:
$ cat usual_conv4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  short a = 30;
  char b = 20;

  printf("%d / %d = %d\n", a, b, a / b);
  printf("sizeof(int)=%zu, sizeof(char)=%zu, sizeof(short)=%zu, sizeof(a/b)=%zu\n", sizeof(int), sizeof(char), sizeof(short), sizeof(a/b));
  return EXIT_SUCCESS;
}
$ gcc -o usual_conv4 -std=c99 -pedantic usual_conv4.c
$ ./usual_conv4
30 / 20 = 1
sizeof(int)=4, sizeof(char)=1, sizeof(short)=2, sizeof(a/b)=4
In all of the three previous examples, the integer promotions convert the operands a and b to int, which is also the type of the resulting value of the operations. Same question if the variable a is declared as float and the variable b declared as char: o Type of a + b? After the integer promotions, b takes the type int. After the usual arithmetic conversions, both the operands a and b and the resulting value of the operation have type float. o What is the type of a * b? Same as above. o What is the type of a / b? Same as above.
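The answers above can be verified with sizeof, as in the earlier programs. The following is a minimal sketch; the function names are hypothetical and serve only as an illustration.

```c
#include <stddef.h>

/* short + char: both operands are promoted to int. */
size_t size_of_short_plus_char(void) {
    short a = 120;
    char b = 120;
    return sizeof(a + b);
}

/* float + char: b is promoted to int, then converted to float. */
size_t size_of_float_plus_char(void) {
    float a = 1.5f;
    char b = 2;
    return sizeof(a + b);
}
```

The first function returns sizeof(int) and the second returns sizeof(float), whatever the actual sizes are on the target system.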
IV.15 Constant expressions
A constant expression is an expression that evaluates to a constant value known before the startup of the program. It can be a constant or an operation composed of constant operands and operators. Since its value is evaluated at compile time, it is subject to some constraints. Not all operators can be used: function calls and the increment (++), decrement (--), assignment (=) and comma (,) operators are not allowed, except when they are part of an expression that is not evaluated [46]. That is, a constant expression is a constant (literal or enumeration constant) or an operation composed of constants and allowed operators. Here are some constant expressions:
o 10
o 1+28
o 2*9
o 2/7+1-7
o 2.9*7
o "Hello"
o 'H'
o sizeof(char)
o sizeof(v) where v is a variable
o &v where v is a variable with static storage duration
A constant expression can evaluate to two kinds of constants: arithmetic constants and address constants.
IV.15.1 Arithmetic constant expression
An arithmetic constant expression may evaluate to:
o An integer constant such as 2
o A floating constant such as 1.207
An arithmetic constant expression can be an integer constant, a floating constant, a character literal (e.g. 'H'), an enumeration constant (described in Chapter VI), a sizeof expression, or an operation composed of those constants as operands. Here is a piece of code with arithmetic constant expressions:
#include <stdio.h>
#include <stdlib.h>

enum bool_val { FALSE, TRUE }; // enumeration

int b = TRUE;
int c = 'H';
int i1 = 10;
int i2 = 10*2;
int i3 = 5 * sizeof(long);
int i4 = sizeof(i1);
float f = 3.14;

int main(void) {
  printf("%d %d %d %c %d %d %f\n", i1, i2, b, c, i3, i4, f);
  return EXIT_SUCCESS;
}
The sizeof operator evaluates to an integer constant unless the operand is a VLA (variable-length array). For example, sizeof(char) is replaced by an integer constant at compile time, before the main() function starts, while sizeof(arr) is evaluated at run time if arr is a VLA.
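The difference can be sketched as follows. The names buffer, static_buffer_size() and vla_size() are hypothetical, and the sketch assumes a compiler that supports variable-length arrays (they are part of C99).

```c
#include <stddef.h>

/* sizeof(long) is an integer constant expression, so it can be used
   to give the size of an array with static storage duration. */
static char buffer[4 * sizeof(long)];

/* Hypothetical helper: the size of buffer is known at compile time. */
size_t static_buffer_size(void) {
    return sizeof buffer;
}

/* Hypothetical helper: arr is a VLA, so sizeof arr is computed at
   run time. The argument n must be greater than zero. */
size_t vla_size(size_t n) {
    int arr[n];
    return sizeof arr;
}
```

Calling vla_size(3) returns 3 * sizeof(int), a value that depends on the argument and therefore cannot be a constant expression.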
IV.15.2 Address constant
An address constant is a null pointer, a pointer to a static object [47], or a pointer to a function. Here are four examples:
#include <stdio.h>
#include <stdlib.h>

char *p1 = "Literal string";
int *p2 = NULL;
float *p3 = (float *)0;
int v = 10;
int *p4 = &v;

int main(void) {
  printf("%p %p %p %p\n", (void *)p1, (void *)p2, (void *)p3, (void *)p4);
  return EXIT_SUCCESS;
}
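The definition also mentions pointers to functions and pointers into static arrays. Both cases can be sketched as below; all the names (square, table, third, operation and the two helpers) are hypothetical and used only for illustration.

```c
#include <stddef.h>

static int square(int x) { return x * x; }

static int table[4] = { 10, 20, 30, 40 };

/* Address constants: their values are known before the program starts. */
static int *third = &table[2];          /* pointer to a static object */
static int (*operation)(int) = square;  /* pointer to a function */

/* Hypothetical helper: index of the element third points to. */
ptrdiff_t third_index(void) {
    return third - table;
}

/* Hypothetical helper: calls the function through the pointer. */
int apply_operation(int x) {
    return operation(x);
}
```

Here third_index() returns 2 and apply_operation(4) returns 16.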
IV.16 Exercises Exercise 1. If x=5, y=6 and z=7, what is the value of the expression y < z = x ? Exercise 2. If x=7, y=6 and z=7, what is the value of the expression y < z == x ?
Exercise 3. If x=6, y=6 and z=5, what is the value of the expression x y\n”); } else { printf(“x < y\n”); }
Exercise 12. What would be the output of the following code snippets? unsigned long x = 2; signed char y = -1; if ( x > y ) { printf(“x > y\n”); } else { printf(“x < y\n”); }
Exercise 13. What would be the output of the following code snippets? unsigned long x = 2; float y = -1; if ( x > y ) { printf(“x > y\n”); } else { printf(“x < y\n”);
}
CHAPTER V CONTROL FLOW
V.1 Introduction
Control flow statements are statements that break the normal flow of execution, which consists in executing statements in the order they appear. Instead, they execute a set of statements if some conditions are met (if, while, for, switch) or branch unconditionally to another point in the program (break, continue, return). They allow you to write programs that perform the right actions depending on some conditions.
V.2 Statements
A statement is an instruction telling the computer what to do. A set of statements can be grouped between braces ({ and }) to form a logical unit known as a block or a compound statement:
{
  statement1;
  statement2;
  …
  statementN;
}
Where:
o statement1,…, statementN are statements.
o Blanks (newlines, spaces and tabs) can be added before or after the braces ({ and }).
o Blanks (newlines, spaces and tabs) can be added before or after any statement.
V.3 if statement The if statement executes a set of statements depending on a given condition. In its simplest form, it is composed of two parts: if (condition) block
Where: o condition is an expression. It is the selection condition.
o block is a set of statements between braces. However, if there is only one statement, braces can be omitted.
If the expression condition evaluates to a value different from zero (meaning true), the set of statements block is executed. Here are some examples.
o Example 1: In C, the value 0 is treated as false. Any other value is considered true as shown below:
$ cat if_statement1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  if (-1) printf("-1 IS TRUE\n");
  if (10) printf("10 IS TRUE\n");
  if (0) printf("0 IS TRUE\n");
  if (0.9) printf("0.9 IS TRUE\n");
  return EXIT_SUCCESS;
}
$ gcc -o if_statement1 -std=c99 -pedantic if_statement1.c
$ ./if_statement1
-1 IS TRUE
10 IS TRUE
0.9 IS TRUE
o Example 2: The selection condition can be a variable. $ cat if_statement2.c #include #include int main(void) { int v = 10; if (v) printf(“v=%d IS TRUE\n”, v); return EXIT_SUCCESS; } $ gcc -o if_statement2 -std=c99 -pedantic if_statement2.c $ ./if_statement2 v=10 IS TRUE
o Example 3: The selection condition can be an arithmetic operation.
$ cat if_statement3.c #include #include int main(void) { int v = 10; int w = -5; if (v + w) printf(“v+w=%d IS TRUE\n”, v+w); return EXIT_SUCCESS; } $ gcc -o if_statement3 -std=c99 -pedantic if_statement3.c $ ./if_statement3 v+w=5 IS TRUE
o Example 4: The selection condition can be a relational operation. $ cat if_statement4.c #include #include int main(void) { int v = 10; int w = -5; if ( v > w ) printf(“%d > %d IS TRUE\n”, v, w); return EXIT_SUCCESS; } $ gcc -o if_statement4 -std=c99 -pedantic if_statement4.c $ ./if_statement4 10 > -5 IS TRUE
o Example 5: The selection condition can be a logical operation. $ cat if_statement5.c #include #include int main(void) { int v = 10; int w = -5;
if ( v > 0 && v > w ) printf(“%d > 0 && %d > %d IS TRUE\n”, v, v, w); return EXIT_SUCCESS; } $ gcc -o if_statement5 -std=c99 -pedantic if_statement5.c $ ./if_statement5 10 > 0 && 10 > -5 IS TRUE
o Example 6: The selection condition can be an assignment. $ cat if_statement6.c #include #include int main(void) { int v = 5; int w = -5; if ( v = w ) printf(“v holds now value %d\n”, v); return EXIT_SUCCESS; } $ gcc -o if_statement6 -std=c99 -pedantic if_statement6.c $ ./if_statement6 v holds now value -5
In the example above, the expression v = w assigns the value of the variable w (i.e. -5) to the variable v and then evaluates to that value. Thus, if w holds a value different from zero, the condition v = w is considered true. The example if_statement6.c must not be confused with the following one, which compares the value of v with the value of w:
$ cat if_statement7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int v = 5;
  int w = -5;

  if ( v == w ) printf("v holds value %d\n", v);
  return EXIT_SUCCESS;
}
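The difference between an assignment and a comparison used as a selection condition can be summarized by the sketch below. The function names are hypothetical. The extra pair of parentheses around the assignment is a common convention for telling the reader that the assignment is intended; it is an assumption of this sketch, not a requirement of the language.

```c
/* Hypothetical helper: the condition is an assignment.
   Its value is the value assigned to v, that is, the value of w. */
int assignment_is_true(int v, int w) {
    if ((v = w))   /* extra parentheses: the assignment is intended */
        return 1;
    return 0;
}

/* Hypothetical helper: the condition compares v with w. */
int equality_is_true(int v, int w) {
    if (v == w)
        return 1;
    return 0;
}
```

With v equal to 5 and w equal to -5, the first function returns 1 (the assigned value -5 is different from zero) while the second returns 0.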
The block of the if statement may contain several statements. In this case, the statements must be enclosed between braces: $ cat if_statement8.c #include #include #include int main(void) { char s1[40] = “IF statement”; char s2[80] = “IF statement”; if ( !strcmp(s1, s2) ) { printf(“The arrays s1 and s2 hold the same string\n”); printf(“s1=%s\n”, s1); } return EXIT_SUCCESS; } $ gcc -o if_statement8 -std=c99 -pedantic if_statement8.c $ ./if_statement8 The arrays s1 and s2 hold the same string s1=IF statement
The second form of the if statement allows executing an alternative block if the selection condition is false: if (condition) block else alternative_block
If the selection expression condition evaluates to a value different from zero, the set of statements block is executed. Otherwise, the set of statements of alternative_block is executed. If block and alternative_block are composed of several statements, braces ({}) must enclose the statements. If there is only one statement, the braces can be omitted. Here is an example: $ cat if_statement9.c #include #include #include
int main(void) { char s1[40] = “IF statement”; char s2[80] = “WHILE statement”; if ( !strcmp(s1, s2) ) { printf(“The arrays s1 and s2 hold the same string\n”); printf(“s1=%s\n”, s1); } else { printf(“The arrays s1 and s2 hold different strings\n”); printf(“s1=%s\n”, s1); printf(“s2=%s\n”, s2); } return EXIT_SUCCESS; } $ gcc -o if_statement9 -std=c99 -pedantic if_statement9.c $ ./if_statement9 The arrays s1 and s2 hold different strings s1=IF statement s2=WHILE statement
The third form of the if statement allows using several selection conditions: if (condition1) block1 else if (condition2) block2 … else if (conditionN) blockN else alternative_block
If condition1 evaluates to a value different from zero, block1 is executed. Otherwise, if condition2 evaluates to a value different from zero, block2 is executed… Otherwise, if conditionN evaluates to a value different from zero, blockN is executed. Otherwise, alternative_block is executed. If a block is composed of several statements, braces ({}) must enclose the statements. If there is only one statement, the braces can be omitted. The following program is an implementation of a basic calculator that computes the results of the operations +, -, * and /. The executable expects three arguments of the form n1 op n2, where n1 and n2 are arithmetic values and op is an arithmetic operator (+, -, * or /); it outputs the result of the operation. If the user passes unexpected arguments, a help message is displayed.
$ cat if_statement10.c 1 #include 2 #include
3 #include 4 5 int main(int argc, char **argv) { 6 float n1, n2; 7 char op; 8 9 if ( argc != 4 ) { 10 printf(“USAGE: %s number op number\n”, argv[0]); 11 printf(“Where op is +, -, *, /\n\n”); 12 13 return EXIT_FAILURE; 14 } 15 16 n1 = atof(argv[1]); 17 op = *argv[2]; /* first character of string argv[2] */ 18 n2 = atof(argv[3]); 19 20 if ( op == ‘+’ ) 21 printf(“%f + %f = %f\n”, n1, n2, n1 + n2); 22 else if ( op == ‘-‘ ) 23 printf(“%f - %f = %f\n”, n1, n2, n1 - n2); 24 else if ( op == ‘*’ ) 25 printf(“%f * %f = %f\n”, n1, n2, n1 * n2); 26 else if ( op == ‘/’ ) 27 printf(“%f / %f = %f\n”, n1, n2, n1 / n2); 28 else { 29 printf(“Unknown operator %c\n”, op); 30 printf(“USAGE: %s number op number\n”, argv[0]); 31 printf(“Where op is +, -, *, /\n\n”); 32 33 return EXIT_FAILURE; 34 } 35 36 return EXIT_SUCCESS; 37 } $ gcc -o if_statement10 -std=c99 -pedantic if_statement10.c $ ./if_statement10 USAGE: ./if_statement10 number op number Where op is +, -, *, / $ ./if_statement10 10 / 7 10.000000 / 7.000000 = 1.428571
$ ./if_statement10 10 + 7
10.000000 + 7.000000 = 17.000000
$ ./if_statement10 5 % 10
Unknown operator %
USAGE: ./if_statement10 number op number
Where op is +, -, *, /
Explanation:
o Line 6: the variables n1 and n2 are declared as float. They will store the operands.
o Line 7: the variable op, declared as char, will hold the character representing the operator: +, -, * or /.
o Lines 9-14: the relational expression argc != 4 tests whether the number of arguments (argc) is different from 4 (four arguments are expected, including the program name). If it is true, a help message is displayed explaining how to run the program. Remember that argv[0] holds the program name.
o Line 16: argv[1] is a string. It is the first operand of the operation. It is converted to a floating-point number by the C standard function atof() and then assigned to the variable n1.
o Line 17: argv[2] is a string. Since an operator is represented by a single character, only the very first character of the string is taken and assigned to the variable op.
o Line 18: argv[3] is a string. It is the second operand of the operation. It is converted to a floating-point number by the C standard function atof() and then assigned to the variable n2.
o Lines 20-34: the if statement checks the value of the variable op. If an expected operator is found (+, -, * or /), the corresponding operation is executed. If the variable op does not hold an expected operator, a help message is displayed (lines 28-34).
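The calculator relies on atof(), which gives no way to detect an argument that is not a number. A possible check, given here only as a sketch, uses the C standard function strtod() instead. The helper name is_number() is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if text holds a number and nothing
   else, 0 if it is empty or followed by unexpected characters. */
int is_number(const char *text) {
    char *end;

    (void)strtod(text, &end);   /* end points just after the number */
    if (end == text)
        return 0;               /* no conversion could be performed */
    return *end == '\0';        /* reject trailing characters */
}
```

The program could call is_number(argv[1]) and is_number(argv[3]) before converting the arguments, and display the help message when one of the calls returns 0.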
V.3.1 Switch statement
The switch statement is similar to the if statement. It also executes a set of statements depending on the resulting value of the selection expression. It takes the following general form:
switch (expr) {
  case const1:
    statement1_1;
    statement1_2;
    …
    statement1_P1;
  case const2:
    statement2_1;
    statement2_2;
    …
    statement2_P2;
  …
  case constN:
    statementN_1;
    statementN_2;
    …
    statementN_PN;
  …
  default:
    statementAlt_1;
    statementAlt_2;
    …
    statementAlt_Palt;
}
Where:
o expr is an expression that evaluates to an integer type.
o const1, const2,…, constN are integer constant expressions (see Chapter IV Section IV.15).
o statementX_Y are statements.
o The default case is optional.
The expression expr evaluates to a value of integer type that we will call val:
o If val equals const1, the statements statement1_1,…, statement1_P1 are executed. If a break statement is encountered, the processing of the switch statement stops. Otherwise, all the following statements statement2_1,…, statement2_P2,…, statementN_1,…, statementN_PN, statementAlt_1,…, statementAlt_Palt are also executed.
o Otherwise, if val equals const2, the statements statement2_1,…, statement2_P2 are executed. If a break statement is encountered, the processing of the switch statement stops. Otherwise, all the following statements statement3_1,…, statement3_P3,…, statementN_1,…, statementN_PN, statementAlt_1,…, statementAlt_Palt are also executed.
o …
o Otherwise, if val equals constN, the statements statementN_1,…, statementN_PN are executed. If a break statement is encountered, the processing of the switch statement stops. Otherwise, all the statements statementAlt_1,…, statementAlt_Palt are also executed.
o Otherwise, the statements statementAlt_1,…, statementAlt_Palt of the default case are executed.
To put it more concisely, if the integer value of the selection expression corresponds to the value of a case, all the statements following it are executed until the end of the switch statement or until the first break statement is met. When the break statement is met, the switch statement terminates. In the following example, we have intentionally omitted the break statement. See what it yields:
$ cat switch1.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  int n;

  if ( argc != 2 ) {
    printf("USAGE: %s number\n", argv[0]);
    return EXIT_FAILURE;
  }

  n = atoi( argv[1] );
  switch ( n % 2 ) {
    case 0:
      printf("Number %d is even\n", n);
    case 1:
      printf("Number %d is odd\n", n);
  }
  return EXIT_SUCCESS;
}
$ gcc -o switch1 -std=c99 -pedantic switch1.c
$ ./switch1 10
Number 10 is even
Number 10 is odd
$ ./switch1 11
Number 11 is odd
The selection expression n % 2 evaluates to 0 (if the passed argument is even) or 1 (if the passed argument is odd and positive). Now, if we insert the break statement, only the statements of case 0 are executed if n is even:
$ cat switch2.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  int n;

  if ( argc != 2 ) {
    printf("USAGE: %s number\n", argv[0]);
    return EXIT_FAILURE;
  }

  n = atoi( argv[1] );
  switch ( n % 2 ) {
    case 0:
      printf("Number %d is even\n", n);
      break;
    case 1:
      printf("Number %d is odd\n", n);
  }
  return EXIT_SUCCESS;
}
$ gcc -o switch2 -std=c99 -pedantic switch2.c
$ ./switch2 10
Number 10 is even
$ ./switch2 11
Number 11 is odd
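Omitting break is not always a mistake. When several cases must run the same statements, the labels can be stacked so that execution falls through to a shared statement. The sketch below illustrates this; the function name is_lowercase_vowel() is hypothetical.

```c
/* Hypothetical helper: returns 1 if c is a lowercase vowel, 0 otherwise. */
int is_lowercase_vowel(char c) {
    switch (c) {
    case 'a':
    case 'e':
    case 'i':
    case 'o':
    case 'u':
        return 1;   /* reached from any of the five labels above */
    default:
        return 0;
    }
}
```

Here the return statements play the role of break: they leave the switch statement (and the function) as soon as a result is known.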
The following example is equivalent to example if_statement10.c: $ cat switch3.c #include #include int main(int argc, char **argv) { float n1, n2; char op; if ( argc != 4 ) { printf(“USAGE: %s number op number\n”, argv[0]); printf(“Where op is +, -, *, /\n\n”); return EXIT_FAILURE; } n1 = atof(argv[1]);
op = *argv[2]; /* first character of string argv[2] */ n2 = atof(argv[3]); switch ( op ) { case ‘+’: printf(“%f + %f = %f\n”, n1, n2, n1 + n2); break; case ‘-‘: printf(“%f - %f = %f\n”, n1, n2, n1 - n2); break; case ‘*’: printf(“%f * %f = %f\n”, n1, n2, n1 * n2); break; case ‘/’: printf(“%f / %f = %f\n”, n1, n2, n1 / n2); break; default: printf(“Unknown operator %c\n”, op); printf(“USAGE: %s number op number\n”, argv[0]); printf(“Where op is +, -, *, /\n\n”); return EXIT_FAILURE; } return EXIT_SUCCESS; }
Remember that the selection expression must evaluate to an integer type. The following example is not correct and cannot be compiled: $ cat switch4.c #include #include int main(void) { char *operation=“addition”; switch ( operation ) { case “+”: printf(“Addition\n”); break; case “-“: printf(“Subtraction\n”); break;
case “*”: printf(“Multiplication\n”); break; case “/”: printf(“Division\n”); break; default: printf(“Unknown operator %c\n”, op); return EXIT_FAILURE; } return EXIT_SUCCESS; } $ gcc -o switch4 -std=c99 -pedantic switch4.c switch4.c: In function ‘main’: switch4.c:7:13: error: switch quantity not an integer switch4.c:8:9: error: case label does not reduce to an integer constant switch4.c:11:9: error: case label does not reduce to an integer constant switch4.c:14:9: error: case label does not reduce to an integer constant switch4.c:17:9: error: case label does not reduce to an integer constant switch4.c:21:44: error: ‘op’ undeclared (first use in this function) switch4.c:21:44: note: each undeclared identifier is reported only once for each function it appears in
Do not confuse the character literal ‘+’ that has integer type with the string “+”. The value of a case must be an integer literal or an expression evaluating to an integer constant. The following example yields an error: $ cat switch5.c #include #include int main(void) { int c = 10; int x = 10; switch (c) { case x: printf(“case %d\n”, x); } return EXIT_SUCCESS; } $ gcc -o switch5 -std=c99 -pedantic switch5.c
switch5.c: In function ‘main’: switch5.c:9:7: error: case label does not reduce to an integer constant
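One way to obtain named values that are accepted as case labels is to use enumeration constants, which are integer constant expressions. The following is a minimal sketch; the names LOW, HIGH and classify() are hypothetical.

```c
/* Enumeration constants are integer constant expressions. */
enum { LOW = 1, HIGH = 10 };

/* Hypothetical helper: classifies c using constant case labels. */
int classify(int c) {
    switch (c) {
    case LOW:
        return -1;
    case HIGH:
        return 1;
    case HIGH + 1:   /* an operation on constants is also allowed */
        return 2;
    default:
        return 0;
    }
}
```

Unlike the variable x of switch5.c, LOW and HIGH are known at compile time, so the compiler can check that no two labels have the same value.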
V.3.2 While loop The while statement executes a set of statements several times depending on a condition. while (expr) block
Where: o expr is an expression. o block is a set of statements also known as while block or while body. Statements are enclosed between braces ({}) . Braces can be omitted if there is a single statement. The while body is executed until the expression expr evaluates to zero (false). Thus, as long as the expression expr evaluates to a non-zero value, the compound statement block is executed. The following example displays the first ten digits: $ cat while_loop1.c 1 #include 2 #include 3 4 int main(void) { 5 int i = 0; 6 int max = 10; 7 8 while ( i < max ) { 9 printf(“i=%d “, i); 10 i++; 11 } 12 printf(“\n”); 13 14 return EXIT_SUCCESS; 15 } $ gcc -o while_loop1 -std=c99 -pedantic while_loop1.c $ ./while_loop1 i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9
Explanation: o Lines 8-11: before entering the while loop, the variable i holds the value 0.
▪ At the first iteration, i holds the value 0, and the relational expression i < max (i.e. 0 < 10) is true, which causes the while body to be executed: the value of i is displayed (0), then i is incremented. At the end of the iteration, i holds the value 1.
▪ At the second iteration, i holds the value 1 and the relational expression i < max (i.e. 1 < 10) is still true. The while body is executed: the value of i is displayed (1), then i is incremented. At the end of the iteration, i holds the value 2.
▪ And so on
▪ At the 10th iteration, i holds the value 9, and the relational expression i < max (i.e. 9 < 10) remains true. The while body is executed: the value of i is displayed (9), then i is incremented. At the end of the iteration, i holds the value 10.
▪ At the 11th iteration, i holds the value 10 and the relational expression i < max (i.e. 10 < 10) becomes false. The while statement ends.
In the following example, we display the strings held in the array s:
$ cat while_loop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
  int i = 0;
  int nb_elt = sizeof s / sizeof(char *); /* number of elements in array s */

  while ( i < nb_elt ) {
    printf("s[%d]=%s\n", i, s[i] );
    i++;
  }
  return EXIT_SUCCESS;
}
$ gcc -o while_loop2 -std=c99 -pedantic while_loop2.c
$ ./while_loop2
s[0]=ONE
s[1]=TWO
s[2]=THREE
s[3]=FOUR
In the following example, we also display the strings held in the array s: $ cat while_loop3.c 1 #include
2 #include 3 4 int main(void) { 5 char *s[] = { “ONE”, “TWO”, “THREE”, “FOUR”, NULL }; 6 char **p; 7 8 p = s; 9 while ( *p != NULL ) { 10 printf(“%s\n”, *p ); 11 p++; 12 } 13 14 return EXIT_SUCCESS; 15 } $ gcc -o while_loop3 -std=c99 -pedantic while_loop3.c $ ./while_loop3 ONE TWO THREE FOUR
Explanation:
o Line 5: the object s is an array of strings. It is composed of five elements, but the last element, NULL, is used only to indicate the end of the list.
o Line 6: p is declared as a pointer to pointer to char.
o Line 8: before entering the while loop, the pointer p is initialized to s. The pointer p points to the very first element of the array s (the string “ONE”).
o Lines 9-12: as long as the pointer p does not point to a null pointer (i.e. *p != NULL), the while body is executed. First, the string to which the pointer p points is displayed, then the pointer p is incremented so that it points to the next element.
▪ At the beginning, p points to the string “ONE”. Since the expression *p != NULL is true, the statements of the while body are executed. The string “ONE” is displayed and p is incremented. The pointer p points now to the string “TWO”.
▪ At the second iteration, p points to the string “TWO”. Since the expression *p != NULL is true, the statements of the while body are executed. The string “TWO” is displayed and p is incremented. The pointer p points now to the string “THREE”.
▪ And so on
▪ At the fourth iteration, p points to the string “FOUR”. Since the expression *p != NULL is true, the statements of the while body are executed. The string “FOUR” is displayed and p is incremented. The pointer p points now to the last element of the array, the null pointer.
▪ At the fifth iteration, p points to a null pointer (NULL). Since the expression *p != NULL becomes false, the while statement terminates.
Since the macro NULL is synonym for 0 or (void *)0, the expression *p != NULL is the same as *p != 0 and then is equivalent to the expression *p. The example while_loop3.c can be rewritten as follows: $ cat while_loop4.c #include #include int main(void) { char *s[] = { “ONE”, “TWO”, “THREE”, “FOUR”, NULL }; char **p = s; while ( *p ) { printf(“%s\n”, *p ); p++; } return EXIT_SUCCESS; } $ gcc -o while_loop4 -std=c99 -pedantic while_loop4.c $ ./while_loop4 ONE TWO THREE FOUR
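The same traversal can be placed in a function that works on any list of strings terminated by a null pointer. The following is a minimal sketch; the names numbers, empty_list and count_strings() are hypothetical.

```c
#include <stddef.h>

/* Sample lists terminated by a null pointer (hypothetical data). */
static const char *const numbers[] = { "ONE", "TWO", "THREE", "FOUR", NULL };
static const char *const empty_list[] = { NULL };

/* Hypothetical helper: counts the strings stored before the null pointer. */
size_t count_strings(const char *const *list) {
    size_t count = 0;

    while (*list) {   /* same as *list != NULL */
        count++;
        list++;
    }
    return count;
}
```

The call count_strings(numbers) returns 4: the terminating null pointer is not counted. For empty_list, the loop body is never executed and the function returns 0.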
Here is another example related to pointers. In the following example, we copy the string of the array s into a memory area, allocated by malloc(), pointed to by the pointer copy_s. $ cat while_loop5.c 1 #include 2 #include 3 #include 4 5 int main(void) { 6 char s[] = “Hello world”; 7 int len = strlen( s ); 8 char *copy_s = malloc( len + 1 ); 9 char *p1; 10 char *p2; 11
12 if ( ! copy_s ) { /* check if the pointer copy_s is valid */ 13 printf(“Fatal Error. Cannot allocate memory\n”); 14 return EXIT_FAILURE; 15 } 16 17 p1 = s; p2 = copy_s; 18 while ( *p1 != ‘\0’ ) { 19 *p2 = *p1; 20 p2++; 21 p1++; 22 } 23 24 *p2 = ‘\0’; 25 printf(“copy_s=%s\n”, copy_s); 26 27 return EXIT_SUCCESS; 28 } $ gcc -o while_loop5 -std=c99 -pedantic while_loop5.c $ ./while_loop5 copy_s=Hello world
Explanation:
o Line 6: the array s is initialized to the string “Hello world”.
o Line 7: the len variable is initialized to the number of characters in the array s.
o Line 8: a memory block is allocated by the malloc() function. The requested size is the number of characters in the array s plus one, to include the terminating null character ‘\0’.
o Lines 12-15: we display an error message and terminate the program if the pointer copy_s is not valid.
o Line 17: the pointer p1 is initialized to s (source data) and p2 to copy_s.
o Lines 18-22: as long as the current character is different from the null character, the while body is executed.
▪ Line 19: the character pointed to by p1 is copied to the piece of memory pointed to by p2.
▪ Line 20: move the pointer p2 to the next memory location that can hold a character.
▪ Line 21: move the pointer p1 to the next character.
▪ The while loop ends when the current character pointed to by p1 is the null character.
o Line 24: since the null character has not been copied by the loop, it is stored at the location pointed to by p2 to terminate the copied string.
o Line 25: the string pointed to by copy_s is displayed. The following example performs the same task as the previous one: $ cat while_loop6.c #include #include #include int main(void) { char *s = “Hello world”; int len = strlen( s ); /* number of characters in the array s */ char *copy_s = malloc( len + 1 ); char *p1; char *p2; /* check the pointer copy_s is valid */ if ( ! copy_s ) { printf(“Cannot allocate memory for copy_s\n”); return EXIT_FAILURE; } /* copy string from array s to memory pointed to by copy_s */ p1 = s; p2 = copy_s; while ( (*p2++ = *p1++) != ‘\0’ ) ; /* while body is empty */ printf(“copy_s=%s\n”, copy_s); return EXIT_SUCCESS; } $ gcc -o while_loop6 -std=c99 -pedantic while_loop6.c $ ./while_loop6 copy_s=Hello world
The expression *p2++ = *p1++ carries out the following tasks:
o The piece of memory pointed to by p2 (represented by *p2) receives the current character pointed to by the pointer p1 (represented by *p1).
o The pointer p2 is incremented by the postfix operator: p2++.
o The pointer p1 is also incremented by the postfix operator: p1++.
o The assignment evaluates to the character that has just been stored through p2.
Then, as long as the assignment evaluates to a value different from the null character, the while body is executed (here, the body is empty). At the last iteration:
o p1 points to the null character ‘\0’. It is assigned to the piece of memory pointed to by p2.
o The assignment *p2++ = *p1++ evaluates to the null character ‘\0’.
o The expression (*p2++ = *p1++) != ‘\0’ becomes false, which terminates the while loop.
The while loop also allows you to execute a set of statements indefinitely (infinite loop):
while (1) {
  statement1;
  statement2;
  …
  statementN;
}
The following program runs until you press the letter c while holding the CTRL key (CTRL-C) or type a value that is not an integer.
$ cat while_loop7.c #include #include #include int main(void) { const int num_len = 32; char s[num_len]; int n; float f; while (1) { printf(“\nPlease type an integer number: “); fgets(s, num_len, stdin); /* read characters typed */ n = atoi( s ); /* convert s to integer */ f = atof( s ); /* convert s to float */ if (f != n) { printf(“The given number is not integer\n”); return EXIT_FAILURE;
} switch ( n % 2 ) { case 0: printf(“%d is even\n”, n); break; case 1: printf(“%d is odd\n”, n); } } } $ gcc -o while_loop7 -std=c99 -pedantic while_loop7.c $ ./while_loop7 Please type an integer number: 10 10 is even Please type an integer number: 17 17 is odd Please type an integer number:
It prints the message “Please type an integer number: “ and waits for you to type a number terminated by the Enter key. Then, it tells you if the number is odd or even. In the program, there is a new function that we have not talked about so far: fgets(). We will say more about it when we talk about the most frequently used C standard functions. For now, we use it to retrieve the characters typed by the user. That is, the call fgets(s, num_len, stdin) retrieves the characters typed, stores them in the array s and terminates them with the null character ‘\0’. The function reads what is typed until at most num_len-1 characters have been read or the newline character (produced by the Enter key) has been read. The second argument num_len tells the function to read at most num_len-1 characters because our array s can hold only num_len characters, the last one being reserved for the null character ‘\0’. The third argument stdin represents the standard input, which is associated with the keyboard: it tells the function to read what is typed.
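Comparing the results of atoi() and atof() is one way of rejecting a non-integer input. Another possible approach, given here only as a sketch, is to let the C standard function strtol() report where the conversion stopped. The helper name line_is_integer() is hypothetical. The sketch takes into account that fgets() stores the newline character in the array when there is room for it.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if line holds exactly one integer,
   optionally followed by the newline stored by fgets(); 0 otherwise. */
int line_is_integer(const char *line) {
    char *end;

    errno = 0;
    (void)strtol(line, &end, 10);
    if (end == line || errno == ERANGE)
        return 0;   /* no digits, or value out of range for long */
    return *end == '\n' || *end == '\0';
}
```

With this helper, the input 3.5 is rejected because the conversion stops at the dot, which is neither a newline nor the null character.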
V.3.3 Do…While loop
The do/while loop works in the same way as the while loop except that it executes the loop body at least once: the condition is tested only after running the loop body. Its general syntax is given below (do not forget the semicolon at the end of the statement):
do
  block
while (expr);
Where: o block is a set of statements o expr is an expression The do body (loop body) is executed until the condition expr becomes false. The loop body is executed first. Then, the condition expr is tested. The following example displays the first ten digits: $ cat do_while1.c #include #include int main(void) { int max = 10; int i = 0; do { printf(“%d “, i); i++; } while ( i < max ); printf(“\n”); return EXIT_SUCCESS; } $ gcc -o do_while1 -std=c99 -pedantic do_while1.c $ ./do_while1 0 1 2 3 4 5 6 7 8 9
The loop body is executed at least once. In the following example, the very first value of i is 0, yet the loop body is executed: $ cat do_while2.c #include #include int main(void) { int max = 10; int i = 0; do { printf(“%d “, i);
i++; } while ( i < max && i > 0); printf(“\n”); return EXIT_SUCCESS; } $ gcc -o do_while2 -std=c99 -pedantic do_while2.c $ ./do_while2 0 1 2 3 4 5 6 7 8 9
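A typical situation where the body must run at least once is counting the decimal digits of a number, since even 0 has one digit. The following is a minimal sketch; the function name count_digits() is hypothetical.

```c
/* Hypothetical helper: number of decimal digits of n (0 has one digit). */
int count_digits(unsigned int n) {
    int digits = 0;

    do {
        digits++;   /* executed at least once, even when n is 0 */
        n /= 10;
    } while (n != 0);
    return digits;
}
```

With a plain while loop testing n != 0 first, the function would return 0 for the value 0.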
V.3.4 For loop The for loop does the same thing as the while loop. It is only a concise form of the while loop easing programming. The for statement executes a set of statements several times depending on a condition. for (expr1;expr2;expr3) block
Where:
o expr1, expr2, and expr3 are expressions.
o block is a set of statements also known as loop body or for body. Statements are enclosed between braces ({}). Braces can be omitted if there is a single statement.
The expression expr1 is executed first (initialization) and only once. The expression expr2 is then evaluated: if it is true, the for body (block) is executed. Then, the expression expr3 is executed. Next, the same process is repeated: the expression expr2 is evaluated, if it is true the for body is executed, followed by the evaluation of the expression expr3… The for loop continues until the expression expr2 becomes false. The following example displays the first ten digits:
$ cat for_loop1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5   int max = 10;
6   int i;
7
8   for (i=0; i < max; i++)
9     printf("%d ", i);
10
11   printf("\n");
12   return EXIT_SUCCESS;
13 }
$ gcc -o for_loop1 -std=c99 -pedantic for_loop1.c
$ ./for_loop1
0 1 2 3 4 5 6 7 8 9
Explanation:
o Lines 8-9:
▪ The variable i is initialized to the value 0. This is the initialization step.
▪ First iteration. Since i holds the value 0, the expression i < max is true and the loop body (line 9) is executed. The value of i is printed (0). The expression i++ is executed: i now holds the value 1.
▪ Second iteration. Since i holds the value 1, the expression i < max is true and the loop body (line 9) is executed. The value of i is printed (1). The expression i++ is executed: i now holds the value 2.
▪ …
▪ Tenth iteration. Since i holds the value 9, the expression i < max is true and the loop body (line 9) is executed. The value of i is printed (9). The expression i++ is executed: i now holds the value 10.
▪ End of the loop. Since i holds the value 10, the expression i < max is false and the for loop ends without executing the for body an eleventh time.
o Line 11: a newline is displayed.
The following example is equivalent to the program while_loop2.c previously given. It displays the strings of the array s:
$ cat for_loop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
  int i;
  int nb_elt = sizeof s / sizeof(char *); /* number of elements in array s */

  for ( i = 0; i < nb_elt; i++ )
    printf("s[%d]=%s\n", i, s[i] );

  return EXIT_SUCCESS;
}
$ gcc -o for_loop2 -std=c99 -pedantic for_loop2.c
$ ./for_loop2
s[0]=ONE
s[1]=TWO
s[2]=THREE
s[3]=FOUR
The following example is equivalent to while_loop4.c. It displays the strings of the array s by using pointers.
$ cat for_loop3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *s[] = { "ONE", "TWO", "THREE", "FOUR", NULL };
  char **p;

  for ( p = s; *p; p++ )
    printf("%s\n", *p );

  return EXIT_SUCCESS;
}
$ gcc -o for_loop3 -std=c99 -pedantic for_loop3.c
$ ./for_loop3
ONE
TWO
THREE
FOUR
The following example is equivalent to while_loop5.c. It copies a string to a memory block allocated by malloc() and pointed to by the pointer copy_s:
$ cat for_loop4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char *s = "Hello world";
  int len = strlen( s ); /* number of characters in the string s */
  char *copy_s = malloc( len + 1 );
  char *p1;
  char *p2;

  /* check the pointer copy_s is valid */
  if ( copy_s == NULL ) {
    printf("Cannot allocate memory for copy_s\n");
    return EXIT_FAILURE;
  }

  /* copy string from array s to memory pointed to by copy_s */
  for ( p1 = s, p2 = copy_s; *p1 != '\0'; p1++, p2++ )
    *p2 = *p1;

  *p2 = '\0'; /* a character string is terminated by a null character */
  printf("copy_s=%s\n", copy_s);
  return EXIT_SUCCESS;
}
$ gcc -o for_loop4 -std=c99 -pedantic for_loop4.c
$ ./for_loop4
copy_s=Hello world
An infinite loop executes a set of statements indefinitely:
for (;;) {
  statement1;
  statement2;
  …
  statementN;
}
The following example is equivalent to while_loop7.c. The user types an integer number and the program tells if it is even or odd. The program runs until you interrupt it.
$ cat for_loop5.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  const int num_len = 32;
  char s[num_len]; /* array to store characters typed */
  int n;
  float f;

  for (;;) {
    printf("\nPlease type an integer number: ");
    fgets(s, num_len, stdin); /* retrieve characters typed */
    n = atoi( s ); /* convert to integer */
    f = atof( s ); /* convert to float */

    if (f != n) { /* the given number is a float */
      printf("The given number is not integer\n");
      return EXIT_FAILURE;
    }

    switch ( n % 2 ) {
      case 0:
        printf("%d is even\n", n);
        break;
      case 1:
        printf("%d is odd\n", n);
    }
  }
}
$ gcc -o for_loop5 -std=c99 -pedantic for_loop5.c
$ ./for_loop5

Please type an integer number: 10
10 is even

Please type an integer number: 11
11 is odd

Please type an integer number: anything
0 is even

Please type an integer number:
$
Remember that if the given string does not start with a number, the functions atoi() and atof() return 0. C99 introduces a very useful feature: it allows a variable to be declared in the initialization clause of the for loop:
$ cat for_loop6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  for (int i=0; i < 5; i++)
    printf("i=%d\n", i);

  return EXIT_SUCCESS;
}
$ gcc -o for_loop6 -std=c99 -pedantic -Wall for_loop6.c
$ ./for_loop6
i=0
i=1
i=2
i=3
i=4
Note that a variable declared in this way can be used only within the for loop. Its scope ends with the loop: once the loop body is finished (at the closing brace } when the body is a block), the variable no longer exists and cannot be used anymore.
V.4 continue
The continue statement jumps to the next iteration of a loop statement (see Figure V‑1). It can be used only in a loop body (for, while or do/while statement). The following program displays the first ten digits with the exception of the digit 3:
$ cat continue1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5   int max = 10;
6   int i;
7
8   for (i=0; i < 10; i++) {
9     if ( i == 3 ) continue;
10     printf("%d ", i);
11   }
12
13   printf("\n");
14
15   return EXIT_SUCCESS;
16 }
$ gcc -o continue1 -std=c99 -pedantic continue1.c
$ ./continue1
0 1 2 4 5 6 7 8 9
Explanation:
o Lines 8-11:
▪ Initialization: the variable i is set to 0 before entering the loop.
▪ First iteration. i=0 and i < 10 is true. The loop body is executed. The value of i is printed. The variable i is incremented by the expression i++: i holds the value 1.
▪ Second iteration. i=1 and i < 10 is true. The loop body is executed.
▪ …
▪ Fourth iteration. i=3 and i < 10 is true. The loop body is executed. As the expression i == 3 is true, the continue statement is executed: it stops the current iteration without executing the remaining statements of the for body. Before starting a new iteration, the variable i is first incremented by the expression i++: i holds the value 4.
▪ And so on.
In the following example, we display each element in the array s except if it is the string "THREE":
$ cat continue2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
  int nb_elt = sizeof s / sizeof(char *);
  int i;

  i = 0;
  while( i < nb_elt ) {
    if ( ! strcmp( "THREE", s[ i ] ) ) {
      i++;
      continue;
    }
    printf("s[ %d ] = %s\n", i, s[ i ]);
    i++;
  }
  return EXIT_SUCCESS;
}
$ gcc -o continue2 -std=c99 -pedantic continue2.c
$ ./continue2
s[ 0 ] = ONE
s[ 1 ] = TWO
s[ 3 ] = FOUR
Figure V‑1 continue statement
Note that, in continue2.c, we incremented the value of i before jumping to the next iteration with the continue statement: without that increment, the while loop would test the same element forever. With the for loop, the same example is easier to write, since the expression i++ is executed even after a continue statement:
$ cat continue3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
  int nb_elt = sizeof s / sizeof(char *);
  int i;

  for(i = 0; i < nb_elt; i++ ) {
    if ( ! strcmp( "THREE", s[ i ] ) )
      continue;
    printf("s[ %d ] = %s\n", i, s[ i ]);
  }
  return EXIT_SUCCESS;
}
$ gcc -o continue3 -std=c99 -pedantic continue3.c
$ ./continue3
s[ 0 ] = ONE
s[ 1 ] = TWO
s[ 3 ] = FOUR
Figure V‑2 break statement
V.5 break
The break statement terminates a loop statement or the current case of the switch statement in which it appears (see Figure V‑2). In the following example, the for loop ends when i reaches the value 3.
$ cat break1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int max = 10;
  int i;

  for (i=0; i < 10; i++) {
    if ( i == 3 ) break;
    printf("%d ", i);
  }

  printf("\n");
  return EXIT_SUCCESS;
}
$ gcc -o break1 -std=c99 -pedantic break1.c
$ ./break1
0 1 2
The break statement is useful in infinite loops. Let us consider the example for_loop5.c we gave earlier and modify it so that we leave the program properly after typing the word quit.
$ cat break2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  const int num_len = 32;
  char s[num_len];
  int n;
  float f;

  for (;;) {
    printf("\nPlease type an integer number: ");
    fgets(s, num_len, stdin); /* retrieve characters typed */

    /* leave the for loop if word quit is typed */
    if ( !strncmp (s, "quit", 4 ) )
      break;

    n = atoi( s ); /* convert to integer */
    f = atof( s ); /* convert to float */

    if (f != n) { /* if f != n, f is float */
      printf("The given number is not integer\n");
      return EXIT_FAILURE;
    }

    switch ( n % 2 ) {
      case 0:
        printf("%d is even\n", n);
        break;
      case 1:
        printf("%d is odd\n", n);
    } /* End of switch */
  } /* End of for loop */

  printf("\nExiting…\n");
  return EXIT_SUCCESS;
}
$ gcc -o break2 -std=c99 -pedantic break2.c
$ ./break2

Please type an integer number: 11
11 is odd

Please type an integer number: quit

Exiting…
V.6 goto
The goto statement jumps to another point of the program specified by a label (see Figure V‑3). Here is an example:
$ cat goto1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int max = 10;
  int i;

  for (i=0; i < 10; i++) {
    if ( i == 3 ) goto END;
    printf("%d ", i);
  }

END:
  printf("\n");
  return EXIT_SUCCESS;
}
$ gcc -o goto1 -std=c99 -pedantic goto1.c
$ ./goto1
0 1 2
If the variable i holds the value 3, the goto statement jumps to the label END, which leaves the for loop.
Figure V‑3 goto statement
A label does nothing by itself. It only marks a place in the program that a goto statement can jump to. In the following example, we declare two labels that no goto statement uses, so the program behaves as if they were not there:
$ cat goto2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int max = 10;
  int i;

LOOP_FOR:
  for (i=0; i < 10; i++) {
    printf("%d ", i);
  }

END:
  printf("\n");
  return EXIT_SUCCESS;
}
$ gcc -o goto2 -std=c99 -pedantic goto2.c
$ ./goto2
0 1 2 3 4 5 6 7 8 9
Programmers often avoid the goto statement because it makes the source code harder to understand and to debug. So, avoid it whenever you can.
V.7 Nested loops
A nested loop is a loop inside another loop. Here is an example:
$ cat nested_loop1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5   int i, j, k;
6
7   for (i = 1; i < 4; i++ ) {
8     printf("-> %d:\n", i);
9
10     for (j = 'A' ; j < 'C'; j++ ) {
11       printf(" %c:\n", j);
12
13       for (k = 'a'; k < 'c'; k++ ) {
14         printf(" %c\n", k);
15       }
16
17     }
18
19   }
20   return EXIT_SUCCESS;
21 }
$ gcc -o nested_loop1 -std=c99 -pedantic nested_loop1.c
$ ./nested_loop1
-> 1:
 A:
 a
 b
 B:
 a
 b
-> 2:
 A:
 a
 b
 B:
 a
 b
-> 3:
 A:
 a
 b
 B:
 a
 b
Explanation: o Lines 7-19: Digits from 1 through 3 are displayed. The first for loop contains two other loops (lines 10 and 13). o Lines 10-17: characters from A to B are displayed. The second for loop contains another loop (line 13). o Lines 13-15: characters from a to b are displayed. This is the last loop.
Nested loops can be used to display multidimensional arrays as shown below:
$ cat nested_loop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j, k;

  /* arr is a three-dimensional array */
  char arr[][3][2] = {
    { /* First two-dimensional array */
      { 'a', 'b' }, /* first one-dimensional array: 2 elements */
      { 'c', 'd' }, /* second one-dimensional array: 2 elements */
      { 'e', 'f' }  /* third one-dimensional array: 2 elements */
    },
    { /* Second two-dimensional array */
      { 'A', 'B' }, /* first one-dimensional array: 2 elements */
      { 'C', 'D' }, /* second one-dimensional array: 2 elements */
      { 'E', 'F' }  /* third one-dimensional array: 2 elements */
    }
  };

  /* display three-dimensional array */
  for ( i=0; i < 2; i++ ) {
    for ( j=0; j < 3; j++ ) {
      for ( k=0; k < 2; k++ )
        printf( "arr[%d][%d][%d]=%c\n", i, j, k, arr[i][j][k]);
      printf("\n");
    }
    printf("\n");
  }
  return EXIT_SUCCESS;
}
$ gcc -o nested_loop2 -std=c99 -pedantic nested_loop2.c
$ ./nested_loop2
arr[0][0][0]=a
arr[0][0][1]=b

arr[0][1][0]=c
arr[0][1][1]=d

arr[0][2][0]=e
arr[0][2][1]=f


arr[1][0][0]=A
arr[1][0][1]=B

arr[1][1][0]=C
arr[1][1][1]=D

arr[1][2][0]=E
arr[1][2][1]=F
The break statement leaves the innermost loop body (see Figure V‑2). That is, it exits the first loop in which it is directly contained:
$ cat nested_loop3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j, k;

  for (i = 1; i < 4; i++ ) {
    printf("-> i=%d:\n", i);

    for (j = 1 ; j < 4; j++ ) {
      printf(" j=%d:\n", j);

      for (k = 1; k < 5; k++ ) {
        if ( k == 3 ) {
          printf(" k=%d. BREAK\n", k);
          break;
        }
        printf(" k=%d\n", k);
      }
    }
  }
  return EXIT_SUCCESS;
}
$ gcc -o nested_loop3 -std=c99 -pedantic nested_loop3.c
$ ./nested_loop3
-> i=1:
 j=1:
 k=1
 k=2
 k=3. BREAK
 j=2:
 k=1
 k=2
 k=3. BREAK
 j=3:
 k=1
 k=2
 k=3. BREAK
-> i=2:
 j=1:
 k=1
 k=2
 k=3. BREAK
 j=2:
 k=1
 k=2
 k=3. BREAK
 j=3:
 k=1
 k=2
 k=3. BREAK
-> i=3:
 j=1:
 k=1
 k=2
 k=3. BREAK
 j=2:
 k=1
 k=2
 k=3. BREAK
 j=3:
 k=1
 k=2
 k=3. BREAK
Compare with the following one, in which the break statement is placed in the second loop:
$ cat nested_loop4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j, k;

  for (i = 1; i < 4; i++ ) {
    printf("-> i=%d:\n", i);

    for (j = 1 ; j < 4; j++ ) {
      if ( j == 2 ) {
        printf(" j=%d: BREAK.\n", j);
        break;
      }
      printf(" j=%d:\n", j);

      for (k = 1; k < 5; k++ ) {
        printf(" k=%d\n", k);
      }
    }
  }
  return EXIT_SUCCESS;
}
$ gcc -o nested_loop4 -std=c99 -pedantic nested_loop4.c
$ ./nested_loop4
-> i=1:
 j=1:
 k=1
 k=2
 k=3
 k=4
 j=2: BREAK.
-> i=2:
 j=1:
 k=1
 k=2
 k=3
 k=4
 j=2: BREAK.
-> i=3:
 j=1:
 k=1
 k=2
 k=3
 k=4
 j=2: BREAK.
The continue statement does not stop the current loop but jumps to the next iteration of the innermost loop body (see Figure V‑1). That is, it branches to the next iteration of the innermost loop in which it is contained:
$ cat nested_loop5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i, j, k;

  for (i = 1; i < 4; i++ ) {
    printf("-> i=%d:\n", i);

    for (j = 1 ; j < 4; j++ ) {
      printf(" j=%d:\n", j);

      for (k = 1; k < 4; k++ ) {
        if ( k == 2 )
          continue;
        printf(" k=%d\n", k);
      }
    }
  }
  return EXIT_SUCCESS;
}
$ gcc -o nested_loop5 -std=c99 -pedantic nested_loop5.c
$ ./nested_loop5
-> i=1:
 j=1:
 k=1
 k=3
 j=2:
 k=1
 k=3
 j=3:
 k=1
 k=3
-> i=2:
 j=1:
 k=1
 k=3
 j=2:
 k=1
 k=3
 j=3:
 k=1
 k=3
-> i=3:
 j=1:
 k=1
 k=3
 j=2:
 k=1
 k=3
 j=3:
 k=1
 k=3
V.8 Exercises
Exercise 1. Write a program that takes a list of numbers separated by spaces and displays the mean value.
Exercise 2. Write a program that takes a character string and displays the number of consonants and the number of vowels.
Exercise 3. Explain why the following program is not correct.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
  char **p;

  for ( p = s; *p; p++ )
    printf("%s\n", *p );

  return EXIT_SUCCESS;
}
Exercise 4. Write a program that displays the internal representation of an integer.
Exercise 5. Write a simple program that displays if the processor is little endian or big endian.
CHAPTER VI USER-DEFINED TYPES
VI.1 Introduction
So far, we have only worked with types defined by the C language: arithmetic types, pointers and arrays. Now, you are going to learn to define your own types. In simple C programs, basic types are enough and you do not actually need to create new types, but you will shortly find out that creating your own types greatly eases your work as your programs get more complex. For example, you could define a type called student allowing you to create objects composed of three attributes: name, surname and age. Once defined, such a type can be used as any other type.
VI.2 Enumerations
Consider the following example:
$ cat enum1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int const SUNDAY = 0;
  int const MONDAY = 1;
  int const TUESDAY = 2;
  int const WEDNESDAY = 3;
  int const THURSDAY = 4;
  int const FRIDAY = 5;
  int const SATURDAY = 6;
  int d;

  d = SUNDAY;
  printf("d=%d\n", d);

  d = FRIDAY;
  printf("d=%d\n", d);
}
$ gcc -o enum1 -std=c99 -pedantic enum1.c
$ ./enum1
d=0
d=5
In the example above, we have defined seven integer constants that represent the days of the week. The same program can be simplified by using an enumeration type as shown below:
$ cat enum2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum days { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY };
  enum days d;

  d = SUNDAY;
  printf("d=%d\n", d);

  d = FRIDAY;
  printf("d=%d\n", d);
  return EXIT_SUCCESS;
}
$ gcc -o enum2 -std=c99 -pedantic enum2.c
$ ./enum2
d=0
d=5
We defined a new type called days that is an enumerated type. An enumerated type is a list of integer constant values, each of which is identified by a name. It is defined as follows:
enum enum_tag { id1[=val1], id2[=val2], …, idN[=valN] };
Where:
o enum_tag is the name you give to the enumeration. It is called an enumeration tag.
o id1, id2, …, idN are names of constants known as enumeration constants. They are composed of letters, digits and underscores, starting with a letter or an underscore.
o val1, val2, …, valN are integer constant expressions. They are of type int. Their values can be negative.
The enumeration constants id1, …, idN are initialized respectively with the values of type int val1, …, valN. If a value valP is not given to initialize an enumeration constant idP, idP takes the value of the preceding enumeration constant plus one. If the very first value val1 is not specified, id1 takes the value zero. The declaration of an enumeration creates a new type. Keep in mind that an enumeration tag is not a type specifier (type name) but the name of the enumeration. Consequently, once an enumerated type has been defined, you can use it as any other type, but you still have to specify the keyword enum before the tag when declaring a variable. To declare a variable of enumerated type whose tag is enum_tag, use the following syntax:
enum enum_tag var;
A variable of enumerated type is supposed to take one of the integer constants defined by the enumeration. Setting it to an arbitrary integer value makes no sense: in that case, you had better use an integer type instead of an enumeration type. In our example enum2.c, we did not give initialization values to the enumeration constants, which caused the enumeration constant SUNDAY to take the value 0, MONDAY the value 1, and so on. In the following example, we specify the very first initialization value:
$ cat enum3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum days { SUNDAY=1, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY };
  enum days d;

  d = SUNDAY;
  printf("d=%d\n", d);

  d = FRIDAY;
  printf("d=%d\n", d);
  return EXIT_SUCCESS;
}
$ gcc -o enum3 -std=c99 -pedantic enum3.c
$ ./enum3
d=1
d=6
In the following example, we provide an explicit value to every enumeration constant:
$ cat enum4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };
  enum shape s;

  s = CIRCLE;
  printf("s=%d\n", s);

  s = TRIANGLE;
  printf("s=%d\n", s);
  return EXIT_SUCCESS;
}
$ gcc -o enum4 -std=c99 -pedantic enum4.c
$ ./enum4
s=0
s=3
You are allowed to use an unnamed enumerated type by omitting the tag, as in the following example:
$ cat enum5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum { EVEN = 0, ODD = 1 } remainder;
  int x = 10;

  remainder = x % 2;

  if ( remainder == EVEN )
    printf("%d is even\n", x);
  else if ( remainder == ODD )
    printf("%d is odd\n", x);

  return EXIT_SUCCESS;
}
$ gcc -o enum5 -std=c99 -pedantic enum5.c
$ ./enum5
10 is even
As said earlier, when you declare a variable of enumerated type, you have to use the keyword enum before the tag. There is a convenient way to bypass it: using the typedef statement that creates an alias for the enumerated type as shown below:
$ cat enum6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };
  typedef enum shape shape;
  shape s;

  s = TRIANGLE;
  printf("s=%d\n", s);
  return EXIT_SUCCESS;
}
$ gcc -o enum6 -std=c99 -pedantic enum6.c
$ ./enum6
s=3
The typedef statement can also be used at the time of the declaration of the enumerated type:
$ cat enum7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } shape;
  shape s;

  s = TRIANGLE;
  printf("s=%d\n", s);
  return EXIT_SUCCESS;
}
The C language lets you declare an enumeration type and variables of that type at the same time: enum [enum_tag] { id1[=val1], id2[=val2], …, idN[=valN] } [var1[, var2…]];
Under this form, the tag can be omitted (anonymous enumeration). The following example creates a new enumeration and two variables with a single declaration:
$ cat enum8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } s1,s2;

  s1 = TRIANGLE;
  printf("s1=%d\n", s1);
  return EXIT_SUCCESS;
}
$ gcc -o enum8 -std=c99 -pedantic enum8.c
$ ./enum8
s1=3
The following example creates a variable having an anonymous enumeration type:
$ cat enum9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } e;

  e = TRIANGLE;
  printf("e=%d\n", e);
  return EXIT_SUCCESS;
}
$ gcc -o enum9 -std=c99 -pedantic enum9.c
$ ./enum9
e=3
As an enumeration type is an integer type, the arithmetic conversion rules apply (see Chapter II Section II.11 and more specifically Chapter IV Section IV.14). You can assign a variable of arithmetic type an enumeration constant or a variable of enumerated type as shown below:
$ cat enum10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };
  enum shape s = RECTANGLE;

  int e = TRIANGLE; /* an int initialized with an enumeration constant */
  printf("e=%d\n", e);

  int f = s; /* an int initialized with a variable of enumerated type */
  printf("f=%d\n", f);
  return EXIT_SUCCESS;
}
$ gcc -o enum10 -std=c99 -pedantic enum10.c
$ ./enum10
e=3
f=4
Since enumeration types are integer types, enumeration constants and variables of enumerated type can be used with arrays as in the following example:
$ cat enum11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum days { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };

  // subscripts are enumeration constants
  char *name_days[] = {[SUNDAY] = "SUNDAY",
                       [MONDAY]="MONDAY",
                       [TUESDAY]="TUESDAY",
                       [WEDNESDAY]="WEDNESDAY",
                       [THURSDAY]="THURSDAY",
                       [FRIDAY]="FRIDAY",
                       [SATURDAY]="SATURDAY" };
  int i;
  enum days iD = MONDAY;
  char *sD = name_days[ iD ]; // subscript is a variable of enumeration type

  printf("%d->%s\n", iD, sD);

  printf("\nList days:\n");
  for (i=SUNDAY; i <= SATURDAY; i++) // <= so that SATURDAY is listed too
    printf("%d->%s\n", i, name_days[i]);

  return EXIT_SUCCESS;
}
$ gcc -o enum11 -std=c99 -pedantic enum11.c
$ ./enum11
1->MONDAY

List days:
0->SUNDAY
1->MONDAY
2->TUESDAY
3->WEDNESDAY
4->THURSDAY
5->FRIDAY
6->SATURDAY
Obviously, if your program is consistent, an object of enumerated type is supposed to be assigned an enumeration constant or an object of the same type. Since an enumerated type is an integer type, you could assign a variable of enumerated type an arbitrary integer value, but the behavior depends on the implementation: a compiler may choose to represent an enumerated type by char, a signed integer type or an unsigned integer type. In Chapter VI Section VI.7.2, we will say more about conversions between integers and enumerated types. To write a portable C program, if you actually want to use an integer value, do not set a variable of enumerated type to an arbitrary integer value: set it to a value in the range [0, SCHAR_MAX] or between the minimum and the maximum enumeration constants. It is good practice to set it to an enumeration constant or to a variable of the same type, as in the following code snippet:
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };
enum shape s1=RECTANGLE, s2;
s2 = s1;
VI.3 Structures
VI.3.1 Declaration
VI.3.1.1 Complete type
A structure, also known as a record in computer science, is a data structure that comprises a set of elements that can have the same or different types. Each item is called a member of the structure (in computer science, it is also known as a field). In C, a structure is declared as follows:
struct struct_name {
  obj_type1 mem1;
  obj_type2 mem2;
  …
  obj_typeN memN;
};
Where:
o struct_name, called a tag [48], is the identifier of the structure, composed of letters, digits and underscores and starting with an underscore or a letter. The new type called struct struct_name can be used to declare variables.
o obj_type1, obj_type2, …, obj_typeN are the types of the members mem1, mem2, …, memN.
o mem1, mem2, …, memN are the identifiers of the members.
The members can be of any type with the exception of variably modified types (VM types, see Chapter III Section III.9, and Chapter VII Section VII.17). A declaration of a structure specifying its members is called a definition: the type is said to be complete since the compiler has enough information to compute its size. In the following example, we define the structure student composed of three members: first_name, last_name and age. As the sizeof operator yields a value of type size_t, it is displayed with the conversion specifier %zu:
$ cat struct_decl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct student {
    char *first_name;
    char *last_name;
    int age;
  };

  printf("sizeof(struct student) = %zu\n", sizeof(struct student) );
  return EXIT_SUCCESS;
}
$ gcc -o struct_decl1 -std=c99 -pedantic struct_decl1.c
$ ./struct_decl1
sizeof(struct student) = 12
The structure student occupies 12 bytes in our computer. This is enough to hold two pointers (a pointer fits in four bytes in our computer) and one int (four bytes in our computer). The size of a structure is at least the sum of the sizes of its members. A structure type is a programmer-defined type you can use to declare objects as you would do with any other type. However, the keyword struct must still be specified when declaring an object of structure type:
struct struct_name obj;
Here is an example:
$ cat struct_decl2.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 32

int main(void) {
  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  };

  struct student st1;
  return EXIT_SUCCESS;
}
In the above example, the object st1 is declared with the type struct student. The typedef statement is often used to create an alias for a structure type.
$ cat struct_decl3.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 32

int main(void) {
  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  };
  typedef struct student student;

  student st1;
  return EXIT_SUCCESS;
}
The typedef statement can be placed before the declaration of the structure.
$ cat struct_decl4.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;

  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  };

  student st1;
  return EXIT_SUCCESS;
}
The typedef statement can also be used at the time of the declaration of the structure.
$ cat struct_decl5.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  } student;

  student st1;
  return EXIT_SUCCESS;
}
In C, you can also declare objects with an anonymous structure type. In this case, the structure tag is just omitted as shown below:
$ cat struct_decl6.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 32

int main(void) {
  struct {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  } st1, st2;

  return EXIT_SUCCESS;
}
VI.3.1.2 Incomplete structure type
The C language lets you declare a structure without providing its members, in which case the compiler creates an incomplete type that you cannot use to declare a variable until you define it by specifying all its members. The type is incomplete because the compiler cannot compute its size. An incomplete structure type is explicitly declared as follows:
struct struct_name;
We will explain the use of such a declaration in Chapter VI Section VI.3.7 and Chapter VIII Section VIII.6.3.2. An incomplete type is a known type but with an unknown size. After declaring an incomplete structure type, later, somewhere within the program, you have to complete it before using it as shown below:
$ cat struct_decl7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct my_integer; // type declared: incomplete type

  struct my_integer { int k; }; // type defined: it is complete

  struct my_integer k; // valid
  return EXIT_SUCCESS;
}
Normally, in C, if you declare a variable with an unknown type, you get an error indicating the type does not exist as shown below:
$ cat struct_decl8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  my_integer k;
  return EXIT_SUCCESS;
}
$ gcc -o struct_decl8 -std=c99 -pedantic struct_decl8.c
struct_decl8.c: In function ‘main’:
struct_decl8.c:5:3: error: ‘my_integer’ undeclared (first use in this function)
struct_decl8.c:5:3: note: each undeclared identifier is reported only once for each function it appears in
struct_decl8.c:5:14: error: expected ‘;’ before ‘k’
The compiler rightly complained: the type my_integer was unknown to it. With structure types, things are quite different. It is worth noting that the keyword struct followed by a tag always creates a new structure type if no structure with that tag is visible. Compare the previous example with the following:
$ cat struct_decl9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct my_integer k;
  return EXIT_SUCCESS;
}
$ gcc -o struct_decl9 -std=c99 -pedantic struct_decl9.c
struct_decl9.c: In function ‘main’:
struct_decl9.c:5:21: error: storage size of ‘k’ isn’t known
In the example above, we got a different error. The compiler did not say the structure type did not exist but that it had an unknown size. What does it mean? Keep in mind that the keyword struct followed by a tag creates a new type if no structure type with that tag is visible (this rule has many consequences, as we will find out throughout the book). If the members are specified, the structure type is complete, but if the members are not present, the new structure type is incomplete: the compiler does not have enough information to compute its size and therefore cannot allocate the appropriate storage for an object of such a type. Thus, as no structure type with the tag my_integer was visible at the time of the declaration of the object k, the declaration struct my_integer k created an incomplete type and declared the variable k with that type. Everything happens as if we had previously declared the incomplete structure type. The example struct_decl9.c is equivalent to the following one:
$ cat struct_decl10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct my_integer; // declare incomplete structure type

  struct my_integer k; // declare k with an incomplete type. Not permitted
  return EXIT_SUCCESS;
}
$ gcc -o struct_decl10 -std=c99 -pedantic struct_decl10.c
struct_decl10.c: In function ‘main’:
struct_decl10.c:7:21: error: storage size of ‘k’ isn’t known
In summary, if no structure type is visible and you declare an object of that type, the compiler will create an incomplete structure type. If a structure type is visible and you declare an object of that type, the compiler will just declare the object with that type.
VI.3.2 Initializing structures
Initializing an object means giving it a value at the time of the declaration. You can initialize an object obj of structure type by providing values between braces as for arrays. At declaration time, a structure can be initialized (such a declaration is called a definition) as follows:
struct struct_name obj = {
  val1,
  val2,
  …
  valN,
};
Where struct_name is declared as follows:
struct struct_name {
  obj_type1 mem1;
  obj_type2 mem2;
  …
  obj_typeN memN;
};
The members mem1, mem2, …, memN are respectively assigned the values val1, val2, …, valN. Here is an example:
$ cat struct_init1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef struct student student;

  struct student {
    char *first_name;
    char *last_name;
    int age;
  };

  student st1 = {"Christine", "Sun", 35 };
  student st2 = {"David", "Moon", 44 };
  return EXIT_SUCCESS;
}
The drawback of this method is that the values within braces must appear in the same order as the members to be initialized. For example, the statement student st1 = {"Christine", "Sun", 35 } sets the member first_name to "Christine", last_name to "Sun" and age to 35. Why is it a drawback? If you have a structure with several members, say five members, and you wish to initialize only the last one, you cannot do it with this method. Fortunately, C99 introduced a new way of initializing an object of structure type by specifying the values only for the members to be initialized. The members are named, separated by commas, and can appear in any order:
struct struct_name obj = {
  .memx=valx,
  .memy=valy,
  …
};
Our previous example can also be written as follows:
$ cat struct_init2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef struct student student;

  struct student {
    char *first_name;
    char *last_name;
    int age;
  };

  student st1 = {.age=35, .last_name="Sun", .first_name="Christine"};
  student st2 = {.first_name="David", .age=44, .last_name="Moon", };
  return EXIT_SUCCESS;
}
What is then the default value for members that are not initialized? If the object is declared with an initializer, the members that are not given a value are initialized to 0 (a null pointer for members of pointer type), whatever the storage duration of the object. If the object is declared with no initializer at all, it depends on its storage duration: if it has automatic storage duration, its members have an indeterminate value; if it has static storage duration, its members take the value 0. We will not talk about storage duration now but in Chapter VII Section VII.7. After the declaration of an object of structure type, you cannot set new values with the brace syntax described earlier. The following example fails to compile:
$ cat struct_init3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef struct student student;

  struct student {
    char *first_name;
    char *last_name;
    int age;
  };

  student st1;

  st1 = {.age=35, .last_name="sun", .first_name="Christine"};
  return EXIT_SUCCESS;
}
$ gcc -o struct_init3 -std=c99 -pedantic struct_init3.c
struct_init3.c: In function ‘main’:
struct_init3.c:15:9: error: expected expression before ‘{’ token
After the declaration, to set values to members, you have to access the members of the structure as described in the following section.
VI.3.3 Accessing members
We have learned how to declare a structure. Let us take one more step forward: how can we access a member? And how can we modify it? The member-access operator, denoted by . (dot), allows you to access a member of a structure. If struct_obj is an object of structure type, struct_obj.obj_mb1 represents the member obj_mb1. The example below declares the objects st1 and st2, initializes them, and displays the values of their members:
$ cat struct_access1.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  struct student {
    char first_name[NAME_MAX_LEN];
    char last_name[NAME_MAX_LEN];
    int age;
  };
  student st1 = {"Christine", "Sun", 35 };
  student st2 = {"David", "Moon", 44 };

  printf("First Name: %s\n", st1.first_name);
  printf("Last Name: %s\n", st1.last_name);
  printf("Age: %d\n\n", st1.age);
  printf("First Name: %s\n", st2.first_name);
  printf("Last Name: %s\n", st2.last_name);
  printf("Age: %d\n", st2.age);
  return EXIT_SUCCESS;
}
$ gcc -o struct_access1 -std=c99 -pedantic struct_access1.c
$ ./struct_access1
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 44
The following example is equivalent to the previous one for the object st1. After declaring st1 without initializing it, the program assigns values to its members and displays them:
$ cat struct_access2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  };
  student st1;

  strcpy(st1.first_name, "Christine");
  strcpy(st1.last_name, "Sun");
  st1.age = 35;

  printf("First Name: %s\n", st1.first_name);
  printf("Last Name: %s\n", st1.last_name);
  printf("Age: %d\n", st1.age);
  return EXIT_SUCCESS;
}
$ gcc -o struct_access2 -std=c99 -pedantic struct_access2.c
$ ./struct_access2
First Name: Christine
Last Name: Sun
Age: 35
VI.3.4 Array of structures An array can hold elements of structure type. In the following example, the array student_list contains a set of elements having a structure type. $ cat struct_array1.c #include #include #include #define NAME_MAX_LEN 32 int main(void) { int nb_elt = 10; /* maximum number of students in array student_list */ int i; typedef struct student student; struct student { char first_name[ NAME_MAX_LEN ]; char last_name[ NAME_MAX_LEN ]; int age; }; student student_list[ nb_elt ]; strcpy(student_list[0].first_name, “Christine”); strcpy(student_list[0].last_name, “Sun”); student_list[0].age = 35; strcpy(student_list[1].first_name, “David”); strcpy(student_list[1].last_name, “Moon”); student_list[1].age = 44; student_list[2].first_name[0] = ‘\0’; student_list[2].last_name[0] = ‘\0’; student_list[2].age = 0; /* Display list of elements in array student_list */ for (i=0; i < nb_elt; i++ ) { if ( ! student_list[i].age ) break; printf(“First Name: %s\n”, student_list[i].first_name);
printf(“Last Name: %s\n”, student_list[i].last_name); printf(“Age: %d\n\n”, student_list[i].age); } return EXIT_SUCCESS; } $ gcc -o struct_array1 -std=c99 -pedantic struct_array1.c $ ./struct_array1 First Name: Christine Last Name: Sun Age: 35 First Name: David Last Name: Moon Age: 44
The example is straightforward, except perhaps for the lines student_list[2].first_name[0] = ‘\0’ and student_list[2].last_name[0] = ‘\0’, which store an empty string in each name. The third element of the array (subscript 2) serves as an end marker indicating there are no more items: the display loop stops at the first element whose age is 0. Note that the subscript operator ([]) and the member-access operator (.) have the same precedence and, since both associate from left to right, student_list[2].first_name[0] is equivalent to ((student_list[2]).first_name)[0].
VI.3.5 Pointer to structure Structures allow us to build high-level data structures involving pointers. The following example declares a pointer to a structure: $ cat struct_pointer1.c #include #include #include #define NAME_MAX_LEN 32 int main(void) { typedef struct student student; struct student { char first_name[ NAME_MAX_LEN ]; char last_name[ NAME_MAX_LEN ]; int age; };
student *st1 = malloc( sizeof( student ) ); strcpy( (*st1).first_name, “Christine” ); strcpy( (*st1).last_name, “Sun” ); (*st1).age = 35; printf(“First Name: %s\n”, (*st1).first_name); printf(“Last Name: %s\n”, (*st1).last_name); printf(“Age: %d\n”, (*st1).age); return EXIT_SUCCESS; } $ gcc -o struct_pointer1 -std=c99 -pedantic struct_pointer1.c $ ./struct_pointer1 First Name: Christine Last Name: Sun Age: 35
The pointer st1 points to a structure. We allocated a memory area large enough to store an object of type student. Notice that, to access the members, we first had to dereference the pointer in order to reach the object it points to. We used parentheses because the member-access operator (.) has higher precedence than the dereference operator *: without them, *st1.first_name would be parsed as *(st1.first_name). The C language defines a more convenient operator for accessing members without explicitly dereferencing pointers: if p_obj is a pointer to an object of structure type, p_obj->mb1 denotes the member mb1. Thus, (*st1).first_name can also be written st1->first_name. As a consequence, our previous example can be rewritten more gracefully as follows:
$ cat struct_pointer2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  };
  student *st1 = malloc( sizeof( student ) );

  if ( st1 == NULL ) {
    printf("Cannot allocate memory\n");
    return EXIT_FAILURE;
  }
  strcpy( st1->first_name, "Christine");
  strcpy( st1->last_name, "Sun");
  st1->age = 35;

  printf("First Name: %s\n", st1->first_name);
  printf("Last Name: %s\n", st1->last_name);
  printf("Age: %d\n", st1->age);
  free( st1 );
  return EXIT_SUCCESS;
}
$ gcc -o struct_pointer2 -std=c99 -pedantic struct_pointer2.c
$ ./struct_pointer2
First Name: Christine
Last Name: Sun
Age: 35
In example struct_array1.c, we defined an array of structures. The drawback of arrays is that we cannot increase their size if there is not enough space to hold new elements: the array size is fixed once and for all at the time of the declaration. That is why dynamically allocated memory accessed through a pointer is often preferred: the memory block can be resized as needed with realloc() (see Chapter III Section III.6). In the following example, we rewrite the example struct_array1.c with pointers:
$ cat struct_pointer3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  int nb_elt = 10; /* number of students in student_list */
  int i;
  typedef struct student student;
  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
  };
  student *student_list = malloc (nb_elt * sizeof *student_list );

  if ( !student_list) {
    printf("Cannot allocate memory for pointer student_list\n");
    return EXIT_FAILURE;
  }
  strcpy( student_list[0].first_name, "Christine" );
  strcpy( student_list[0].last_name, "Sun" );
  student_list[0].age = 35;

  strcpy( student_list[1].first_name, "David" );
  strcpy( student_list[1].last_name, "Moon" );
  student_list[1].age = 44;

  /* end marker */
  strcpy( student_list[2].first_name, "EOF_ARRAY" );
  strcpy( student_list[2].last_name, "EOF_ARRAY" );
  student_list[2].age = 0;

  /* Display list of elements in array student_list */
  for (i=0; i < nb_elt; i++ ) {
    if ( ! strcmp( student_list[i].first_name, "EOF_ARRAY" ) )
      break;
    printf("First Name: %s\n", student_list[i].first_name);
    printf("Last Name: %s\n", student_list[i].last_name);
    printf("Age: %d\n\n", student_list[i].age);
  }
  free( student_list );
  return EXIT_SUCCESS;
}
$ gcc -o struct_pointer3 -std=c99 -pedantic struct_pointer3.c
$ ./struct_pointer3
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 44
VI.3.6 Nested structures VI.3.6.1 Accessing members of nested structures
As you may have guessed, structures allow building advanced types. For example, members of a structure can be themselves structures. Structures containing structures are called nested structures. For example, the following structure is a nested structure: struct my_struct1 { struct { int a; int b; } mem1; float f; }
The initialization of such a structure is quite natural. Since the inner structure struct { int a; int b; } can be initialized by {10, 20 }, the structure my_struct1 can be initialized with { {10, 20}, 10.8 }. The question that naturally arises is: how can we access the members of nested structures? In the same way as for simple structures. For example, if we declare the object st1 as struct my_struct1 st1:
o The member a of the nested structure is accessed like this: st1.mem1.a
o The member b of the nested structure is accessed like this: st1.mem1.b
o The member f is accessed like this: st1.f
If ptr_st1 is declared as struct my_struct1 *ptr_st1:
o The member a of the nested structure is accessed through ptr_st1->mem1.a
o The member b of the nested structure is accessed through ptr_st1->mem1.b
o The member f is accessed like this: ptr_st1->f
Here is an example:
$ cat struct_nested1.c
#include <stdio.h>
#include <stdlib.h>

struct my_struct1 {
  struct {
    int a;
    int b;
  } mem1;
  float f;
};

int main(int argc, char **argv) {
  struct my_struct1 st1 = { {10,20}, 10.8 };
  struct my_struct1 *ptr_st1 = &st1;

  printf("%d %d %f\n", st1.mem1.a, st1.mem1.b, st1.f);
  printf("%d %d %f\n", ptr_st1->mem1.a, ptr_st1->mem1.b, ptr_st1->f);
  return EXIT_SUCCESS;
}
$ gcc -o struct_nested1 -std=c99 -pedantic struct_nested1.c
$ ./struct_nested1
10 20 10.800000
10 20 10.800000
What if a member is a pointer to another structure? In the following structure, the member mem1 is a pointer to a structure:
struct my_struct2 {
  struct {
    int a;
    int b;
  } *mem1;
  float f;
};
If we declare the object st2 as struct my_struct2 st2:
o The member a of the inner structure is accessed like this: st2.mem1->a
o …
If we declare the object ptr_st2 as struct my_struct2 *ptr_st2:
o The member a of the inner structure can be accessed like this: ptr_st2->mem1->a
o …
For example:
$ cat struct_nested2.c
#include <stdio.h>
#include <stdlib.h>

struct my_struct1 {
  struct {
    int a;
    int b;
  } *mem1;
  float f;
};

int main(int argc, char **argv) {
  struct my_struct1 st1;
  struct my_struct1 *ptr_st1 = &st1;

  st1.mem1 = malloc(sizeof *(st1.mem1));
  if (st1.mem1 == NULL)
    return EXIT_FAILURE;

  st1.mem1->a = 10; /* same as ptr_st1->mem1->a = 10 */
  st1.mem1->b = 20; /* same as ptr_st1->mem1->b = 20 */
  st1.f = 10.8;     /* same as ptr_st1->f = 10.8 */

  printf("%d %d %f\n", st1.mem1->a, st1.mem1->b, st1.f);
  printf("%d %d %f\n", ptr_st1->mem1->a, ptr_st1->mem1->b, ptr_st1->f);
  free(st1.mem1); /* same as free(ptr_st1->mem1) */
  return EXIT_SUCCESS;
}
$ gcc -o struct_nested2 -std=c99 -pedantic struct_nested2.c
$ ./struct_nested2
10 20 10.800000
10 20 10.800000
VI.3.6.2 Initializing nested structures Suppose you wish to save in data structures information about students: their first name, last name and birth date. You have many ways to implement it. A simple way to do it could be: struct student {
char first_name[72]; char last_name[72]; char birthdate[9]; /* such as 15122000 */ }
It also could be implemented like this: struct student { struct person { char first_name[72]; char last_name[72]; } person; struct date { int month; int day; int year; } birthdate; }
In the latter case, our structure student is composed of two members that are themselves of structure type: person and birthdate. Now, how do you think such a structure could be initialized? In the same manner as we did for simpler structures. Since we have two methods for initializing members, and due to the nesting of the structure, you have several ways to initialize it: by giving values without specifying the members, by giving values specifying the members, or by mixing both. Let us consider the first embedded structure person. We could initialize it in two ways:
o { “Christine”, “sun” }
o Or { .first_name=“Christine”, .last_name=“sun” }
For the second embedded structure birthdate (of type struct date) we also have two ways:
o { 7, 4, 2002 }
o Or { .year=2002, .month=7, .day=4 }
This implies you have several ways to initialize the structure student:
o struct student st1= {
{ “Christine”, “sun” }, { 7, 4, 2002 },
} o struct student st1={ { .first_name=“Christine”, .last_name=“sun” }, { 7, 4, 2002 },
} o struct student st1= { { .first_name=“Christine”, .last_name=“sun” }, { .year=2002, .month=7, .day=4 } }
o struct student st1= { .person={ .first_name=“Christine”, .last_name=“sun” }, .birthdate={ 7, 4, 2002 },
} o struct student st1= { .person={ “Christine”, “sun” }, .birthdate={ 7, 4, 2002 },
} o … Here is a piece of code showing what we said: $ cat struct_nested3.c #include #include #define MAX_NAME_LEN 72 int main(void) { struct student { struct person { char first_name[MAX_NAME_LEN];
char last_name[MAX_NAME_LEN]; } person; struct date { int month; int day; int year; } birthdate; }; struct student st1 = { { “Christine”, “sun” }, { 7, 4, 2002 }, }; struct student st2 = { { .first_name=“Christine”, .last_name=“sun” }, { 7, 4, 2002 }, }; struct student st3 = { { .first_name=“Christine”, .last_name=“sun” }, { .year=2002, .month=7, .day=4 } }; struct student st4 = { .person={ .first_name=“Christine”, .last_name=“sun” }, .birthdate={ 7, 4, 2002 }, }; struct student st5 = { .person={ “Christine”, “sun” }, .birthdate={ 7, 4, 2002 }, }; struct student list_st[] = { st1, st2, st3, st4, st5 }; int i; int nb_elt = sizeof list_st/sizeof list_st[0]; for (i=0; i < nb_elt; i++) printf(“%s %s %d/%d/%d\n”, list_st[i].person.first_name,
list_st[i].person.last_name, list_st[i].birthdate.month, list_st[i].birthdate.day, list_st[i].birthdate.year); return EXIT_SUCCESS; } $ gcc -o struct_nested3 -std=c99 -pedantic struct_nested3.c $ ./struct_nested3 Christine sun 7/4/2002 Christine sun 7/4/2002 Christine sun 7/4/2002 Christine sun 7/4/2002 Christine sun 7/4/2002
VI.3.7 Incomplete types and forward references
There are two kinds of declarations for structure types: declarations including a definition and simple declarations. A declaration that specifies the members of a structure is a definition: the type is complete. A simple declaration, which omits the members, declares an incomplete structure type. An incomplete type is a type whose size is unknown; a structure type that has not been defined is therefore incomplete. There are several kinds of incomplete types (described in Chapter VIII Section VIII.6.3.2); an incomplete structure type is only one of them. An incomplete type can be explicitly declared, as in the following example:
struct string;
An incomplete structure type is also created by the declaration of a pointer to a structure type that has not been declared yet. Incomplete structure types can be used in two special contexts:
o Declaring a pointer to a structure type (if the structure type has not been declared yet, the declaration creates it as an incomplete type)
o Creating an alias for a structure type by using typedef
The following example is valid:
$ cat struct_incomplete1.c
int main(void) {
  struct string *p; // pointer to incomplete type
  return 0;
}
It is equivalent to:
int main(void) {
  struct string;
  struct string *p; // pointer to incomplete type
  return 0;
}
The C standard allows declaring a pointer to an incomplete type because the compiler does not need to know the size of the pointed-to type: the size of the pointer itself is known, so storage can be allocated for it when it is declared. You may argue that pointers to structures might have a size depending on the structure. Fortunately, this is not the case: all pointers to structure types have the same representation and alignment requirements. As long as a pointer to an incomplete type is not dereferenced, all is fine, but before dereferencing it, the structure type struct string has to be completed. Completing a structure type means declaring it again with the definition of its members. You can do it after the incomplete type is declared, as shown below:
$ cat struct_incomplete2.c
int main(void) {
  struct string *p; // pointer to incomplete type. Forward reference
  struct string {
    char *s;
    int len;
  }; // struct string is complete
  return 0;
}
A new type deriving from an incomplete type can be created with typedef: $ cat struct_incomplete3.c int main(void) { typedef struct string string; return 0; }
The new type string cannot be used to declare variables (other than pointers) until it is completed. Allowing incomplete structure types and pointers to incomplete types is very useful. Consider two structures that reference each other: without such a feature, you would not be able to declare them. The following example uses this facility:
struct A {
  char s[255];
  struct B *p; // forward reference: points to struct B not yet defined
};
struct B {
  int k;
  struct A *q;
};
In the example above, the pointer p points to a type whose definition is delayed (forward reference): at the time the member p of the structure A is declared, the structure B has not been defined yet. In contrast, the following declaration of the structure A is not valid because at the time of the declaration of the member str_b, the structure B has not been defined (its size is unknown and then the member str_b cannot be allocated storage): struct A { char s[255]; struct B str_b; // invalid: struct B is an incomplete type }; struct B { int k; struct A str_b; // valid, struct A is a complete type };
The following example also takes advantage of this feature allowing building recursive high-level data structures such as linked lists: struct string { char s[255]; int len; }; struct node { struct string s; struct node *ptr_next_node; };
In the example above, the pointer ptr_next_node points to an incomplete type: at the time the member ptr_next_node of the structure node is declared, the size of the structure node is still unknown since its definition is being constructed. The definition of a structure is
considered complete when the right brace } is encountered. Moreover, this feature allows encapsulating your data safely and efficiently as we will find out in Chapter VIII Section VIII.11.
VI.3.8 High-level data structures
Combining pointers and structures enables you to create high-level data structures. The most commonly used ones are linked lists and trees.
VI.3.8.1 Linked lists
A linked list is a collection of structures called nodes. Each structure contains data and a pointer to another structure, as depicted in Figure VI‑1.
Figure VI‑1 Linked list
The last node of a linked list holds a null pointer in its link member, which marks the tail of the list. The head of a linked list is the very first allocated structure. Our examples struct_array1.c and struct_pointer3.c can be rewritten by using a linked list (see Figure VI‑1):
$ cat struct_hl_ds1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  student *p, *student_list, *q;
  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name[ NAME_MAX_LEN ];
    int age;
    student *p_next;
  };

  /* first structure: head */
  student_list = malloc ( sizeof *student_list );
  if ( !student_list) {
    printf("Cannot allocate memory for pointer student_list\n");
    return EXIT_FAILURE;
  }
  strcpy( student_list->first_name, "Christine" );
  strcpy( student_list->last_name, "Sun" );
  student_list->age = 35;

  p = malloc ( sizeof *student_list ); /* allocate memory for next structure */
  if ( !p ) {
    printf("Cannot allocate memory for pointer p\n");
    free( student_list );
    return EXIT_FAILURE;
  }
  student_list->p_next = p;

  /* Second structure */
  strcpy( p->first_name, "David" );
  strcpy( p->last_name, "Moon" );
  p->age = 44;
  p->p_next = NULL; /* tail of the list */

  /* Display linked list student_list */
  for (q = student_list; q != NULL; q = q->p_next ) {
    printf("First Name: %s\n", q->first_name);
    printf("Last Name: %s\n", q->last_name);
    printf("Age: %d\n\n", q->age);
  }
  return EXIT_SUCCESS;
}
$ gcc -o struct_hl_ds1 -std=c99 -pedantic struct_hl_ds1.c
$ ./struct_hl_ds1
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 44
A linked list is attractive because memory is allocated one structure at a time, only when required. The list can be grown easily: you allocate a new memory block, copy the information into it, set its p_next pointer to NULL, and make the p_next pointer of the last structure point to the newly allocated structure. You can also remove a structure easily: the p_next pointer of the previous structure is set to the p_next pointer of the structure you want to remove, and the removed structure is then released with free(). VI.3.8.2 Trees Programmers also resort to trees to organize their data. A tree generalizes the linked list: each structure can hold several pointers to other structures. The simplest tree is a binary tree, in which each structure holds data and two pointers, as depicted in Figure VI‑2.
Figure VI‑2 Tree data structure
An element of a tree is called a node. The top node of the tree is known as a root node or root. A node is called parent if it references one or more nodes called children. Nodes that have no children are called leaves. In Figure VI‑2, the node a is the root and parent of the children b and c. Nodes d, e, f, and g are leaves. Here is an example of a tree data structure:
$ cat struct13.c #include #include int main(void) { typedef struct myTree myTree; myTree *p_left, *root_tree, *p_right, *p; int c; struct myTree { char c; myTree *p_left; myTree *p_right; }; root_tree = malloc( sizeof *root_tree ); root_tree->c = ‘a’; p_left = malloc( sizeof *p_left ); p_left->c = ‘b’; root_tree->p_left = p_left; p_left->p_left = p_left->p_right = NULL; p_right = malloc( sizeof *p_right ); p_right->c = ‘c’; root_tree->p_right = p_right; p_right->p_left = p_right->p_right = NULL; return EXIT_SUCCESS; }
In the example above, we did not check whether the pointers returned by malloc() were valid, in order to keep the program easy to read. Of course, in your own programs, always do it.
VI.3.9 Structures and operators
Only a handful of C operators accept an operand of structure type, chiefly the simple assignment operator =, the address operator &, the member-access operator . and the sizeof operator (the operator -> applies to a pointer to a structure). In particular, two structures cannot be compared with == or !=. Here is an example:
$ cat struct_op1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  struct student {
    char first_name[ NAME_MAX_LEN ];
    char last_name [NAME_MAX_LEN ];
    int age;
  };
  student st1 = {"Christine", "Sun", 35 };
  student st2 = st1;

  printf("First Name: %s\n", st2.first_name);
  printf("Last Name: %s\n", st2.last_name);
  printf("Age: %d\n", st2.age);
  return EXIT_SUCCESS;
}
$ gcc -o struct_op1 -std=c99 -pedantic struct_op1.c
$ ./struct_op1
First Name: Christine
Last Name: Sun
Age: 35
The assignment operation copies the value of each member of the structure on the right side of the equal sign to the corresponding member of the structure on the left side. In the example struct_op1.c, the declarations of st1 and st2 create both structures with their members, and st2 receives a copy of the value of each member of st1. Thus, the items of the array first_name of the structure st1 are copied into the array first_name of the structure st2. Likewise, the elements of the array last_name in st1 are copied into the array last_name in st2. Finally, the value of the member age in st1 is copied into the member age in st2. The example is interesting because it shows that, if a member is an array, all of its items are copied. Such a copy is called a deep copy. This holds true whatever the type of the member, unless it is a pointer. If a member is a pointer, only the address of the referenced object (held in the pointer) is copied: the pointed-to object itself is not copied. Such a copy is known as a shallow copy. This implies that, if you assign an object of structure type to another one, the members that are pointers point to the same objects in both structures! Consequently, you have to watch out for assignments of structures if some members are pointers. Let us show it through a simple example. Can you see why the following example is not correct?
$ cat struct_op2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  struct student {
    char *first_name;
    char *last_name;
    int age;
  };
  student st1, st2;

  st1.first_name = malloc( NAME_MAX_LEN );
  st1.last_name = malloc( NAME_MAX_LEN );
  strcpy( st1.first_name, "Christine");
  strcpy( st1.last_name, "Sun");
  st1.age = 35;

  st2 = st1;
  strcpy( st2.first_name, "David" );
  strcpy( st2.last_name, "Moon" );
  st2.age = 45;

  printf("First Name: %s\n", st1.first_name);
  printf("Last Name: %s\n", st1.last_name);
  printf("Age: %d\n\n", st1.age);
  printf("First Name: %s\n", st2.first_name);
  printf("Last Name: %s\n", st2.last_name);
  printf("Age: %d\n", st2.age);
  return EXIT_SUCCESS;
}
$ gcc -o struct_op2 -std=c99 -pedantic struct_op2.c
$ ./struct_op2
First Name: David
Last Name: Moon
Age: 35

First Name: David
Last Name: Moon
Age: 45
The assignment st2 = st1 copies the value of each member of st1 into the corresponding member of st2. This implies it also copies the pointers: the pointers of st2 point to the same objects as the pointers of st1. In our example, the members first_name of the structures st1 and st2 point to the same memory block (the same holds for the member last_name), which is why changing the names through st2 also changed them for st1. The following example shows that the pointers are copied but not the objects they reference:
$ cat struct_op3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  struct student {
    char *first_name;
    char *last_name;
    int age;
  };
  student st1, st2;

  st1.first_name = malloc( NAME_MAX_LEN );
  st1.last_name = malloc( NAME_MAX_LEN );
  st2 = st1;

  /* %p expects a pointer to void */
  printf("address first_name: st1=%p and st2=%p\n",
         (void *)st1.first_name, (void *)st2.first_name);
  printf("address last_name: st1=%p and st2=%p\n",
         (void *)st1.last_name, (void *)st2.last_name);
  return EXIT_SUCCESS;
}
$ gcc -o struct_op3 -std=c99 -pedantic struct_op3.c
$ ./struct_op3
address first_name: st1=8061040 and st2=8061040
address last_name: st1=8061068 and st2=8061068
In summary, each structure must have its own memory allocated for the members that are pointers, as in the example below:
$ cat struct_op4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
  typedef struct student student;
  struct student {
    char *first_name;
    char *last_name;
    int age;
  };
  student st1, st2;

  st1.first_name = malloc( NAME_MAX_LEN );
  st1.last_name = malloc( NAME_MAX_LEN );
  strcpy( st1.first_name, "Christine");
  strcpy( st1.last_name, "Sun");
  st1.age = 35;

  st2.first_name = malloc( NAME_MAX_LEN );
  st2.last_name = malloc( NAME_MAX_LEN );
  strcpy( st2.first_name, "David" );
  strcpy( st2.last_name, "Moon" );
  st2.age = 45;

  printf("First Name: %s\n", st1.first_name);
  printf("Last Name: %s\n", st1.last_name);
  printf("Age: %d\n\n", st1.age);
  printf("First Name: %s\n", st2.first_name);
  printf("Last Name: %s\n", st2.last_name);
  printf("Age: %d\n", st2.age);
  return EXIT_SUCCESS;
}
$ gcc -o struct_op4 -std=c99 -pedantic struct_op4.c
$ ./struct_op4
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 45
VI.3.10 Flexible array member
Normally, within a structure, the size of arrays must be known at declaration time. However, as of the C99 standard, you are allowed to use an array with no specified size (incomplete array type) if it is the last member of the structure: the array is known as a flexible array member. Take note that the flexible array member is ignored by the sizeof operator: the size of the structure is computed as if the member were absent (apart from possible trailing padding), as shown below:
$ cat struct_flexible_am1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct myArray {
    int len;
    int s[];
  };

  /* sizeof yields a size_t value, displayed with %zu */
  printf("Sizeof(int)=%zu and sizeof(struct myArray)=%zu\n",
         sizeof(int), sizeof(struct myArray));
  return EXIT_SUCCESS;
}
$ gcc -o struct_flexible_am1 -std=c99 -pedantic struct_flexible_am1.c
$ ./struct_flexible_am1
Sizeof(int)=4 and sizeof(struct myArray)=4
In our computer, an int is represented by 4 bytes, and as you can see it, the structure myArray is also represented in 4 bytes ignoring the last member. This does not mean we cannot work with the member s. In order to use it, we have first to allocate memory for it. How could we do that? Through a pointer as shown below: $ cat struct_flexible_am2.c #include #include int main(void) { int array_len = 10; int i; struct myArray { int len; int s[]; }; typedef struct myArray array; /* allocate memory */ array *int_array = malloc( sizeof(*int_array) + array_len * sizeof(int) ); if ( int_array == NULL ) { printf(“Cannot allocate memory”); return EXIT_FAILURE; } int_array->len = array_len; /* initialize array s */ for (i = 0; i < int_array->len; i++) int_array->s[i] = i; /* displaying the array s */ for (i = 0; i < int_array->len; i++) printf(“int_array->s[%d]=%d\n”, i, int_array->s[i] ); return EXIT_SUCCESS; } $ gcc -o struct_flexible_am2 -std=c99 -pedantic struct_flexible_am2.c $ ./struct_flexible_am2
int_array->s[0]=0 int_array->s[1]=1 int_array->s[2]=2 int_array->s[3]=3 int_array->s[4]=4 int_array->s[5]=5 int_array->s[6]=6 int_array->s[7]=7 int_array->s[8]=8 int_array->s[9]=9
One question arises: since the flexible array member is ignored when the size of the structure is computed, does the assignment of a structure containing such a member copy the array? It does not: the assignment is partial, as sketched in the following example (the values displayed for the array s of int_array2 are whatever the freshly allocated block happens to contain; they are indeterminate and merely happened to be zeros on our system):
$ cat struct_flexible_am3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int array_len = 10;
  int i;
  struct myArray {
    int len;
    int s[];
  };
  typedef struct myArray array;

  /* allocate memory */
  array *int_array1, *int_array2;
  int_array1 = malloc( sizeof(*int_array1) + array_len * sizeof(int) );
  if ( int_array1 == NULL ) {
    printf("Cannot allocate memory");
    return EXIT_FAILURE;
  }
  int_array1->len = array_len;

  /* initialize array s in array1 */
  for (i = 0; i < int_array1->len; i++)
    int_array1->s[i] = i;

  int_array2 = malloc( sizeof(*int_array2) + array_len * sizeof(int) );
  if ( int_array2 == NULL ) {
    printf("Cannot allocate memory");
    return EXIT_FAILURE;
  }

  // Flexible Array Member is ignored by the following assignment
  *int_array2 = *int_array1;

  printf("int_array2->len=%d\n", int_array2->len); /* member len has been copied */

  /* but array s was not copied at all since ignored */
  /* attempt to display the array s in array2 */
  for (i = 0; i < int_array2->len; i++)
    printf("int_array2->s[%d]=%d\n", i, int_array2->s[i] );
  return EXIT_SUCCESS;
}
$ gcc -o struct_flexible_am3 -std=c99 -pedantic struct_flexible_am3.c
$ ./struct_flexible_am3
int_array2->len=10
int_array2->s[0]=0
int_array2->s[1]=0
int_array2->s[2]=0
int_array2->s[3]=0
int_array2->s[4]=0
int_array2->s[5]=0
int_array2->s[6]=0
int_array2->s[7]=0
int_array2->s[8]=0
int_array2->s[9]=0
Therefore, to perform a full copy of a structure with a flexible array member, we have to invoke the memcpy() function: $ cat struct_flexible_am4.c #include #include #include int main(void) { int array_len = 10;
int i; struct myArray { int len; int s[]; }; typedef struct myArray array; /* allocate memory */ array *int_array1, *int_array2; int_array1 = malloc( sizeof(*int_array1) + array_len * sizeof(int) ); int_array2 = malloc( sizeof(*int_array2) + array_len * sizeof(int) ); if ( ! int_array1|| ! int_array2 ) { printf(“Cannot allocate memory”); return EXIT_FAILURE; } int_array1->len = array_len; /* initialize array s in array1*/ for (i = 0; i < int_array1->len; i++) int_array1->s[i] = i; /* copy of structure int_array1 into int_array2 */ memcpy(int_array2, int_array1, sizeof(*int_array1) + int_array1->len * sizeof(int)); printf(“int_array2->len=%d\n”, int_array2->len); for (i = 0; i < int_array2->len; i++) printf(“int_array2->s[%d]=%d\n”, i, int_array2->s[i] ); return EXIT_SUCCESS; } $ gcc -o struct_flexible_am4 -std=c99 -pedantic struct_flexible_am4.c $ ./struct_flexible_am4 int_array2->len=10 int_array2->s[0]=0 int_array2->s[1]=1 int_array2->s[2]=2 int_array2->s[3]=3
int_array2->s[4]=4 int_array2->s[5]=5 int_array2->s[6]=6 int_array2->s[7]=7 int_array2->s[8]=8 int_array2->s[9]=9
The program worked! We used the memcpy() function, declared in <string.h>, which is similar to strcpy(). While strcpy() copies strings (terminated by ‘\0’) only, memcpy() copies any object byte by byte. It has the following prototype:
Until C95:
void *memcpy(void *dest, const void *src, size_t n);
As of C99:
void *memcpy(void *restrict dest, const void *restrict src, size_t n);
The memcpy() function copies n bytes from the memory block pointed to by src into the memory block pointed to by dest; the two blocks must not overlap. In our example struct_flexible_am4.c, the last argument of memcpy() was the size in bytes of the structure pointed to by int_array1 plus the size of its flexible array member (len elements of type int). In summary, if you use a structure with a flexible array member:
o Work with a pointer to it.
o Do not forget to allocate memory for the flexible array member.
o Call the function memcpy() to copy structures. Do not use assignments because the flexible array member is ignored.
VI.4 unions
VI.4.1 Declarations
VI.4.1.1 Complete type
A union is a user-defined type denoting a value that can take one of several types. A union is declared in the same way as a structure, except that the keyword union replaces the keyword struct. A union is declared as follows:
union union_tag {
  obj_type1 obj1;
  obj_type2 obj2;
  …
  obj_typeN objN;
};
Where:
o union_tag, called a tag, is the identifier of the union, composed of letters, digits and underscores and starting with an underscore or a letter. The new type union union_tag can then be used to declare variables.
o obj_type1, obj_type2, …, obj_typeN are the types of the members obj1, obj2, …, objN. The members can be of any type with the exception of variably modified types.
A declaration of a union specifying its members is called a definition: the type is said to be complete since the compiler has enough information to compute its size. Unions work in the same manner as structures, and the same rules apply to them. What is the difference? In a structure, every member is given its own piece of memory while, in a union, a single memory block is shared amongst all of the members. Let us start with a simple example:
$ cat union_decl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  };

  /* sizeof yields a value of type size_t, printed with %zu as of C99 */
  printf("sizeof(int)=%zu\n", sizeof(int));
  printf("sizeof(double)=%zu\n", sizeof(double));
  printf("sizeof(union number)=%zu\n", sizeof(union number));
  return EXIT_SUCCESS;
}
$ gcc -o union_decl1 -std=c99 -pedantic union_decl1.c
$ ./union_decl1
sizeof(int)=4
sizeof(double)=8
sizeof(union number)=8
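As you can see, the size of the union is here the size of its largest member. This is not surprising, since the union must be able to hold a value of any of its members. More precisely, a union is at least as large as its largest member: trailing padding bytes may be added (see Section VI.5.2).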
You have three methods to declare an object of union type:
o Method 1: after declaring the union type.
union union_tag obj;
For example:
$ cat union_decl2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  };
  union number uNb;

  return EXIT_SUCCESS;
}
o Method 2: at the time of the declaration of the union type.
union union_tag {
  obj_type1 obj1;
  obj_type2 obj2;
  …
  obj_typeN objN;
} obj;
For example:
$ cat union_decl3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  } uNb;

  return EXIT_SUCCESS;
}
o Method 3: by using an unnamed union:
union {
  obj_type1 obj1;
  obj_type2 obj2;
  …
  obj_typeN objN;
} obj;
For example:
$ cat union_decl4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union {
    int iVal;
    double fVal;
  } uNb;

  return EXIT_SUCCESS;
}
To avoid repeating the keyword union when referring to a union type, programmers generally use a typedef declaration that creates an alias for the union type in one of the following ways:
typedef union union_tag {
  obj_type1 obj1;
  obj_type2 obj2;
  …
  obj_typeN objN;
} union_typename;
Or
typedef union union_tag union_typename;
Or
typedef union {
  obj_type1 obj1;
  obj_type2 obj2;
  …
  obj_typeN objN;
} union_typename;
Where:
o union_tag is the identifier (tag) of the union
o union_typename is an alias for the type union union_tag.
For example:
$ cat union_decl5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef union number number;

  union number {
    int iVal;
    double fVal;
  };
  number uNb;

  return EXIT_SUCCESS;
}
VI.4.1.2 Incomplete union type What we said about structures also applies to unions. You can declare a union without providing its members, which causes the compiler to create an incomplete type. As for structures, you cannot use it to declare a variable until you define it by specifying all its members. An incomplete union type is created as follows: union union_tag;
There is another way to create an incomplete union type. As for structures, if you declare a pointer to a union type that has not been declared yet, the compiler creates the incomplete union type. In the following example, the declaration of the pointer p also declares the incomplete union type with the tag number:
union number *p;
VI.4.2 Initializing unions
Unions are initialized in the same way as structures. At declaration time, a union can be initialized as follows:
union union_tag obj = { .memx = valx };
The following example declares and initializes two objects, uNb1 and uNb2, of union type:
$ cat union_init1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  };

  typedef union number number;
  number uNb1 = {.iVal = 1003 };
  number uNb2 = {.fVal = 407.61 };

  printf("uNb.iVal=%d\n", uNb1.iVal);
  printf("uNb.fVal=%f\n", uNb2.fVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_init1 -std=c99 -pedantic union_init1.c
$ ./union_init1
uNb.iVal=1003
uNb.fVal=407.610000
Take note that only a single member should be initialized, since all members share the same storage. Once the object is declared, you cannot use this syntax to assign new values to the union. The following example will not compile:
$ cat union_init2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  };

  typedef union number number;
  number uNb1;

  uNb1 = {.iVal = 1003 };
  printf("uNb.iVal=%d\n", uNb1.iVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_init2 -std=c99 -pedantic union_init2.c
union_init2.c: In function ‘main’:
union_init2.c:13:10: error: expected expression before ‘{’ token
After the declaration, to set values, you will have to access the members as explained in the next section.
VI.4.3 Accessing union members
Members of a union are accessed in the same way as members of a structure. The member-access operator denoted by . (dot) allows you to access a member of a union or a structure. If union_obj is an object of union type, union_obj.obj_mb1 represents the member obj_mb1. Here is an example:
$ cat union_access1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  };

  typedef union number number;
  number uNb;

  uNb.iVal = 1003;
  printf("uNb.iVal=%d\n", uNb.iVal);

  uNb.fVal = 407.61;
  printf("uNb.fVal=%f\n", uNb.fVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_access1 -std=c99 -pedantic union_access1.c
$ ./union_access1
uNb.iVal=1003
uNb.fVal=407.610000
Remember that a single memory block is shared amongst the members. This implies that, at a given time, only one member is meaningful! Try this:
$ cat union_access2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  };

  typedef union number number;
  number uNb;

  uNb.fVal = 407.61;
  printf("uNb.iVal=%d\n", uNb.iVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_access2 -std=c99 -pedantic union_access2.c
$ ./union_access2
uNb.iVal=-1889785610
We set the member fVal and then tried to read the member iVal. As expected, we retrieved a meaningless value: the bytes of a double were interpreted as an int. The following example shows that the members of a union share the same memory block. We declare uNb as a union and we display the addresses of its members:
$ cat union_access3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union number {
    int iVal;
    double fVal;
  };
  union number uNb;

  /* %p expects a pointer to void */
  printf("&iVal=%p\n", (void *)&uNb.iVal);
  printf("&fVal=%p\n", (void *)&uNb.fVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_access3 -std=c99 -pedantic union_access3.c
$ ./union_access3
&iVal=feffea98
&fVal=feffea98
Compare with a structure:
$ cat union_access4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct number {
    int iVal;
    double fVal;
  };
  struct number uNb;

  /* %p expects a pointer to void */
  printf("&iVal=%p\n", (void *)&uNb.iVal);
  printf("&fVal=%p\n", (void *)&uNb.fVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_access4 -std=c99 -pedantic union_access4.c
$ ./union_access4
&iVal=feffea94
&fVal=feffea98
The examples showed us that, in a union, members share the same memory area while, in a structure, each member has its own piece of memory.
If programmers must know specifically which member of a union they have to access, how can they tell which one holds the right value? By embedding the union within a structure. In the structure, programmers can use an integer (or an enumerated type) that indicates the type of the current value. Suppose you wanted to create a new type denoting positive integer numbers that can be represented either by the type unsigned int or by a string storing their binary representation. Here is a piece of code implementing it:
$ cat union_access5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum type_number { INTEGER, BINARY, VOID };
  typedef enum type_number type_number;

  struct number {
    type_number type;
    union {
      unsigned int iVal;
      char bVal[33];   /* up to 32 binary digits plus '\0' */
    } uVal;
  };
  typedef struct number number;

  number nb;
  nb.type = INTEGER;
  nb.uVal.iVal = 1003;

  return EXIT_SUCCESS;
}
In example union_access5.c, we embedded a union within a structure. In the structure number, the member type tells which member of the union holds the correct value. It has an enumeration type. If the member type holds the value INTEGER, we retrieve the value from the member iVal. If it holds the value BINARY, we retrieve the value from the member bVal. If it holds the value VOID, the union contains nothing meaningful. The following example completes the previous one. The user passes a number along with its type:
$ cat union_access6.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
  enum type_number { INTEGER, BINARY, VOID };
  typedef enum type_number type_number;

  struct number {
    type_number type;
    union {
      unsigned int iVal;
      char bVal[33];   /* up to 32 binary digits plus '\0' */
    } uVal;
  };
  typedef struct number number;

  number nb;

  /* expect 2 arguments */
  if (argc != 3 ) {
    printf("USAGE: %s type number\n", argv[0]);
    printf("where\n\n");
    printf("- type is INTEGER or BINARY\n");
    printf("- number is an integer number\n");
    return EXIT_FAILURE;
  }

  if ( ! strncmp(argv[1], "INTEGER", 7) ) {
    nb.type = INTEGER;
    nb.uVal.iVal = atoi( argv[2] );
  } else if ( ! strncmp(argv[1], "BINARY", 6) ) {
    nb.type = BINARY;
    /* copy at most 32 characters and always terminate the string */
    strncpy(nb.uVal.bVal, argv[2], sizeof nb.uVal.bVal - 1);
    nb.uVal.bVal[sizeof nb.uVal.bVal - 1] = '\0';
  } else {
    printf("Type %s unknown\n", argv[1]);
    return EXIT_FAILURE;
  }

  switch (nb.type) {
    case INTEGER:
      printf("iVal=%u\n", nb.uVal.iVal);
      break;
    case BINARY:
      printf("bVal=%s\n", nb.uVal.bVal);
      break;
    default:
      printf("Unknown type\n");
      return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
$ gcc -o union_access6 -std=c99 -pedantic union_access6.c
$ ./union_access6 BINARY 1010
bVal=1010
$ ./union_access6 INTEGER 123
iVal=123
VI.4.4 Nested unions
Nested unions are initialized and accessed in the same way as nested structures. The initialization and the access of members of embedded unions follow the same principle as described in section VI.3.6. Here is a simple example:
$ cat union_nested1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  enum type_number { INTEGER, FLOAT };
  typedef enum type_number type_number;

  struct number {
    type_number type;
    union {
      unsigned int iVal;
      float fVal;
    } uVal;
  };
  typedef struct number number;

  number nb1 = {    /* init structure */
    INTEGER,
    {               /* init embedded union */
      1003
    }
  };

  number nb2 = {
    .type=INTEGER,
    .uVal={ .iVal=1003 }
  };

  number nb3 = {
    .type=FLOAT,
    { .fVal=12.8 }
  };

  printf("%d %u\n", nb1.type, nb1.uVal.iVal);
  printf("%d %u\n", nb2.type, nb2.uVal.iVal);
  printf("%d %f\n", nb3.type, nb3.uVal.fVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_nested1 -std=c99 -pedantic union_nested1.c
$ ./union_nested1
0 1003
0 1003
1 12.800000
VI.4.5 Arrays and unions
Arrays can hold elements of union type but, in practice, since unions are usually embedded in structures, you will most often meet arrays of structures or pointers to structures. For example:
$ cat union_array1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  enum type_number { INTEGER, BINARY, VOID };
  typedef enum type_number type_number;

  struct number {
    type_number type;
    union {
      unsigned int iVal;
      char bVal[ 32 ];
    } uVal;
  };
  typedef struct number number;

  int i;
  int nb_elt = 32;   /* number of elt in array number_list */
  number number_list[ nb_elt ];

  number_list[0].type = INTEGER;
  number_list[0].uVal.iVal = 1003;

  number_list[1].type = INTEGER;
  number_list[1].uVal.iVal = 407;

  number_list[2].type = BINARY;
  strcpy(number_list[2].uVal.bVal, "10101");

  number_list[3].type = VOID;

  /* Display list of elements in array number_list */
  for (i=0; i < nb_elt; i++ ) {
    if ( number_list[i].type == VOID ) /* End of list */
      break;

    switch (number_list[i].type) {
      case INTEGER:
        printf("iVal=%u\n", number_list[i].uVal.iVal);
        break;
      case BINARY:
        printf("bVal=%s\n", number_list[i].uVal.bVal);
        break;
      default:
        printf("Unknown type\n");
        return EXIT_FAILURE;
    } /* End of Switch */
  } /* End of for */
  return EXIT_SUCCESS;
}
$ gcc -o union_array1 -std=c99 -pedantic union_array1.c
$ ./union_array1
iVal=1003
iVal=407
bVal=10101
VI.4.6 Pointer to unions
Unions can be used with pointers in the same way as structures. The following example defines a pointer to a union:
$ cat union_pointer1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef union number number;

  union number {
    int iVal;
    double fVal;
  };

  number *p_uNb = malloc( sizeof *p_uNb );
  if ( p_uNb == NULL ) {
    printf("Cannot allocate memory\n");
    return EXIT_FAILURE;
  }

  (*p_uNb).iVal = 10;
  printf("iVal=%d\n", (*p_uNb).iVal);

  free(p_uNb);
  return EXIT_SUCCESS;
}
$ gcc -o union_pointer1 -std=c99 -pedantic union_pointer1.c
$ ./union_pointer1
iVal=10
The member-access operator -> that we used to access the members of a structure through a pointer is also used to access the members of a union through a pointer. Thus, (*p_uNb).iVal can be written p_uNb->iVal. The previous example is then equivalent to:
$ cat union_pointer2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef union number number;

  union number {
    int iVal;
    double fVal;
  };

  number *p_uNb = malloc( sizeof *p_uNb );
  if ( p_uNb == NULL ) {
    printf("Cannot allocate memory\n");
    return EXIT_FAILURE;
  }

  p_uNb->iVal = 10;
  printf("iVal=%d\n", p_uNb->iVal);

  free(p_uNb);
  return EXIT_SUCCESS;
}
$ gcc -o union_pointer2 -std=c99 -pedantic union_pointer2.c
$ ./union_pointer2
iVal=10
VI.4.7 Unions and operators
You cannot apply C operators to unions and structures, with the exception of the assignment operator =, the address operator &, the sizeof operator and the member-access operators (. and ->). Here is an example:
$ cat union_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef union number number;

  union number {
    int iVal;
    double fVal;
  };

  number uNb1, uNb2;

  uNb1.iVal = 10;   // access operator
  uNb2 = uNb1;      // assignment operator

  printf("iVal=%d\n", uNb2.iVal);
  return EXIT_SUCCESS;
}
$ gcc -o union_op1 -std=c99 -pedantic union_op1.c
$ ./union_op1
iVal=10
As we explained when we described structures, if a union contains pointers, you have to allocate memory for them; otherwise they are invalid.
VI.4.8 Incomplete union types and forward references All that we said about incomplete structure types and forward references in section VI.3.7 holds true for unions.
VI.4.9 Bit-fields
We only take a brief look at bit-fields since they are used only by experienced C programmers in very specific circumstances. Bit-fields allow programmers to specify the number of bits of a member in a structure or union as shown below:
$ cat bitfields1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef struct my_time my_time;

  struct my_time {
    unsigned int h: 5;   /* h in range [0-23] */
    unsigned int m: 6;   /* m in range [0-59] */
    unsigned int s: 6;   /* s in range [0-59] */
  };

  my_time t;

  /* set time 10:20:18 */
  t.h = 10;
  t.m = 20;
  t.s = 18;
  printf("Time is %d:%d:%d\n", t.h, t.m, t.s);
  return EXIT_SUCCESS;
}
$ gcc -o bitfields1 -std=c99 -pedantic bitfields1.c
$ ./bitfields1
Time is 10:20:18
In our example, the member h (meaning hour) can be represented by five bits since it is in the range [0-23]. Five bits can represent a number in the range [0-31]. Likewise, the members m and s (minutes and seconds) can be represented by six bits since they are in the range [0-59]. Six bits can represent a number in the range [0-63]. You can use bit-fields only with members of type int, signed int, unsigned int or, as of C99, _Bool (an implementation may accept other types), and you cannot take the address of a bit-field: no pointer can point to one. Bit-fields might be of great help when doing low-level programming but, most of the time, it is unlikely that you will work with them a lot. The following example, which attempts to take the address of a bit-field, fails to compile:
$ cat bitfields2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef struct my_time my_time;

  struct my_time {
    unsigned int h: 5;   /* h in range [0-23] */
    unsigned int m: 6;   /* m in range [0-59] */
    unsigned int s: 6;   /* s in range [0-59] */
  };

  unsigned int *p;
  my_time t;

  /* set time 10:20:18 */
  t.h = 10;
  t.m = 20;
  t.s = 18;
  p = &(t.h);
  return EXIT_SUCCESS;
}
$ gcc -o bitfields2 -std=c99 -pedantic bitfields2.c
bitfields2.c: In function ‘main’:
bitfields2.c:20:2: error: cannot take address of bit-field ‘h’
The following example, which uses ordinary members instead of bit-fields, is correct:
$ cat bitfields3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef struct my_time my_time;

  struct my_time {
    unsigned int h;   /* h in range [0-23] */
    unsigned int m;   /* m in range [0-59] */
    unsigned int s;   /* s in range [0-59] */
  };

  unsigned int *p;
  my_time t;

  /* set time 10:20:18 */
  t.h = 10;
  t.m = 20;
  t.s = 18;
  p = &(t.h);
  return EXIT_SUCCESS;
}
VI.5 Alignments
VI.5.1 Structure alignment
The compiler aligns structures correctly, so you do not have to worry about it. However, it is interesting to understand how a structure is aligned and how members are organized within a structure. To ease our discussion, we assume that computers use natural alignment: a value is aligned according to its type. A structure is an aggregate type grouping a set of objects, each having its own type, representation and storage. The members are stored in the order they appear within the structure. The first member starts at the address of the structure. The starting address may be subject to alignment constraints depending on the computer. On computers having data alignment constraints, the alignment of each member is properly done by the compiler. Since the storage for each member is allocated in order, padding bytes may be inserted within the structure to ensure a correct alignment of each member. As an example, consider the following structure:
struct str {
  char c;
  int j;
};
The member c can be stored at any address while j has to be stored at an address that is a multiple of its size, say 4 bytes (see Figure VI‑3). To meet this requirement, the compiler adds unused bytes, called padding bytes, before the member j to ensure the right alignment. This is shown by the following example (your computer may display different values):
$ cat struct_align1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct str {
    char c;   // 1 byte
    int j;    // 4 bytes
  };          // the size of the structure may be naively computed as 5 bytes

  printf( "sizeof(char)=%zu\n", sizeof(char) );
  printf( "sizeof(int)=%zu\n", sizeof(int) );
  printf( "sizeof(struct str)=%zu\n", sizeof(struct str) );
  return EXIT_SUCCESS;
}
$ gcc -o struct_align1 -std=c99 -pedantic struct_align1.c
$ ./struct_align1
sizeof(char)=1
sizeof(int)=4
sizeof(struct str)=8
In the example above, the member j would not be correctly aligned without the three padding bytes inserted after the member c. We might think that, if we swapped the members, padding bytes would become useless:
struct str {
  int j;
  char c;
};
In this structure, the member j is properly aligned, yet the size of the structure is still 8 on our computer, as shown by the following example:
$ cat struct_align2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct str {
    int j;    // 4 bytes
    char c;   // 1 byte
  };          // the size of the structure may be naively computed as 5 bytes

  printf( "sizeof(char)=%zu\n", sizeof(char) );
  printf( "sizeof(int)=%zu\n", sizeof(int) );
  printf( "sizeof(struct str)=%zu\n", sizeof(struct str) );
  return EXIT_SUCCESS;
}
$ gcc -o struct_align2 -std=c99 -pedantic struct_align2.c
$ ./struct_align2
sizeof(char)=1
sizeof(int)=4
sizeof(struct str)=8
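The compiler inserted three trailing padding bytes. Why? Suppose you declared an array of two structures str:
struct str arr[2];
The elements of an array are stored contiguously, so arr[1] starts exactly sizeof(struct str) bytes after arr[0]. If the structure were only five bytes long, the member arr[1].j would start at an address that is not a multiple of 4. With three trailing padding bytes, the size of the structure is a multiple of 4 and the member j is properly aligned in every element of the array.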
Figure VI‑3 Example of padding bytes inside structures
In summary:
o The address of the first member of a structure is the address of the structure.
o A structure has at least the alignment of the member with the strictest alignment.
It is interesting to note that, depending on the order in which you declare the members within a structure, the size of the structure varies, as shown by the following example (on our computer, sizeof(int)=4, sizeof(short)=2):
$ cat struct_align3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct struct1 {
    char c1;       // 1 byte + 3 padding bytes
    int j;         // 4 bytes
    short int c;   // 2 bytes + 2 padding bytes
  };               // Total=12 bytes

  struct struct2 {
    char c1;       // 1 byte + 1 padding byte
    short int c;   // 2 bytes
    int j;         // 4 bytes
  };               // Total=8 bytes

  printf( "sizeof(char)=%zu\n", sizeof(char) );
  printf( "sizeof(short)=%zu\n", sizeof(short) );
  printf( "sizeof(int)=%zu\n", sizeof(int) );
  printf( "sizeof(struct struct1)=%zu\n", sizeof(struct struct1) );
  printf( "sizeof(struct struct2)=%zu\n", sizeof(struct struct2) );
  return EXIT_SUCCESS;
}
$ gcc -o struct_align3 -std=c99 -pedantic struct_align3.c
$ ./struct_align3
sizeof(char)=1
sizeof(short)=2
sizeof(int)=4
sizeof(struct struct1)=12
sizeof(struct struct2)=8
If you do not want the compiler to generate internal padding bytes and want to have full control of your structures, you can insert your own padding bytes. Of course, such a program is not portable and depends on the processor architecture on which you intend to run it. For example, struct1 and struct2 could be written as follows (not portable):
struct struct1 {
  char c1;         // 1 byte
  char padd1[3];   // 3 bytes
  int j;           // 4 bytes
  short int c;     // 2 bytes
  char padd2[2];   // 2 bytes
};                 // Total=12 bytes

struct struct2 {
  char c1;         // 1 byte
  char padd1[1];   // 1 byte
  short int c;     // 2 bytes
  int j;           // 4 bytes
};                 // Total=8 bytes
The size of a structure is the sum of the sizes of its members plus the padding bytes. If you wish to write portable programs, you do not have to care about the padding bytes.
VI.5.2 Union alignment
A union is different from a structure in that a single storage block is allocated for all members. This implies that a union has at least the alignment of the member having the strictest alignment constraint and that its size is at least the size of its largest member. Trailing bytes may be used for padding to meet the alignment requirements.
Figure VI‑4 Example of padding bytes in unions
Consider the following union:
union u {
  int i;
  char s[5];   // 5 bytes
};
What could be the size of such a union? According to the C standard, it must be large enough to hold the largest member: since on our computer sizeof(int)=4, it must be at least five bytes (the largest member is the array s), but the compiler may compute a larger size because of alignment restrictions. For example, if the type int was 4 bytes wide and the computer required the type int to be aligned on 4-byte boundaries, the compiler could add three trailing padding bytes so that the union would be aligned on 4-byte boundaries (the member i has the strictest alignment constraint). Therefore, the union u could have a size of eight bytes and would then be aligned on 4-byte boundaries (see Figure VI‑4). On our computer, we get this:
$ cat union_align.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  union u {
    int i;
    char s[5];   // 5 bytes
  };

  printf( "sizeof(int)=%zu\n", sizeof(int) );
  printf( "sizeof(union u)=%zu\n", sizeof(union u) );
  return EXIT_SUCCESS;
}
$ gcc -o union_align -std=c99 -pedantic union_align.c
$ ./union_align
sizeof(int)=4
sizeof(union u)=8
Normally, you do not have to worry about the padding bytes within unions if you wish to write portable programs. It is better to let the compiler deal with the padding bytes.
VI.6 Compatible types
The following sections are incomplete. We will complete them after describing the scopes of identifiers, introduced in Chapter VII Section VII.6.
Remember that two compatible types have the same representation and alignment. No conversion is performed between compatible types.
VI.6.1 Structure and union compatible types
Within a program consisting of a single source file, two structure or union types are incompatible even if they have the same members declared in the same order. In the following example, the structure types struct1 and struct2 are not compatible:
$ cat struct_compatible_types1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct struct1 { int k; };
  struct struct2 { int k; };

  struct struct1 s1;
  struct struct2 s2;

  s1 = s2; // invalid. Incompatible types
  return EXIT_SUCCESS;
}
$ gcc -o struct_compatible_types1 -std=c99 -pedantic struct_compatible_types1.c
struct_compatible_types1.c: In function ‘main’:
struct_compatible_types1.c:11:6: error: incompatible types when assigning to type ‘struct struct1’ from type ‘struct struct2’
The two unnamed structures (declared with no tag) in the following program are not compatible either, for the same reason:
$ cat struct_compatible_types2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct { int k; } s1;
  struct { int k; } s2;

  s1 = s2; // invalid. Incompatible types
  return EXIT_SUCCESS;
}
$ gcc -o struct_compatible_types2 -std=c99 -pedantic struct_compatible_types2.c
struct_compatible_types2.c: In function ‘main’:
struct_compatible_types2.c:8:6: error: incompatible types when assigning to type ‘struct ’ from type ‘struct ’
VI.6.2 Enumerated types
Within the same source file, two distinct enumeration types are incompatible. Each enumeration type is an integer type compatible with the integer type used to represent it. The compatible integer type can be char, an unsigned integer type or a signed integer type. The compiler is free to choose the compatible type provided it can represent the values of all the enumeration constants. The compatible integer type is implementation-defined but it does not actually matter since an enumerated type is considered an integer type. Enumerated types are integer types that make programs more readable. Keep in mind that enumeration constants are of type int but an enumeration type is an integer type that may not be the type int.
Note that, unlike structure and union types, enumerated types cannot be incomplete.
VI.7 Conversions
VI.7.1 Structures and unions
In C, there is no way to cast a value to a structure or a union type. Conversion rules for structures and unions are those of the simple assignment operator =. An object of structure or union type can be assigned a value having a compatible type. Qualifiers do not matter.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  typedef struct struct1 { int k; } struct1;
  typedef struct struct2 { int k; } struct2;

  struct1 s1 = { 1 };
  struct2 s2 = { 2 };
  const struct1 cs1 = s1; // OK

  s1 = s2;  // invalid. Incompatible types
  s1 = cs1; // OK.
  return EXIT_SUCCESS;
}
VI.7.2 Enumerated types
Since enumerated types are integer types and enumeration constants are of type int, conversion rules for arithmetic types apply to enumerated types and enumeration constants (see Chapter II Section II.11 and Chapter III Section III.14). You can work with enumerated types and enumeration constants as with integers. An object of enumerated type can be used as an integer in expressions. It is unlikely that you will need to do so, and you should avoid it, but nothing prevents someone from assigning a value of one enumeration type to a variable of another enumeration type since both are arithmetic types. This denotes a poor programming style:
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } s1, s2;
enum myBool { FALSE=0, TRUE=1 } b1, b2;

b1 = TRUE;
s1 = b1;
s2 = FALSE;
b2 = TRIANGLE;
Take note that enumerated constants are of type int while enumerated types can be represented by char, a signed integer or an unsigned integer. The compiler is free to choose how an enumerated type is actually represented. This implies assigning an integer to a variable of enumerated type may lead to a behavior that you do not expect. Suppose you declare an enumeration as follows: enum myBool {FALSE=0, TRUE=1};
The compiler might choose to represent such an enumeration as char. If you assign an integer value that cannot be represented by char, you will not get the expected result: enum myBool s = 12345;
If you wish to write a portable program, the integer value to assign should be ranging from 0 to SCHAR_MAX or from the minimum enumeration constant to the maximum enumeration constant. However, it is better to assign a variable of enumerated type only one of the enumerated constants of the enumeration or a variable of the same type. Take note that the compiler may choose different integer types to represent different enumeration types. The C standard permits the compiler to choose the right integer type (char, signed integer or unsigned integer) for each enumeration type independently from each other. However, generally, enumeration types are represented by int.
VI.8 Exercises Exercise 1. Correct the following code:
#include #include int main(void) { typedef struct student student; struct student { char first_name[64]; char last_name[64]; int age; }; student st1; st1.first_name = “Christine”; st1.last_name = “Sun”; st1.age = 35; printf(“First Name: %s\n”, st1.first_name); printf(“Last Name: %s\n”, st1.last_name); printf(“Age: %d\n”, st1.age); return EXIT_SUCCESS; }
Exercise 2. Explain why the first program is wrong while the second one is correct $ cat exercise2_1.c #include #include #include #define DEFAULT_ARRAY_LEN 10 struct array_int { int *a; size_t nb_elt; size_t len; }; int main(void) {
struct array_int a1, a2; a1.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a1.a); a2.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a2.a); printf(“a1.a=%p a2.a=%p\n”, a1.a, a2.a); a1.a[0] = 1; a1.a[1] = 2; a1.len=DEFAULT_ARRAY_LEN; a1.nb_elt = 2; memcpy(&a2, &a1, sizeof a1); printf(“a2.a[0]=%d a2.a[1]=%d a2.len=%d a2.nb_elt=%d\n”, a2.a[0], a2.a[1], a2.len, a2.nb_elt ); printf(“a1.a=%p a2.a=%p\n”, a1.a, a2.a); return EXIT_SUCCESS; }
$ cat exercise2_2.c #include #include #include #define DEFAULT_ARRAY_LEN 10 struct array_int { int a[20]; size_t nb_elt; size_t len; }; int main(void) { struct array_int a1, a2; printf(“a1.a=%p a2.a=%p\n”, a1.a, a2.a); a1.a[0] = 1;
a1.a[1] = 2; a1.len=DEFAULT_ARRAY_LEN; a1.nb_elt = 2; memcpy(&a2, &a1, sizeof a1); printf(“a2.a[0]=%d a2.a[1]=%d\n a2.len=%d a2.nb_elt=%d\n”, a2.a[0], a2.a[1], a2.len, a2.nb_elt ); printf(“a1.a=%p a2.a=%p\n”, a1.a, a2.a); return EXIT_SUCCESS; }
Exercise 2. Write a program implementing a stack data structure onto which we push the numbers from 1 to 10 and from which those numbers are then extracted and printed in reverse order.
Exercise 3. Write a program implementing a generic array in which we put the number 3.14 of type float, the number of type int, and the character 'A' of type char.
Exercise 4. Write a program that prompts the user to provide 3 values and their type (allowed types float, int and char) and stores them. Then, once the user has typed the string quit, the program displays the values with their type.
Exercise 5. Write a program that prompts the user to type any number of values and their type (allowed types float, int and char) and stores them. Then, once the user has typed the string quit, the program displays the values with their type.
Exercise 6. Write a program that shows the alignment of types int, long, and double.
Exercise 7. Using a union, write a program that displays the internal representation of the number 5 of type int.
Exercise 8. Consider the following structure:
struct my_string {
  int len;
  char s[];
};
o What is the size of the structure? o Write a piece of code that stores the string “Hello!” into str1, an object of type my_string. o Write a piece of code that copies the object str1 into another object of type my_string called str2. Exercise 9. Explain why the following program is not correct: #include #include int main(void) { struct rate { float f; }; struct currency { float f; }; struct rate r = { 1.2} ; struct currency c; c = r; return EXIT_SUCCESS; }
Exercise 10. Write a piece of code implementing a data structure that would store a list of strings. The number of strings is not known at compile time.
CHAPTER VII FUNCTIONS
VII.1 Introduction
Amongst good programming practices, readability and maintainability are some of the most important. Could you imagine debugging your own program of thousands of lines, all embedded in the main() function, months after writing it? Imagine the time spent testing it fully… For this reason, programmers split their code into several subprograms called functions in the C language (also known as routines or subroutines in computer science), each performing a specific task. The underlying idea is to have several independent pieces of code that can be tested and debugged separately. As long as a routine produces the same effect, the way it achieves it does not matter. For example, you can completely change the algorithm within a routine without any impact on your program, provided its inputs and outputs remain the same.
In addition to easing maintenance and improving readability, functions can be reused as many times as you wish. For example, you could write a function that calculates the average value of a list of numbers. Instead of writing the same piece of code several times, you just have to invoke the function with the list of numbers as argument, and it will return the average value. This will save you a great deal of time and avoid introducing errors.
Before programmers start writing a program, they first think about the way they will split it. In the same way as a book is broken into chapters and sections, a program is divided into one or more parts known as modules, and modules are split into functions. Modules will be described in the next chapter: they can be compared to the chapters of a book. Functions can be compared to sections.
A function is a set of statements, identified by a name, performing a specific task. A function identifier is composed of letters, digits and underscores, starting with a letter or an underscore. There are two kinds of functions: functions provided by C libraries and functions defined by users.
In this chapter, you will learn how to create and use your own functions.
We will also go into detail about declarations, definitions, scopes, storage durations and initializations of identifiers, refining several features of the C language studied in previous chapters.
VII.2 Definition Before a function can be called, it must be defined somewhere. Defining a function means providing a declaration and the code corresponding to the tasks to perform. A function cannot be defined within another function. Let us start with a simple example. In the following example, the function add() adds two given numbers and returns the resulting value: double add(double a, double b) { return a+b; }
The definition of a function is composed of two parts:
o The declaration, which consists of:
  ▪ The return type: at the leftmost side lies the type of the value that the function returns. In the example above, the return type is double.
  ▪ The identifier of the function. In our example, the function is named add.
  ▪ The parameters of the function. In our example, the parameters are a and b, both of type double.
o The body of the function. It comprises a set of statements, between braces, defining the tasks to perform.
More generally, a function is defined as follows (C standard style):
    type_ret function_name(type1 arg1, type2 arg2, …, typeN argN) {
       statement1;
       …
       statementN;
    }
A declaration of a function describes the types of its parameters and its return type. The definition of a function consists of its declaration and its body. If a function specifies a return type, it should return a value of that type with the return statement. A function may have several return statements, as in the following example:
    int compare_string(char *s1, char *s2) {
       if ( s1 == NULL || s2 == NULL )
          return 0;

       if ( ! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
          return 1;
       } else { /* s1 and s2 hold different strings */
          return 0;
       }
    }
The function compare_string() returns 1 if the given strings are the same and 0 otherwise. A function that has no parameter is defined as follows: type_ret function_name(void) { statement1; … statementN; }
The void parameter means the function takes no parameter, as in the example below:
    int print_starting_header(void) {
       printf("=====================================\n");
       printf("========STARTING OF PROGRAM==========\n");
       printf("=====================================\n");
       return 1;
    }
A function that returns nothing, called a procedure in other programming languages, is defined as follows: void function_name(type1 arg1, type2 arg2,…, typeN argN) { statement1; … statementN; }
The keyword void in place of the return type means the function returns nothing. Here is an example:
    void print_header(char *header) {
       if ( ! header ) /* if pointer is NULL */
          return;

       printf("=====================================\n");
       printf("========%s==========\n", header);
       printf("=====================================\n");
    }
When a function returns nothing, the return statement with no argument can be used to give back the control to the caller (return to the point it was called).
VII.3 Function calls
Programmers often use the words argument and parameter interchangeably, as we sometimes do ourselves, but the two words do not have exactly the same meaning in the C standard. So far, we have not made a clear distinction; we do so now. A parameter (or formal parameter) is an object declared in the declaration of the function, while an argument (or actual argument) is a value (or an expression) passed to the function when it is called.
Figure VII‑1 Function call
Let us consider our function add(): double add(double a, double b) { return a+b; }
The variables a and b are parameters of the function. When we call the function, we pass real values as below: x = add(5, 8);
Above, the values 5 and 8 are arguments of the function. The parameter a will take the first
argument of value 5 and the parameter b will be assigned the second argument of value 8. The parameters work as any object declared within the function. The function performs its expected tasks and returns to the caller with a value specified by the return statement (see Figure VII‑1). In summary, parameters are assigned the arguments passed to the function. Arguments can be literals, variables and more generally expressions: y = 9; x = add(5*2, 8-y);
The expressions are evaluated before being passed to the function, but the order in which they are evaluated is unspecified: the compiler is free to evaluate the arguments in any order. Once a function has been defined, you can call it to perform the expected tasks as in the following example:
    $ cat function_call1.c
    #include <stdio.h>
    #include <stdlib.h>

    /*
      NAME: add()
      DESCRIPTION: add two input numbers
      PARAMETERS:
       - double a
       - double b
      RETURN: the resulting value of the addition of the input numbers.
    */
    double add(double a, double b) {
       return a+b;
    }

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }
    $ gcc -o function_call1 -std=c99 -pedantic function_call1.c
    $ ./function_call1
    10.000000 + 2.100000 = 12.100000
In the example function_call1.c, the add() function is invoked with the arguments x and y: add(x, y). Before the function is executed, the variables x and y are evaluated: they are replaced by their values. Then, the function add() returns its value, which is assigned to the variable z. In the following example, we call the function compare_string() that takes two strings and compares them. If they are identical, it returns 1. Otherwise, it returns 0.
    $ cat function_call2.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
      NAME: compare_string()
      DESCRIPTION: tells if two strings are identical or not
      PARAMETERS:
       - char *s1: input string
       - char *s2: input string
      RETURN: 0 if s1 and s2 are different and 1 otherwise.
    */
    int compare_string(char *s1, char *s2) {
       if ( s1 == NULL || s2 == NULL )
          return 0;

       if ( ! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
          return 1;
       } else { /* s1 and s2 hold different strings */
          return 0;
       }
    }

    int main(void) {
       char *msg[] = {"different", "same"};
       char s1[] = "OK";
       char s2[] = "OK";
       int cmp1 = compare_string(s1, s2);

       char s3[] = "OK";
       char s4[] = "KO";
       int cmp2 = compare_string(s3, s4);

       printf("%s and %s are %s\n", s1, s2, msg[ cmp1 ] );
       printf("%s and %s are %s\n", s3, s4, msg[ cmp2 ] );
       return EXIT_SUCCESS;
    }
    $ gcc -o function_call2 -std=c99 -pedantic function_call2.c
    $ ./function_call2
    OK and OK are same
    OK and KO are different
In the following example, we call the functions print_header() and add():
    $ cat function_call3.c
    #include <stdio.h>
    #include <stdlib.h>

    /*
      NAME: add()
      DESCRIPTION: add two input numbers
      PARAMETERS:
       - double a
       - double b
      RETURN: the resulting value of the addition of the input numbers.
    */
    double add(double const a, double const b) {
       return a+b;
    }

    /*
      NAME: print_header()
      DESCRIPTION: display a banner containing the passed string
      PARAMETERS:
       - char *header
      RETURN: None
    */
    void print_header(char *header) {
       if ( ! header ) /* if pointer is NULL */
          return;

       printf("======================================\n");
       printf("========%s==========\n", header);
       printf("======================================\n");
    }

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( x, y );

       print_header("BEGINNING OF PROGRAM");
       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }
    $ gcc -o function_call3 -std=c99 -pedantic function_call3.c
    $ ./function_call3
    ======================================
    ========BEGINNING OF PROGRAM==========
    ======================================
    10.000000 + 2.100000 = 12.100000
VII.4 Return statement, part 1
The return statement leaves the function that contains it and returns to the caller. The return statement takes an argument if the function returns a value. Below, the program function_return1.c takes two strings as arguments and compares them using the function compare_string():
    $ cat function_return1.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
      NAME: compare_string()
      DESCRIPTION: tells if two strings are identical or not
      PARAMETERS:
       - char *s1: input string
       - char *s2: input string
      RETURN: 0 if s1 and s2 are different and 1 otherwise.
    */
    int compare_string(char *s1, char *s2) {
       if ( s1 == NULL || s2 == NULL )
          return 0;

       if ( ! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
          return 1;
       } else { /* s1 and s2 hold different strings */
          return 0;
       }
    }

    int main(int argc, char **argv) {
       char *s1, *s2;

       if ( argc != 3 ) {
          printf("USAGE: %s string1 string2\n", argv[0]);
          return EXIT_FAILURE;
       }

       s1 = argv[1];
       s2 = argv[2];

       switch ( compare_string(s1, s2) ) {
          case 0:
             printf("%s != %s\n", s1, s2 );
             break;
          case 1:
             printf("%s = %s\n", s1, s2 );
       }

       return EXIT_SUCCESS;
    }
    $ gcc -o function_return1 -std=c99 -pedantic function_return1.c
    $ ./function_return1 HELLO hello
    HELLO != hello
    $ ./function_return1 hello hello
    hello = hello
Within the function compare_string(), the return statement appears three times, each with an argument that depends on the case. In some cases, the return statement takes no argument. This occurs when the function returns nothing (void) and you want control to return to the caller before reaching the end of the function: in the example below, the function print_header() invokes return with no value if the passed argument is a null pointer.
    void print_header(char *header) {
       if ( ! header ) /* if pointer is NULL */
          return;

       printf("=====================================\n");
       printf("========%s==========\n", header);
       printf("=====================================\n");
    }
If a function is declared as returning void, you need not use the return statement at all: when the end of the function body is reached (the closing right brace }), control automatically returns to the caller. In the example above, if the parameter header is not a null pointer, a banner is printed, the function terminates (with no return statement) and control is given back to the caller as if a return statement had been executed. If the argument of the return statement is an expression, it is evaluated before the resulting value is returned. In the following example, the expression a % 2 == 0 is evaluated to a value (1 if a is even, 0 otherwise) that is then returned.
    int is_even(int a) {
       return a % 2 == 0;
    }
A return statement can return values of arithmetic types, pointers, structures, unions, and enumerations, but it cannot return an array. The following example duplicates a passed string and returns a pointer to the allocated memory chunk holding the duplicated string:
    $ cat function_return2.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
      NAME: duplicate_string()
      DESCRIPTION: allocate memory and copy the passed string into it
      PARAMETERS:
       - char *s: input string to duplicate
      RETURN: the pointer to the memory block holding a copy of the passed string
    */
    char *duplicate_string(char *s) {
       char *duplicate_s;
       size_t len; /* strlen() returns a size_t */

       if (s == NULL)
          return NULL;

       len = strlen( s );
       duplicate_s = malloc(len + 1); /* +1 for the terminating null character */

       if ( duplicate_s != NULL )
          strcpy( duplicate_s, s);

       return duplicate_s;
    }

    int main(void) {
       char *s = "Duplicate String";
       char *dup_s = duplicate_string( s );

       if ( dup_s != NULL )
          printf("dup_s=%s\n", dup_s);
       else
          printf("dup_s=NULL\n");

       free(dup_s);
       return EXIT_SUCCESS;
    }
    $ gcc -o function_return2 -std=c99 -pedantic function_return2.c
    $ ./function_return2
    dup_s=Duplicate String
Of course, since malloc() has been invoked, the free() function must be called somewhere to release the memory allocated by the function duplicate_string(). What happens if we return a value whose type differs from the return type? The value is implicitly converted to the return type, as it would be in a simple assignment.
    $ cat function_return3.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int ret_int(double a) {
       return a;
    }

    int main(void) {
       double val = 3.14159;
       printf("return value=%d\n", ret_int(val) );
    }
    $ gcc -o function_return3 -std=c99 -pedantic function_return3.c
    $ ./function_return3
    return value=3
VII.5 Function declarations
You may ask yourself what a declaration is for. Before answering the question, we first need to define three terms: declaration, prototype, and definition. As of C99, a function must be declared before it is called, through either a simple declaration or a definition. A declaration specifies the type bound to a given name. For example, int x tells the compiler we will use the name x as a variable of type int. Similarly, declaring a function tells the compiler that a specific name identifies a function: int is_even(int a) tells the compiler that the name is_even is bound to a function. In the C standard style, when a declaration is part of a definition, the names of the parameters and their types must be specified:
    double add(double a, double b) {
       return a + b;
    }
In C standard, if a function declaration is not part of a definition, declaring the types of the parameters (the names of the parameters are optional in this case) is sufficient. The following simple declarations are allowed and equivalent: double add(double a, double b); double add(double, double);
In the K&R style, the old C style, still permitted by the C standard, though obsolete, you can declare a function without specifying the type of its parameters (i.e. type signature). In K&R style, when a declaration is part of a definition, the names of the parameters are specified without their type. The old C style would define a function like this: type_ret function_name(arg1, arg2,…, argN)
type1 arg1; type2 arg2; …; typeN argN; { statement1; … statementN; }
For example:
    double add(a, b)
    double a;
    double b;
    {
       return a + b;
    }
The types appear in the code of the function not in the declaration. This kind of definition should be avoided and we will explain why. In K&R style (old C style, also known as pre-ANSI C), if a declaration is not part of a definition, the parameter types are omitted as follows: return_type function_name();
For example, the function add() is declared like this in K&R style: double add();
There is no information about the parameters. This kind of declaration should be avoided, though you may see it in old C programs. The prototype of a function is a declaration completed with the types of the parameters it accepts. For example, double add(double a, double b) is a prototype: it tells the compiler the name add identifies a function that takes two parameters of type double. In C standard style, a declaration is a prototype. In K&R style, a declaration is not a prototype. A definition of a function comprises a declaration and the code of the function. It provides the statements that are executed when the function is called. Before the inception of the C standard, there were no function prototypes at all. As of ANSI C (C89/C90), function prototypes were introduced, but prototypes and even declarations were not required (though recommended). As of C99, functions must be declared before being used, preferably as prototypes, though this is not required. As of C99, if you call a function that has not been declared, the compiler must issue a diagnostic (a warning or an error, depending on the compiler and its options). Here are some examples of declarations, definitions and prototypes:
    double add();                    /* declaration K&R style */
    double mult(double, double);     /* prototype */
    double mult(double a, double b); /* prototype */
    void printf();                   /* declaration K&R style */

    int is_even(int a) {             /* definition with prototype */
       return a % 2 == 0;
    }

    int is_even(a)                   /* definition with declaration in K&R style */
    int a;
    {
       return a % 2 == 0;
    }
Unless otherwise stated, throughout the book we will use the term function declaration as a synonym for function prototype, or just prototype. We will not use the K&R function declaration style, which is obsolete. Now that you understand the difference between prototype, declaration and definition, we can explain why declarations are important. One of the most useful features of the C language is its modularity. As we will find out in the next chapter, you can split your program into several source files and create your own set of functions that can be used by other programs. You can also use functions written by other programmers. To call them, you just need the binaries containing the code of the functions and the header files holding their declarations. Suppose you had written a set of functions and built a library from the compiled binaries. A library is just a set of binary modules (known as object files) containing the code of the provided functions (we will learn how to build one in Chapter XIII). Since the functions are packaged as binaries, programmers and compilers have no access to their definitions. How, then, could the compiler and programmers check the arguments passed to the functions and their return values? As you have understood, the compiler uses the declarations to check that the functions are called properly. For example, if the function add() were defined outside your program, you would have to provide the declaration of the function in your program:
    double add(double a, double b);
Generally, the declarations of functions are placed in a text file called a header file, such as stdio.h [49], as we will explain in the next chapter.
So far, we have considered we have a program composed of a single file (source file) holding the complete C code, and our source files were organized like this: #include #include …function1(…) { … } … int main(…) { … }
Thus, our program was split into three sections:
o “include section” that includes header files
o “function section” that defines functions
o “main section” containing the main() function
What happens if our “function section” is placed after the definition of the main() function? In other words, if we define our functions after they are actually called, does it work? We have already answered the question… Here is an example clarifying the answer:
    $ cat function_decl1.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    double add(double a, double b) {
       return a+b;
    }
    $ gcc -o function_decl1 -std=c99 -pedantic function_decl1.c
    function_decl1.c: In function ‘main’:
    function_decl1.c:7:4: warning: implicit declaration of function ‘add’
    function_decl1.c: At top level:
    function_decl1.c:13:8: error: conflicting types for ‘add’
    function_decl1.c:7:15: note: previous implicit declaration of ‘add’ was here
The call to the function add() occurs before the declaration of the function. That is why the compiler complained. To correct it, we can place the definition of the add() function (which is also a declaration) before the main() function (as we did in the example function_call1.c), or we can give the declaration of the function before it is called, as in the following example:
    $ cat function_decl2.c
    #include <stdio.h>
    #include <stdlib.h>

    double add(double a, double b);

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    double add(double a, double b) {
       return a+b;
    }
    $ gcc -o function_decl2 -std=c99 -pedantic function_decl2.c
    $ ./function_decl2
    10.000000 + 2.100000 = 12.100000
When a declaration is not part of the definition of a function, you may omit the parameter names:
    $ cat function_decl3.c
    #include <stdio.h>
    #include <stdlib.h>

    double add(double, double);

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    double add(double a, double b) {
       return a+b;
    }
    $ gcc -o function_decl3 -std=c99 -pedantic function_decl3.c
    $ ./function_decl3
    10.000000 + 2.100000 = 12.100000
The parameter types in the declaration are used to check the arguments and perform the appropriate conversions (explained later in the chapter) if an argument has a type different from the type of the corresponding parameter. If an argument cannot be converted implicitly, an error is displayed as shown below:
    $ cat function_decl4.c
    #include <stdio.h>
    #include <stdlib.h>

    double add(double, double);

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( &x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    double add(double a, double b) {
       return a+b;
    }
    $ gcc -o function_decl4 -std=c99 -pedantic function_decl4.c
    function_decl4.c: In function ‘main’:
    function_decl4.c:9:4: error: incompatible type for argument 1 of ‘add’
    function_decl4.c:4:8: note: expected ‘double’ but argument is of type ‘float *’
The argument &x is a pointer to float and therefore cannot be converted to double. In the same way, if we move the “include section” after the main() function, the compiler also complains:
    $ cat function_decl5.c
    double add(double a, double b) {
       return a+b;
    }

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    #include <stdio.h>
    #include <stdlib.h>
    $ gcc -o function_decl5 -std=c99 -pedantic function_decl5.c
    function_decl5.c: In function ‘main’:
    function_decl5.c:10:4: warning: implicit declaration of function ‘printf’
    function_decl5.c:10:4: warning: incompatible implicit declaration of built-in function ‘printf’
    function_decl5.c:11:11: error: ‘EXIT_SUCCESS’ undeclared (first use in this function)
    function_decl5.c:11:11: note: each undeclared identifier is reported only once for each function it appears in
The compiler complained for two reasons:
o The printf() function, declared in the header file stdio.h, was not declared before being used
o The EXIT_SUCCESS macro, defined in the header file stdlib.h, was not defined before being used
If we move the inclusion of the header files just before the main() function, it works again:
    $ cat function_decl6.c
    double add(double a, double b) {
       return a+b;
    }

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       float x = 10;
       float y = 2.1;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }
    $ gcc -o function_decl6 -std=c99 -pedantic function_decl6.c
    $ ./function_decl6
    10.000000 + 2.100000 = 12.100000
Traditionally, the inclusions of header files are placed at the beginning of the source file, allowing the functions within the source file to call the functions declared in the header files. Historically, before the inception of the C standard, function declarations could appear with an empty parameter list (K&R style) or could even be omitted. Though compilers still accept this obsolescent feature, you should never use it because it prevents the compiler from doing its job correctly. In the C standard style, the declaration of a function specifies the types of the parameters, or the keyword void if the function takes no parameter. In the original C style, known as K&R style (Kernighan & Ritchie style), we could declare a function like this:
    return_type function_name();
Let us show why you should not use the old style. Let us start with K&R declarations as in the example below:
    $ cat old_style1.c
    #include <stdio.h>
    #include <stdlib.h>

    double add(); /* K&R style declaration */

    int main(void) {
       double x = 10;
       double y = 2;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    double add(double a, double b) {
       return a+b;
    }
    $ gcc -o old_style1 -std=c99 -pedantic old_style1.c
    $ ./old_style1
    10.000000 + 2.000000 = 12.000000
It works, but now try this one:
    $ cat old_style2.c
    #include <stdio.h>
    #include <stdlib.h>

    double add(); /* K&R style declaration */

    int main(void) {
       int x = 10;
       int y = 2;
       double z = add( x, y );

       printf("%d + %d = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    double add(double a, double b) {
       return a+b;
    }
    $ gcc -o old_style2 -std=c99 -pedantic old_style2.c
    $ ./old_style2
    10 + 2 = -2124375231618922398463637855521183204518847099
The program does not yield the expected result because the declaration is not a prototype: the compiler cannot check the arguments or convert them if required. In our example, the arguments of type int are passed to the function without being converted to type double. The following example shows it more explicitly:
    $ cat old_style3.c
    #include <stdio.h>
    #include <stdlib.h>

    double display_arg(); /* K&R style declaration */

    int main(void) {
       int x = 20;

       printf("call display_arg(%d)\n", x);
       display_arg( x );
       return EXIT_SUCCESS;
    }

    double display_arg(double a) {
       printf("passed argument = %f\n", a);
    }
    $ gcc -o old_style3 -std=c99 -pedantic old_style3.c
    $ ./old_style3
    call display_arg(20)
    passed argument = 0.000000
Therefore, the K&R declaration does not allow the compiler to convert the arguments if required. The following example shows that the compiler does not even check the number of arguments:
    $ cat old_style4.c
    #include <stdio.h>
    #include <stdlib.h>

    double add(); /* K&R style declaration */

    int main(void) {
       double x = 10;
       double y = 2;
       double z = add( x );

       printf("%d + %d = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }

    double add(double a, double b) {
       return a+b;
    }
    $ gcc -o old_style4 -std=c99 -pedantic old_style4.c
    $ ./old_style4
    0 + 1076101120 = 2.000000
Now, let us turn to the K&R definition. A definition in the old style looks like a definition in the C standard syntax, but the two behave differently. Try this:
    $ cat old_style5.c
    #include <stdio.h>
    #include <stdlib.h>

    /* K&R style definition */
    double add(a, b)
    double a;
    double b;
    {
       return a+b;
    }

    int main(void) {
       double x = 10;
       double y = 2;
       double z = add( x, y );

       printf("%f + %f = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }
    $ gcc -o old_style5 -std=c99 -pedantic old_style5.c
    $ ./old_style5
    10.000000 + 2.000000 = 12.000000
The arguments have the same types as the parameters, so all is fine. But if you pass arguments of other types:
    $ cat old_style6.c
    #include <stdio.h>
    #include <stdlib.h>

    /* K&R style definition */
    double add(a, b)
    double a;
    double b;
    {
       return a+b;
    }

    int main(void) {
       int x = 10;
       int y = 2;
       double z = add( x, y );

       printf("%d + %d = %f\n", x, y, z);
       return EXIT_SUCCESS;
    }
    $ gcc -o old_style6 -std=c99 -pedantic old_style6.c
    $ ./old_style6
    10 + 2 = -21243752316189223984636378555211832045188470999510
The arguments are not converted to the types of the corresponding parameters, which yields erroneous output.
VII.5.1 Name spaces
There are four different name spaces for identifiers:
o Identifiers for functions, macros, objects, user-defined types (typedef) and enumeration constants
o Labels (used by the goto statement)
o Identifiers for members of structures and unions (each structure or union has its own name space for its members)
o Tags for structures, unions and enumerations
There is no collision if two or more identical identifiers belong to different name spaces. In the following example, the identifier s refers to elements in different name spaces:
    $ cat name_space1.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       char *s = "Hello"; /* identifier s for object */

       struct s {         /* identifier s is a tag */
          int s[10];      /* identifier s for structure member */
       };

       return EXIT_SUCCESS;
    }
In the following example, the identifier string refers to an object, a structure tag and a member of a structure:
    $ cat name_space2.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       struct string {       /* identifier string is a tag */
          char string[255];  /* identifier of structure member */
       } string;             /* identifier of an object */

       return EXIT_SUCCESS;
    }
VII.6 Scope of identifiers
VII.6.1 Definition
An important point, which we will complete in the next chapter, is the scope of identifiers. An identifier is a symbol composed of alphanumeric characters and underscores that represents a function, an object (variable), a typedef type, a union, a structure, an enumeration, a macro, a label (used by the goto statement) or a member of a structure or union. Natural questions that arise are:
o “Is an identifier accessible everywhere in the program?”
o “Could we hide an identifier?”
o “Are identifiers within a function visible outside the function?”
o “What is the lifetime of an identifier?”
o And so on.
An identifier is said to be visible if it is accessible. The scope of an identifier (also known as a lexical scope) is the portion of code where it is visible. There are four kinds of scopes: file scope, function scope, block scope, and function prototype scope.
VII.6.2 Prototype scope Parameters declared within a prototype of a function (that is not part of a definition) are visible only within the declaration. Within a function prototype, identifiers are unique. Otherwise, an error is generated at compilation time as in the following example: double f(double a, int a);
The following is valid. The parameters a and b have function prototype scope: double add(double a, double b);
VII.6.3 Function scope
Only labels (used by the goto statement) have function scope. They can be used anywhere within a function and, unlike other identifiers, they cannot be hidden. That is, within a function, a label is unique: you cannot use another label with the same name, even within another block. The following example, using two labels of the same name, is not correct:
    $ cat function_scope1.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       int max = 10;
       int i;

       for (i=0; i < 10; i++) {
          if ( i == 3 )
             goto MSG;
          printf("%d ", i);
          MSG:
             printf("goto label MSG. i=%d\n", i);
       }

       MSG:
          printf("Goto label MSG. End of Program\n");
       return EXIT_SUCCESS;
    }
    $ gcc -o function_scope1 -std=c99 -pedantic function_scope1.c
    function_scope1.c: In function ‘main’:
    function_scope1.c:16:4: error: duplicate label ‘MSG’
    function_scope1.c:12:7: note: previous definition of ‘MSG’ was here
VII.6.4 Block scope
An identifier declared within a block has block scope. It is visible within the block in which it is declared. It is often known as a local identifier in programming languages. Recall that a block starts with a left brace ({) and terminates with the corresponding right brace (}). In the following example, the variable j has block scope since it is declared in the body of the main() function [50].
    $ cat block_scope1.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       int j = 500;

       printf("j=%d\n", j);
       return EXIT_SUCCESS;
    }
    $ gcc -o block_scope1 -std=c99 -pedantic block_scope1.c
    $ ./block_scope1
    j=500
In the example below, the variable j is declared in two different blocks. The variable j in the if block hides the variable j declared in the block enclosing it (body of the main() function):
    $ cat block_scope2.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       int j = 500;
       int cond = 1;

       if ( cond ) {
          int j = 10;
          printf("IF BODY: j=%d\n", j);
       }

       printf("main() BODY: j=%d\n", j);
       return EXIT_SUCCESS;
    }
    $ gcc -o block_scope2 -std=c99 -pedantic block_scope2.c
    $ ./block_scope2
    IF BODY: j=10
    main() BODY: j=500
This example shows that an identifier or a user-defined type declared within a block (block scope) hides the declarations of the same name made at file level or in the blocks that enclose it. Within the same block, an identifier can be declared only once. The following example is wrong:
    $ cat block_scope3.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
       int j = 500;
       float j = 1.9;

       return EXIT_SUCCESS;
    }
    $ gcc -o block_scope3 -std=c99 -pedantic block_scope3.c
    block_scope3.c: In function ‘main’:
    block_scope3.c:6:10: error: conflicting types for ‘j’
    block_scope3.c:5:8: note: previous definition of ‘j’ was here
In the following example, the variables s and j are declared in both the function f() and the function main(), but they do not refer to the same objects since they are declared in different blocks (body of function f() and body of function main()):
    $ cat block_scope4.c
    #include <stdio.h>
    #include <stdlib.h>

    void f(void) {
       char *s = "function f()";
       int j = 10;

       printf("s=%s, j=%d\n", s, j);
    }

    int main(void) {
       f();

       char *s = "function main()";
       int j = 500;

       printf("s=%s, j=%d\n", s, j);
       return EXIT_SUCCESS;
    }
    $ gcc -o block_scope4 -std=c99 -pedantic block_scope4.c
    $ ./block_scope4
    s=function f(), j=10
    s=function main(), j=500
An identifier declared within a function is visible only in the body of the function in which it is declared (block scope). The parameters of a function are visible in the body of the function as if they were declared in it: they have block scope as shown below.
    $ cat block_scope5.c
    #include <stdio.h>
    #include <stdlib.h>

    void f(int j) {
       int cond = 1;

       if ( cond ) {
          int j = 10;
          printf("IF BODY: j=%d\n", j);
       }

       printf("f() BODY: j=%d\n", j);
    }

    int main(void) {
       f(500);
       return EXIT_SUCCESS;
    }
    $ gcc -o block_scope5 -std=c99 -pedantic block_scope5.c
    $ ./block_scope5
    IF BODY: j=10
    f() BODY: j=500
In the example above, the variable j in the if body hides the parameter j. As soon as the if statement terminates, the parameter j is no longer hidden. The same rule applies to user-defined types. User-defined types defined within a block are visible only within the block in which they are declared (block scope): $ cat block_scope6.c #include #include void display_parity(int j) { typedef enum { EVEN = 0, ODD = 1 } parity; parity remainder; int x = 10; remainder = x % 2; if ( remainder == EVEN ) printf(“%d is even\n”, x); else if ( remainder == ODD ) printf(“%d is odd\n”, x); } int main(void) { display_parity(10); return EXIT_SUCCESS; } $ gcc -o block_scope6 -std=c99 -pedantic block_scope6.c $ ./block_scope6 10 is even
In the example above, the enumeration type parity is visible only within the body of the function display_parity().
VII.6.5 File scope
An identifier declared outside any function has file scope. It is visible anywhere within the file, from the point of its declaration onward, except within a block that contains another declaration of the same identifier (in which case it is hidden). Such an identifier is also said to be external (sometimes called global). Throughout the book, we will use the adjective global as a synonym for external[51], meaning having file scope.
A function cannot be defined within another function: a function definition always appears at file scope. The identifier of a function (its name) is therefore accessible everywhere in the file after its declaration (it has file scope). Like any identifier with file scope, it is hidden only within a block that declares another identifier with the same name. In the following example, the functions f() and g() are accessible by any function in the file file_scope1.c:
$ cat file_scope1.c
#include <stdio.h>
#include <stdlib.h>

void f(void) {
   printf("function f() called\n");
}

void g(void) {
   f();
}

int main(void) {
   g();
   f();
   return EXIT_SUCCESS;
}
$ gcc -o file_scope1 -std=c99 -pedantic file_scope1.c
$ ./file_scope1
function f() called
function f() called
An object can also have file scope: it is visible within the body of any function of the file in which it is declared. Such an object is declared outside functions. For this reason, such an object is often called external. In the following example, the variable j and the pointer s have file scope:
$ cat file_scope2.c
#include <stdio.h>
#include <stdlib.h>

char *s = "global object";
int j = 500;

void f(void) {
   printf("s=%s, j=%d\n", s, j);
}

int main(void) {
   f();
   printf("s=%s, j=%d\n", s, j);
   return EXIT_SUCCESS;
}
$ gcc -o file_scope2 -std=c99 -pedantic file_scope2.c
$ ./file_scope2
s=global object, j=500
s=global object, j=500
In the following example, the identifiers s and j are declared at file scope (global) and declared again at block scope (local), in the body of the f() function and in the body of the main() function:
$ cat file_scope3.c
#include <stdio.h>
#include <stdlib.h>

/* variables with file scope */
char *s = "global object";
int j = 500;

void f(void) {
   char *s = "block f()";
   int j = 10;
   printf("s and j are local: s=%s, j=%d\n", s, j);
}

void g(void) {
   printf("s and j are global: s=%s, j=%d\n", s, j);
}

int main(void) {
   char *s = "block main()";
   int j = 20;

   f();
   g();
   printf("s and j are local: s=%s, j=%d\n", s, j);
   return EXIT_SUCCESS;
}
$ gcc -o file_scope3 -std=c99 -pedantic file_scope3.c
$ ./file_scope3
s and j are local: s=block f(), j=10
s and j are global: s=global object, j=500
s and j are local: s=block main(), j=20
Local objects (block scope) hide global objects (file scope). The pointer s and the variable j of the function f() hide the pointer s and the variable j having file scope. In the same way, the pointer s and the variable j in the main() function hide the pointer s and the variable j having file scope.
A global user-defined type (external), visible by any function within a source file (file scope), is declared outside functions. In the following example, the structure string is visible by all the functions of the source file file_scope4.c:
$ cat file_scope4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Global structure string */
struct string {
   char *s;
   int len;
};
typedef struct string string;

/* create a structure string from a string passed as argument */
string create_string (char *s) {
   string ret_s = { NULL, 0 };
   int len = 0;

   if ( s == NULL )
      return ret_s;

   len = strlen(s);
   ret_s.s = malloc( len + 1 );
   if (ret_s.s == NULL ) {
      printf("Cannot allocate memory\n");
      return ret_s;
   }
   ret_s.len = len;
   strcpy (ret_s.s, s);
   return ret_s;
}

/* display the string stored in the structure string */
void display_string (string s) {
   s.s != NULL ? printf("String=%s\n", s.s) : printf("String=NULL\n");
}

int main(void) {
   string msg1 = create_string("This is a struct string");
   string msg2 = create_string(NULL);

   display_string(msg1);
   display_string(msg2);
   return EXIT_SUCCESS;
}
$ gcc -o file_scope4 -std=c99 -pedantic file_scope4.c
$ ./file_scope4
String=This is a struct string
String=NULL
VII.6.6 Same scope Two identifiers are said to have the same scope if their scope ends at the same point within a program. Two identifiers with file scope have the same scope. Two identifiers declared in the same block have the same scope. Two identifiers having function prototype scope have the same scope if they belong to the same declaration of a function.
VII.6.7 Scope and visibility
We summarize what we said about the visibility of identifiers. Two identifiers belonging to the same name space may have the same name if they are declared in different scopes. As scopes may overlap (a scope s1 may enclose a scope s2), an identifier declared in the enclosing scope may be hidden by an identifier of the same name declared in a nested scope (see Figure VII‑2).
Figure VII‑2 Scope overlaps
VII.7 Storage duration
Any object is stored in the computer’s memory so that it can be read or updated later. An object exists as long as it has a memory location storing it. What happens if we try to use an object that no longer exists? So far, we have always worked with objects within their scope, and their lifetime seemed obvious: they existed in their scope. What do you think about the following code?
$ cat function_lifetime1.c #include #include int *f(void) { int s[10] = {10, 18, 20}; return s; } int main(void) { int *p = f(); return EXIT_SUCCESS; } $ gcc -o function_lifetime1 -std=c99 -pedantic function_lifetime1.c function_lifetime1.c: In function ‘f’: function_lifetime1.c:7:4: warning: function returns address of local variable $ ./function_lifetime1
The compiler guessed our code was wrong. In our program, the f() function returned a pointer to the first element of an array. The problem is that the array was a local variable (block scope) that would be destroyed as soon as the function f() terminated. This means the pointer returned by the f() function pointed to an object that no longer existed. Hence the question: what is the lifetime of objects?
The time during which an object exists, while the program is running, is the lifetime of the object. An object exists as long as it is bound to a memory chunk in which it is stored. In other words, the storage duration determines the lifetime of an object. There are three kinds of storage durations: automatic, static and allocated.
The storage-class specifiers (auto, extern, static, register) are the keywords that, together with the place of the declaration, determine the storage duration of the declared object. At most one storage-class specifier is allowed in a declaration. However, only the storage-class specifier register is allowed in the declarations of parameters.
Storage duration must not be confused with scope. A scope defines the portion of a program where you can use an identifier. The storage duration defines the lifetime of an object. Thus, a variable may exist as long as the program is running while it can be used only within a specific block (local variable declared with the keyword static).
VII.7.1 Automatic duration
An object declared within a block (block scope) with the storage-class specifier auto has automatic storage duration. The reserved word auto is generally omitted. It is used by default when objects having block scope are declared without the storage-class specifier static. This means that local objects have automatic storage duration.
The storage-class specifier register also declares an object with automatic storage duration. It suggests that the compiler make access to the variable as fast as possible. This is not a requirement: the compiler may ignore the hint and treat the object as if it had been declared with the keyword auto. The C standard does not specify how to make the access faster. In practice, the compiler may keep the variable in a processor register rather than in the computer’s memory. The storage-class specifier register is not frequently used because of its constraints and because compilers are smart enough to optimize the code according to the processor architecture.
Since registers have no address, the standard forbids taking the address of an object declared with the keyword register, whether or not the compiler actually placed it in a register. This means the operator & cannot be applied to such an object. For an array declared with register, the address of its first element cannot be computed either, so you cannot use subscripts to access its elements, as shown below:
$ cat register.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   register int v =10;
   register int s[10] = { 1, 2 , 3};
   printf("&v=%p\n", &v);
   printf("s[1]=%d\n", s[1]);
   return EXIT_SUCCESS;
}
$ gcc -o register -std=c99 -pedantic register.c
register.c: In function ‘main’:
register.c:7:4: error: address of register variable ‘v’ requested
register.c:8:25: warning: ISO C forbids subscripting ‘register’ array
An object having automatic storage duration (local object) is created at its declaration within its block and is destroyed as the block is left: it is temporary. When an object is created, storage is allocated for storing its value. It is destroyed when its storage is freed and becomes available for another object. This implies you must not use the address of an object with automatic storage duration outside its scope, as we did in the example function_lifetime1.c.
If a block is entered several times, as in the case of a loop body, the local objects of the block are created and initialized each time the block is entered and destroyed each time it is left.
VII.7.2 Static storage duration
An object has static storage duration in the following cases:
o It is declared with the storage-class specifier static. Its scope can be file or block.
o It has file scope (global object).
o It is declared with the storage-class specifier extern.
Throughout the book, we call static identifier an identifier declared with the storage-class specifier static. Therefore, a static identifier has static storage duration and can have file scope (global) or block scope (local).
VII.7.2.1 Global objects (file scope)
An object declared outside functions (file scope) is said to be external or global. Not only is it visible within the source file in which it is declared, but it can also be made visible within the other source files of the program (see the extern storage-class specifier in the next section): a global object can be used throughout the whole program. It exists until the program terminates: it is permanent. It is created once and destroyed when the program ends. For example, functions are global (file scope) by design. In the following example, the variable status is visible throughout the source file function_lifetime2.c and exists as long as the program is running:
$ cat function_lifetime2.c
#include <stdio.h>
#include <stdlib.h>

int status = 10; /* global variable */

void f(void) {
   printf ("function f() status=%d\n", status);
   status = 20;
   printf ("function f() set status to %d\n\n", status);
}

void g(void) {
   printf ("function g() status=%d\n", status);
   status = 30;
   printf ("function g() set status to %d\n\n", status);
}

int main(void) {
   f();
   g();
   printf ("function main() status=%d\n", status);
   return EXIT_SUCCESS;
}
$ gcc -o function_lifetime2 -std=c99 -pedantic function_lifetime2.c
$ ./function_lifetime2
function f() status=10
function f() set status to 20

function g() status=20
function g() set status to 30

function main() status=30
VII.7.2.2 Extern storage-class specifier
The extern storage-class specifier will be better understood in the next chapter. So far, our programs have been composed of a single source file holding all our code. As a matter of fact, a program can be composed of several source files. In each source file, you can declare global objects and functions (that are global by design). The extern storage-class specifier used in a declaration tells the compiler the object is actually defined in another source file as an external object (file scope). For example, the declaration extern int status in a translation unit indicates the variable status is defined in another file as a global object (file scope) and that we wish to access it throughout this source file. Such an object holds the same identifier throughout the whole program and exists until the program terminates. It is created once and destroyed when the program ends: it is permanent.
Let us suppose our program is made of two source files function_lifetime_main1.c and function_lifetime_dummy1.c:
$ cat function_lifetime_main1.c
#include <stdio.h>
#include <stdlib.h>

extern int status; /* global variable defined elsewhere */

int main(void) {
   printf ("status=%d\n\n", status);
   return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy1.c
int status = 40; /* global variable declared and initialized here */
$ gcc -c function_lifetime_dummy1.c
$ gcc -c function_lifetime_main1.c
$ gcc -o function_lifetime_main1 function_lifetime_main1.o function_lifetime_dummy1.o
$ ./function_lifetime_main1
status=40

We will talk more about modules in the next chapter. The command gcc -c creates an object file (binary code) from a source file. The command gcc -o creates an executable from object files.
By design, a function is global. In the following example, the function f() is visible throughout the whole program composed of the two source files function_lifetime_main2.c and function_lifetime_dummy2.c:
$ cat function_lifetime_main2.c
#include <stdlib.h>

extern void f(void); /* function f() is declared elsewhere */

int main(void) {
   f();
   return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy2.c
#include <stdio.h>

void f(void) {
   printf ("function f()\n");
}
$ gcc -c function_lifetime_dummy2.c
$ gcc -c function_lifetime_main2.c
$ gcc -o function_lifetime_main2 function_lifetime_main2.o function_lifetime_dummy2.o
$ ./function_lifetime_main2
function f()
VII.7.2.3 Static storage-class specifier The static storage-class specifier can be used in two ways: at file scope or block scope. An object declared with the storage-class specifier static exists until the program terminates: a static object is permanent.
VII.7.2.3.1 File scope
Used outside functions (file scope), the static storage-class specifier makes an object visible only within the source file in which it is declared. Without the storage-class specifier static, a global object can be accessed within other source files. Let us reuse our previous example and place the static keyword before our variable status. What do you think will happen?
$ cat function_lifetime_main3.c
#include <stdio.h>
#include <stdlib.h>

extern int status; /* global variable defined elsewhere */

int main(void) {
   printf ("status=%d\n\n", status);
   return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy3.c
static int status = 40; /* global variable declared and initialized here */
$ gcc -c function_lifetime_dummy3.c
$ gcc -c function_lifetime_main3.c
$ gcc -o function_lifetime_main3 function_lifetime_main3.o function_lifetime_dummy3.o
Undefined                       first referenced
 symbol                             in file
status                              function_lifetime_main3.o
ld: fatal: symbol referencing errors. No output written to function_lifetime_main3
collect2: ld returned 1 exit status
The build failed at link time because the global variable status is no longer visible by the source file function_lifetime_main3.c. The global variable status is visible only throughout the source file function_lifetime_dummy3.c. What we said about objects holds true for functions. For example:
$ cat function_lifetime_main4.c
#include <stdlib.h>

extern void f(void); /* function f() is declared elsewhere */

int main(void) {
   f();
   return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy4.c
#include <stdio.h>

static void f(void) {
   printf ("function f()\n");
}
$ gcc -c function_lifetime_dummy4.c
$ gcc -c function_lifetime_main4.c
$ gcc -o function_lifetime_main4 function_lifetime_main4.o function_lifetime_dummy4.o
Undefined                       first referenced
 symbol                             in file
f                                   function_lifetime_main4.o
ld: fatal: symbol referencing errors. No output written to function_lifetime_main4
collect2: ld returned 1 exit status
The build failed at link time because the function f() in the source file function_lifetime_dummy4.c is visible only within this file. We will say more about static objects in the next chapter. For now, just remember that the keyword static, used with identifiers having file scope, makes them visible only in the source file in which they are declared.
VII.7.2.3.2 Block scope
Used in the declaration of an object having block scope, which would otherwise be a temporary local object (automatic), the static storage-class specifier turns it into a permanent object. The object is created and initialized at program startup and keeps its value until the program terminates. Let us consider a first program:
$ cat function_lifetime5.c
#include <stdio.h>
#include <stdlib.h>

void f(void) {
   static int j = 10;
   printf ("j=%d\n", j);
   j++;
}

int main(void) {
   f();
   f();
   f();
   f();
   return EXIT_SUCCESS;
}
$ gcc -o function_lifetime5 -std=c99 -pedantic function_lifetime5.c
$ ./function_lifetime5
j=10
j=11
j=12
j=13
Compare with the following one: $ cat function_lifetime6.c #include #include void f(void) { int j = 10; printf (“j=%d\n”, j); j++; } int main(void) { f(); f(); f(); f(); return EXIT_SUCCESS; } $ gcc -o function_lifetime6 -std=c99 -pedantic function_lifetime6.c $ ./function_lifetime6 j=10 j=10 j=10 j=10
In the program function_lifetime5.c, the variable j has static storage duration. It is created (and initialized) at program startup and exists as long as the program runs, keeping its value until it is changed. The variable j is permanent even though it is local (block scope).
In the program function_lifetime6.c, the variable j has automatic storage duration. It is created and initialized each time the function f() is executed. It is destroyed as the function f() is left. The variable j is temporary.
This means that if we rewrite our program function_lifetime1.c using the static keyword, it will work as expected:
$ cat function_lifetime7.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
   static int s[10] = {10, 18, 20};
   return s;
}

int main(void) {
   int *p = f();
   printf ("p[0]=%d\n", p[0]);
   return EXIT_SUCCESS;
}
$ gcc -o function_lifetime7 -std=c99 -pedantic function_lifetime7.c
$ ./function_lifetime7
p[0]=10
Yes, it will work but it implies you will get always the same array each time you call the function f() as shown below: $ cat function_lifetime8.c #include #include int *f(void) { static int s[10] = {10, 18, 20}; return s; } int main(void) { int *p; int *q; p = f();
p[0] = 200; printf (“p[0]=%d\n”, p[0]); q = f(); printf (“q[0]=%d\n”, q[0]); return EXIT_SUCCESS; } $ gcc -o function_lifetime8 -std=c99 -pedantic function_lifetime8.c $ ./function_lifetime8 p[0]=200 q[0]=200
If this is what you want, it is fine but if you want to get a new array at each call, you have to use memory block dynamically allocated by malloc() or calloc(). Such objects are more interesting since they have allocated storage duration.
VII.7.3 Allocated storage duration
A valid pointer holds an address pointing to an existing memory block. As we explained, a valid pointer references an object created by a declaration (such as a variable) or a memory area allocated by the malloc(), calloc() or realloc() function. An automatic object is created in the block in which it is declared and destroyed when the block is left. A pointer referencing such an object can be used only while that block is being executed. A pointer to an object with static storage duration can be returned by a function and used throughout a program until it terminates.
A memory area allocated by the malloc(), calloc() or realloc() function can be used until the free() function is invoked: such an object has allocated storage duration. You decide the lifetime of such an object. As soon as you no longer need it, you just call the free() function. You can view it as a dynamic storage duration controlled by the user. We can rewrite our program function_lifetime1.c using an allocated memory area:
$ cat function_lifetime9.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
   int len = 10;
   int *s = malloc(len * sizeof *s);

   s[0] = 10;
   s[1] = 18;
   s[2] = 20;
   return s;
}

int main(void) {
   int *p;
   int *q;

   p = f();
   p[0] = 200;
   printf ("p[0]=%d\n", p[0]);

   q = f();
   printf ("q[0]=%d\n", q[0]);
   return EXIT_SUCCESS;
}
$ gcc -o function_lifetime9 -std=c99 -pedantic function_lifetime9.c
$ ./function_lifetime9
p[0]=200
q[0]=10
As soon as you no longer need the allocated memory area, you can relinquish it as shown below: $ cat function_lifetime10.c #include #include int *f(void) { int len = 10; int *s = malloc(len * sizeof *s); s[0] = 10; s[1] = 18; s[2] = 20; return s; } int main(void) { int *p; int *q;
p = f(); p[0] = 200; printf (“p[0]=%d\n”, p[0]); free( p ); /* we do not need anymore the allocated memory */ q = f(); printf (“q[0]=%d\n”, q[0]); free( q ); /* we do not need anymore the allocated memory */ return EXIT_SUCCESS; } $ gcc -o function_lifetime10 -std=c99 -pedantic function_lifetime10.c $ ./function_lifetime10 p[0]=200 q[0]=10
Do not confuse the pointer holding the address of the referenced object with the object itself. A pointer is a variable holding the address of an object, and its storage duration may differ from that of the object it references. In our example function_lifetime10.c, the allocated memory area is pointed to by the pointer s in the function f() and then by the pointers p and q. In the function f(), the pointer s has automatic storage duration: as the function is left, the pointer is destroyed while the allocated memory block still exists and is then used in the main() function.
VII.8 Compound literals
A string literal has static storage duration: it exists as long as the program is executing. This is not true for compound literals. If it has file scope, a compound literal has static storage duration, but if it has block scope, it has automatic storage duration. This can lead to misuses, as you will find out, hence the section about compound literals placed here in the book.
A compound literal, introduced in the C99 standard, is an anonymous object (i.e. it has no name) written as a type name within parentheses followed by a list of comma-separated values within braces, such as (float []){1.2, 12.7}. The list within braces, by itself, has no type: the parenthesized type name, which looks like a cast, is what gives the compound literal its type. In the following example, though nobody does such a thing, we assign the variable v a compound literal:
$ cat pointer_lit1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float v;
   v = (float){10.1};

   printf("v=%f\n", v);
   return EXIT_SUCCESS;
}
$ gcc -o pointer_lit1 -std=c99 -pedantic pointer_lit1.c
$ ./pointer_lit1
v=10.100000
VII.8.1.1 Compound literals and pointers We have learned to allocate memory and assign it to a pointer, assign an existing object to a pointer but we could also assign a pointer a compound literal. To be more specific, the C language, as of C99, allows a more convenient way to write the following program without allocating memory: $ cat pointer_lit2.c #include #include int main(void) { float *p = (float *)malloc(2 * sizeof *p); p[0] = 10.1; p[1] = 3.14; printf(“p[0]=%f p[1]=%f\n”, p[0], p[1]); free(p); return (EXIT_SUCCESS); } $ gcc -o pointer_lit2 -std=c99 -pedantic pointer_lit2.c $ ./pointer_lit2 p[0]=10.100000 p[1]=3.140000
You can initialize a pointer with literals by using an anonymous array as follows: $ cat pointer_lit3.c #include #include
int main(void) { float *p = (float []){10.1, 3.14}; printf(“p[0]=%f p[1]=%f\n”, p[0], p[1]); return (EXIT_SUCCESS); } $ gcc -o pointer_lit3 -std=c99 -pedantic pointer_lit3.c $ ./pointer_lit3 p[0]=10.100000 p[1]=3.140000
Why did it work? In our example pointer_lit3.c, we gave the type float[] (array of float) to the compound literal, allowing an anonymous array to be assigned to the pointer. Everything happened as if we had written something like this:
$ cat pointer_lit4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   float unnamed_array[] = {10.1, 3.14};
   float * p = unnamed_array;

   printf("p[0]=%f p[1]=%f\n", p[0], p[1]);
   return EXIT_SUCCESS;
}
$ gcc -o pointer_lit4 -std=c99 -pedantic pointer_lit4.c
$ ./pointer_lit4
p[0]=10.100000 p[1]=3.140000
You could specify the size of the anonymous array: $ cat pointer_lit5.c #include #include int main(void) { float *p = (float [4]){10.1, 3.14}; printf(“p[0]=%f p[1]=%f p[2]=%f p[3]=%f\n”, p[0], p[1], p[2], p[3]); return (EXIT_SUCCESS);
} $ gcc -o pointer_lit5 -std=c99 -pedantic pointer_lit5.c $ ./pointer_lit5 p[0]=10.100000 p[1]=3.140000 p[2]=0.000000 p[3]=0.000000
Uninitialized items of the anonymous array take the value of zero. It works fine but be cautious: unlike string literals, which always have static storage duration, compound literals have automatic storage duration when appearing within a block (block scope) and static storage duration when appearing outside functions (file scope). Accordingly, the following program is wrong, producing an undefined output:
$ cat pointer_lit6.c #include #include int main(void) { int i; int *p[3]; for (i=0; iage); if ( student2 ) printf(“%s %s %d\n”, student2->first_name, student2->last_name, student2->age);
return EXIT_SUCCESS; } $ gcc -o function_return6 -std=c99 -pedantic function_return6.c $ ./function_return6 Christine Sun 34 David Moon 44
The statement student1 = create_student(“Christine”, “Sun”, 34) calls the create_student() function that returns a pointer to a structure. The pointer student1 points to the address of the allocated memory area storing the structure.
VII.11 Default argument promotions
The old C declarations of functions (pre-C standard declaration style, known as K&R style) do not constitute prototypes (not recommended). That is, the parameters are not declared within the function declarations. The problem is the compiler cannot check and convert the passed arguments to the expected target types. As of C89[52], the compiler performs default conversions, known as default argument promotions, before passing the arguments to such functions. The compiler applies the integer promotion rule (see section IV.14.2) to the arguments having an integer type and converts the arguments having type float to double. The integer promotion rule states that a value of an integer type smaller than int (char or short, whether signed or unsigned) is promoted to int or unsigned int (see section IV.14.2). In the following example, the default argument promotions apply to the function disp_float() as it has no prototype.
$ cat default_arg_promotion1.c
#include <stdio.h>
#include <stdlib.h>

void disp_float(); // Old declaration style. Not a prototype

int main(void) {
   float f = 1.2;
   disp_float(f);

   return EXIT_SUCCESS;
}

void disp_float(float f) {
   printf("disp_float(): f=%f\n", f);
}
$ gcc -o default_arg_promotion1 -std=c99 -pedantic default_arg_promotion1.c
default_arg_promotion1.c:13:6: error: conflicting types for ‘disp_float’
default_arg_promotion1.c:13:1: note: an argument type that has a default promotion can’t match an empty parameter name list declaration
default_arg_promotion1.c:4:6: note: previous declaration of ‘disp_float’ was here
The compiler generated an error because the parameter of the function disp_float() must be double as the default argument promotions convert the type float to double (next section describes the function type compatibility). Both the declarations are incompatible, hence the error message. Now, if we change the type of the parameter f to the expected type, the compiler generates no error: $ cat default_arg_promotion2.c #include #include void disp_float(); int main(void) { float f = 1.2; disp_float(f); return EXIT_SUCCESS; } void disp_float(double f) { printf(“disp_float(): f=%f\n”, f); } $ gcc -o default_arg_promotion2 -std=c99 -pedantic default_arg_promotion2.c $ ./default_arg_promotion2 disp_float(): f=1.200000
Declaring a function in the old style prevents the compiler from checking and converting the arguments to the appropriate types. In the following example, the argument f of type int will not be converted to double before passing it to the function causing the function to have an undefined behavior. $ cat default_arg_promotion3.c
#include #include void disp_float(); int main(void) { int f = 1; disp_float(f); return EXIT_SUCCESS; } void disp_float(double f) { printf(“disp_float(): f=%f\n”, f); } $ gcc -o default_arg_promotion3 -std=c99 -pedantic default_arg_promotion3.c $ ./default_arg_promotion3 disp_float(): f=-18680809829685359372194810…
More generally, the default argument promotions apply to the arguments passed to a function when the parameters of the function are not declared within the declaration of the function. This happens in two cases: functions declared with no prototype (case studied above) or functions having variable number of arguments (variadic functions) such as printf() (see Chapter VII Section VII.28).
VII.12 Function type compatibility
If you declare functions in the standard way, by providing prototypes, the rule that governs the compatibility between function types is quite simple, but if a program uses the old style to declare functions (deprecated declarations), things are not so simple…
If two functions are declared in the standard way by providing a prototype, their function types are compatible if the following conditions are met:
o Their return types are compatible
o They have the same number of parameters and the corresponding parameters have compatible types
o If a function has a variable number of parameters, the other should also be a variadic function.
In the following example, both the declarations of the function add() declare compatible function types: $ cat func_compat2.c #include #include long add(long a, long int b); // first declaration with prototype int main(void) { printf(“sum=%ld\n”, add(2, 3) ); return EXIT_SUCCESS; } // second declaration with prototype. Both function types are compatible signed long add(signed long a, signed long b) { return a+b; }
Now, if you declare functions using the old style (not recommended), there are several cases to consider. If two functions are declared without prototype (pre-C standard declaration style), two function types are compatible if they return compatible types. In the following example, both the declarations of the function display_header() declare compatible function types: $ cat func_compat1.c #include #include void display_header(); // first declaration with no prototype. Old style int main(void) { display_header(“STARTING OF PROGRAM”); return EXIT_SUCCESS; } // second declaration with no prototype. Old declaration style. // Both declarations are compatible void display_header(msg) char *msg; { printf(“=======================\n”);
printf(“==%s==\n”, msg); printf(“=======================\n”); }
If one function declaration is a prototype and the other is not a prototype and is not part of a definition, the function types are compatible if the following conditions are met:
o Their return types are compatible
o The prototype has no ellipsis declaring a variable number of parameters
o Each parameter of the prototype has a type compatible with the type resulting from the default argument promotions applied to it
In the following example, both the declarations of the function add() declare compatible function types:
$ cat func_compat3.c
#include <stdio.h>
#include <stdlib.h>

double add();// first declaration with no prototype. Old style

int main(void) {
   printf("sum=%f\n", add(2.1, 3.1) );
   return EXIT_SUCCESS;
}

double add(double a, double b) { // prototype. Function Types are compatible
   return a+b;
}
In the following example, the two declarations of the function add() declare incompatible function types because of the default argument promotions: a parameter of type float is promoted to double, and double is not compatible with float:
$ cat func_compat4.c
#include <stdio.h>
#include <stdlib.h>

float add(); // first declaration with no prototype. Old style

int main(void) {
  printf("sum=%f\n", add(2.1, 3.1) );
  return EXIT_SUCCESS;
}
// prototype. Function types are incompatible
float add(float a, float b) {
  return a+b;
}
$ gcc -o func_compat4 -std=c99 -pedantic func_compat4.c
func_compat4.c:11:7: error: conflicting types for ‘add’
func_compat4.c:11:1: note: an argument type that has a default promotion can’t match an empty parameter name list declaration
func_compat4.c:4:7: note: previous declaration of ‘add’ was here
In contrast, the two declarations of the function add() declare compatible function types: $ cat func_compat4.1.c #include #include float add();// first declaration with no prototype. Old style int main(void) { printf(“sum=%f\n”, add(2.1, 3.1) ); return EXIT_SUCCESS; } // prototype. Function Types are compatible float add(double a, double b) { return a+b; }
If one function declaration is a prototype and the other is not a prototype and is part of a definition, the function types are compatible if the following conditions are met:
o Their return types are compatible
o They have the same number of parameters
o Each parameter type in the prototype is compatible with the type that results from applying the default argument promotions to the corresponding parameter of the old-style definition
In the following example, the two declarations of the function add() declare compatible function types:
$ cat func_compat5.c
#include <stdio.h>
#include <stdlib.h>

double add(double a, double b); // prototype

int main(void) {
  printf("sum=%f\n", add(2.1, 3.1) );
  return EXIT_SUCCESS;
}

// old declaration style
double add(a, b)
double a;
double b;
{
  return a+b;
}
In the following example, the two declarations of the function add() declare incompatible function types: $ cat func_compat6.c #include #include float add(float a, float b); // First declaration: prototype int main(void) { printf(“sum=%f\n”, add(2.1, 3.1) ); return EXIT_SUCCESS; } // Second declaration: old style float add(a, b) float a; float b; { return a+b; } $ gcc -o func_compat6 -std=c99 -pedantic func_compat6.c func_compat6.c: In function ‘add’: func_compat6.c:12:7: warning: promoted argument ‘a’ doesn’t match prototype func_compat6.c:4:7: warning: prototype declaration func_compat6.c:12:16: warning: promoted argument ‘b’ doesn’t match prototype
func_compat6.c:4:7: warning: prototype declaration
VII.13 Conversions We complete here the conversion rules we have studied so far.
VII.13.1 Conversion Rules VII.13.1.1 Explicit conversions Explicit conversions can be performed through the explicit cast. Table VII‑1 lists the permitted explicit conversions.
Table VII‑1 Explicit conversions
A pointer to a function of any type can be converted to a pointer to a function of another type and back again without any change. VII.13.1.2 Implicit conversions Table VII‑2 lists the permitted implicit conversions occurring in the following situations: o Simple assignments (including initializations) o Function calls o return statement If an implicit conversion cannot be performed, an explicit conversion is then required.
Table VII‑2 Implicit conversions
You may have noticed that implicit conversions involving scalar types (pointer types, arithmetic types), structure types and union types do not care about the qualifiers applied to objects of those types. In the following examples, the const qualifier does not matter:
const int b = 0;
int a = b;                    // int <- const int
const int c = a;              // const int <- int
int *const p = &a;            // int *const <- int *
int *q = p;                   // int * <- int *const
const struct A {int k; } st_a = { 1 };
const struct A st_b = st_a;   // const struct A <- const struct A
struct A st_c = st_b;         // struct A <- const struct A
Consider the assignment X = Y. If the variable X has a qualified type and Y has an unqualified type, there is no problem, as qualifiers only add restrictions to an unqualified type. Conversely, if the variable X has an unqualified type and Y has a qualified type, is there an issue, since we assign a value with some constraints to a variable that has none? As a matter of fact, in this case, qualifiers do not matter, as explained in Chapter IV Section IV.9: the qualifiers are removed from the value of an lvalue. This means that if the variable X has an unqualified type and Y has a qualified type, the value of the lvalue Y has an unqualified type and can then be copied to X safely. Do not confuse this with the type of a pointed-to object, which can be qualified; in that case, the qualifiers are kept and do matter:
int a=10;
int *const p = &a;   // OK
int *q = p;          // OK
const int b=10;
const int *m = &b;   // OK: &b has type const int *
const int *n = &a;   // OK: &a has type int *
int *r = &b;         // Invalid assignment: &b has type const int *
VII.13.2 Conversions and functions The return value of a function is subject to implicit conversions as listed in Table VII‑2. In the following example, the return value of the function f() is converted to int before being returned: $ cat func_conv1.c #include #include int f(void) { return 3.14; } int main(void) { float x = f(); printf(“%f\n”, x); return EXIT_SUCCESS; } $ gcc -o func_conv1 -std=c99 -pedantic func_conv1.c $ ./func_conv1 3.000000 $
The implicit conversion rules (Table VII‑2) apply to the arguments of functions when called. Consider the show_param() function:
void show_param(int a) {
  printf("show_param(): a=%d\n", a);
}
What happens if we pass arguments of type double or char? The arguments are implicitly converted to the type of the corresponding parameters according to the rules described in Table
VII‑2 as shown below: $ cat func_conv2.c #include #include void show_param(int a) { printf(“show_param(): a=%d\n”, a); } int main(void) { double x = 3.14159; char j = 10; printf(“main(): x=%f\n”, x); show_param( x ); printf(“–––-\n”); printf(“main():j=%d \n”, j); show_param( j ); return EXIT_SUCCESS; } $ gcc -o func_conv2 -std=c99 -pedantic func_conv2.c $ ./func_conv2 main(): x=3.141590 show_param(): a=3 –––main():j=10 show_param(): a=10
VII.14 Call-by-value
When you call a function, the values of the arguments you pass to the function are copied to their corresponding parameters (see Figure VII‑3). This method is known as call-by-value (also called pass-by-value). For example, when you invoke the function add(x, y), the value of x is copied to the first parameter a and the value of y is copied to the second parameter b:
$ cat call_by_value1.c
#include #include double add(double a, double b) { return a+b; } int main(void) { float x = 10; float y = 2.1; double z = add( x, y ); printf(“%f + %f = %f\n”, x, y, z); return EXIT_SUCCESS; }
In C, call-by-value is the only way to call a function: the arguments are copied. This is often sufficient, but sometimes we want the called function to modify the arguments. The following example, whose goal is to swap the values of its arguments, looks correct but does not work:
$ cat call_by_value2.c
#include <stdio.h>
#include <stdlib.h>

void swap(int a, int b) {
  int c = b;
  b = a;
  a = c;
}

int main(void) {
  int x = 1;
  int y = 10;

  printf("x=%d and y=%d\n", x, y);
  swap( x, y );
  printf("x=%d and y=%d\n", x, y);
  return EXIT_SUCCESS;
}
$ gcc -o call_by_value2 -std=c99 -pedantic call_by_value2.c
$ ./call_by_value2
x=1 and y=10
x=1 and y=10
Since the arguments were copied, the inversion did not occur. If you pass structures as arguments, they are also copied, which causes issues with structures having a flexible array member as depicted below: $ cat call_by_value3.c #include #include #include struct myString { int len; char s[]; }; typedef struct myString string; /* displaying the string in structure */ void print_string(string str) { printf(“str.s=%s\n”, str.s); } int main(void) { char *s = “Hello World”; int len = strlen( s ); /* size of s is len + 1 for the null character \0 terminating a string */ string *p_str = malloc( sizeof *p_str + (len + 1) ); if ( p_str == NULL ) { printf(“Cannot allocate memory”); return EXIT_FAILURE; } p_str->len = len; strcpy(p_str->s, s);
print_string( *p_str ); /* display the string */ return EXIT_SUCCESS; } $ gcc -o call_by_value3 -std=c99 -pedantic call_by_value3.c $ ./call_by_value3 str.s=z���e��
Explanation:
o In the main() function, the pointer p_str points to a structure with a flexible array member. Therefore, memory must also be allocated for the member s of the structure. As a string is terminated by the null character, the size of the flexible array member s that can hold the string “Hello World” is the length of that string plus one.
o In main(), the statement print_string( *p_str ) calls the function print_string() to show the member s. Since the structure is passed by value, it is copied: the parameter str is assigned the argument *p_str.
o The function print_string() displays garbage instead of the member s. The reason is, as we explained earlier, that the flexible array member is ignored when the structure is copied by assignment. When we pass the argument *p_str, the member len is copied while the member s is left behind.
o The next section explains how to do it properly.
In short, we need another method to call functions. This second method is known as call-by-reference.
Figure VII‑3 Call-by-value
VII.15 Call-by-reference
As a matter of fact, unlike other languages, the C language does not implement the call-by-reference method (also called pass-by-reference) but emulates it through pointers. Call-by-reference means that instead of copying the arguments, we pass the objects themselves (i.e. a reference), which allows the function to access them directly (Figure VII‑4).
Figure VII‑4 Call-by-reference
If you remember our example call_by_value2.c, it failed to swap its arguments because we used the call-by-value method. Now, let us write it using pointers instead:
$ cat call_by_ref1.c
#include <stdio.h>
#include <stdlib.h>
void swap(int *a, int *b) { int c = *b; *b = *a; *a = c; } int main(void) { int x = 1; int y = 10; printf(“x=%d and y=%d\n”, x, y); swap( &x, &y ); printf(“x=%d and y=%d\n”, x, y); return EXIT_SUCCESS; } $ gcc -o call_by_ref1 -std=c99 -pedantic call_by_ref1.c $ ./call_by_ref1 x=1 and y=10 x=10 and y=1
This time we reached our goal by using pointers to the variables x and y. Why did it work? The statement swap(&x, &y) calls the function swap() and passes pointers to the objects x and y. The pointers are copied (call-by-value) to the corresponding parameters, but this time the parameters a and b point to the objects x and y themselves. Changing the objects pointed to by the parameters a and b amounts to changing the variables x and y (Figure VII‑4).
Passing a pointer instead of a structure also overcomes the issue with the flexible array member in the example call_by_value3.c. In that example, we passed a structure with a flexible array member to be printed by the function print_string(). The problem was that the structure was passed by value, so the flexible array member was ignored (not copied). If we pass the structure by reference, the parameter of the function print_string() accesses the structure directly, with no copy. In the new version of our program, we also implement a new function, called allocate_string(), that allocates storage for the structure.
$ cat call_by_ref2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct myString {
  int len;
  char s[];
};
typedef struct myString string;

/*
  FUNCTION: print_string
  PARAMETERS: string *p_str
  OBJECTIVE: display the string in structure
  RETURN:
    - 1 if successful
    - 0 otherwise
*/
int print_string(string *p_str) {
  if (p_str == NULL)
    return 0;
  printf("String=%s\n", p_str->s);
  return 1;
}

/*
  FUNCTION: allocate_string
  PARAMETERS:
    - char *msg: will be copied to the s member
  OBJECTIVE: returning a pointer to a string structure dynamically allocated
  TASKS:
    - allocate memory for a string structure with malloc()
    - initialize the structure with parameter msg
  RETURN:
    - returns a pointer to the newly created structure
*/
string *allocate_string(char *msg) {
  int len;
  string *p_str;

  if ( msg == NULL )
    return NULL;
len = strlen( msg ); /* size of member s is len + 1 for the null character \0 terminating a string */ p_str = malloc( sizeof *p_str + (len + 1) ); if (p_str == NULL ) { printf(“Cannot allocate memory for string structure\n”); return NULL; } strcpy(p_str->s, msg); p_str->len = len; return p_str; } void free_string( string *p_str ) { if ( p_str != NULL ) free( p_str ); } int main(void) { char *s = “Hello World”; string *p_string1, *p_string2; p_string1 = allocate_string( s ); /* allocate string structure */ p_string2 = allocate_string( “Second Structure” ); /* allocate string structure */ print_string( p_string1 ); /* display the string structure */ print_string( p_string2 ); /* display the string structure */ free_string( p_string1 ); p_string1 = NULL; free_string( p_string2 ); p_string2 = NULL; return EXIT_SUCCESS; } $ gcc -o call_by_ref2 -std=c99 -pedantic call_by_ref2.c $ ./call_by_ref2
String=Hello World String=Second Structure
At the end of the program, we freed the allocated memory for our structures.
VII.16 Passing arrays
VII.16.1 Array declared as formal parameter
What happens if we pass an array to a function? Passing an array of objects of type obj_type is equivalent to passing a pointer to obj_type: the array is converted to a pointer to its first element. This rule has three consequences:
o A parameter of a function can be declared equivalently as obj_type p[n], obj_type p[] or obj_type *p
o If arr is an array, you can pass it to a function as arr or &arr[0]
o The size of the array passed to a function is unknown within the body of the function.
In the following two sections, we go into the details of this simple rule. We will talk about one-dimensional arrays and multidimensional arrays, though both follow the same rule and are always passed as a pointer to their initial element. As of C99, programmers can also specify qualifiers within the brackets []. More generally, a formal parameter of the form
arr_type arr[qualifiers n]
is converted to arr_type * qualifiers arr
Where arr_type is the type of the elements of the array, n is an optional parameter representing its length (that is ignored), and qualifiers represents a list of qualifiers (const, volatile or restrict). For example: $ cat array_formal_param1.c #include #include void f(int arr[const 10]) { arr = NULL; /* error arr has type int *const */ } int main(void) { int a[20] = { 1, 2 };
f(a); /* array a converted to int *const */ return EXIT_SUCCESS; } $ gcc -o array_formal_param1 -std=c99 -pedantic array_formal_param1.c array_formal_param1.c: In function ‘f’: array_formal_param1.c:5:3: error: assignment of read-only location ‘arr’
Compare with the following code snippet: $ cat array_formal_param2.c #include #include void f(int arr[10]) { arr = NULL; // OK arr has type int * } int main(void) { int a[20] = { 1, 2 }; f(a); // array a converted to int * return EXIT_SUCCESS; } $ gcc -o array_formal_param2 -std=c99 -pedantic array_formal_param2.c $
C99 introduced another interesting feature that is not implemented in all compilers. The storage-class specifier static can be placed within brackets [] in a declaration of a formal parameter of a function: arr_type arr[static n]
It indicates that arr is a pointer to the first element of an array that has at least n elements, and that arr is not a null pointer [53].
$ cat array_formal_param3.c
#include <stdio.h>
#include <stdlib.h>

void f(int arr[static 10]) { // arr not null and has at least 10 elements
  int i;
  for (i=0; i < 10; i++)
    printf("arr[%d]=%d\n", i, arr[i]);
}

int main(void) {
  int a[20] = { 1, 2 };
  f(a);
  return EXIT_SUCCESS;
}
VII.16.2 One dimensional array
Consider the following example:
$ cat func_pass_array1.c
#include <stdio.h>
#include <stdlib.h>
#define LEN 10

void array_size( int list[] ) {
  printf("array_size(): sizeof of array=%zu\n", sizeof list);
}

void pointer_size( int *list ) {
  printf("pointer_size(): sizeof of pointer=%zu\n", sizeof list);
}

int main(void){
  int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 };
  int *p_list = malloc( LEN * sizeof *p_list );

  printf("main(): sizeof of array=%zu\n", sizeof a_list );
  array_size( a_list );

  printf("\nmain(): sizeof of pointer=%zu\n", sizeof p_list);
  pointer_size( p_list );

  free( p_list );
  return EXIT_SUCCESS;
}
$ gcc -o func_pass_array1 -std=c99 -pedantic func_pass_array1.c
$ ./func_pass_array1
main(): sizeof of array=40
array_size(): sizeof of array=4

main(): sizeof of pointer=4
pointer_size(): sizeof of pointer=4
The example func_pass_array1.c shows two things:
o The functions array_size() and pointer_size() have equivalent prototypes, even though their declarations look different.
o An array is converted to a pointer when passed to a function.
Because arrays are converted to pointers, we cannot compute the size or the number of elements of an array passed to a function. The following example is therefore wrong:
$ cat func_pass_array2.c
#include <stdio.h>
#include <stdlib.h>
#define LEN 10

/* incorrect implementation */
void display_array( int list[] ) {
  int i;
  int array_nb_elt = sizeof list / sizeof list[0];

  for (i = 0; i < array_nb_elt; i++ )
    printf("list[%d]=%d\n", i, list[i]);
}

int main(void) {
  int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 };
  display_array( a_list );
  return EXIT_SUCCESS;
}
$ gcc -o func_pass_array2 -std=c99 -pedantic func_pass_array2.c
$ ./func_pass_array2
list[0]=0
To work with an array passed as an argument, we have to specify its size or the number of the elements it holds as if we passed a pointer. The previous example must be written as follows:
$ cat func_pass_array3.c #include #include #define LEN 10 void display_array( int list[], size_t array_size) { int i; int len; if ( list == NULL ) return; len = array_size / sizeof list[ 0 ]; for (i = 0; i < len; i++ ) printf(“list[%d]=%d\n”, i, list[i]); } int main(void) { int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 }; size_t array_size = sizeof a_list; display_array( a_list, array_size ); return EXIT_SUCCESS; } $ gcc -o func_pass_array3 -std=c99 -pedantic func_pass_array3.c $ ./func_pass_array3 list[0]=0 list[1]=1 list[2]=8 list[3]=9 list[4]=5 list[5]=0 list[6]=0 list[7]=0 list[8]=0 list[9]=0
If we change void display_array(int list[], size_t array_size) to void display_array(int *list, size_t array_size), we get an equivalent program as shown below:
$ cat func_pass_array4.c #include #include #define LEN 10 void display_array( int *list, size_t array_size) { int i; int len; if ( list == NULL ) return; len = array_size / sizeof list[ 0 ]; for (i = 0; i < len; i++ ) printf(“list[%d]=%d\n”, i, list[i]); } int main(void) { int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 }; size_t array_size = sizeof a_list; display_array( a_list, array_size ); return EXIT_SUCCESS; }
In the following example, we sort an array passed to a function. Since arrays are turned into pointers, the array passed to the function sort_array() will be modified (call-by-reference):
$ cat func_pass_array5.c
#include <stdio.h>
#include <stdlib.h>

/*
  FUNCTION: sort_array
  PARAMETERS:
    - list[]: array to sort
    - array_size: size of the array
  TASKS: sort the array of int passed as argument. Bubble algorithm
  RETURN: void
*/ void sort_array( int list[], size_t array_size ) { int i, j, swap_val; int len; if ( list == NULL ) return; len = array_size / sizeof list[0]; for ( i = len - 1; i > 0; i— ) for ( j = 1; j 0; i— ) for ( j = 1; j len = 0; } else { int len = strlen(s); ptr_str->s = malloc( len + 1 ); /* + 1 for the null character */ if ( ptr_str->s == NULL ) { printf(“Cannot allocate memory\n”); free( ptr_str );
return NULL; } else { strcpy(ptr_str->s, s); ptr_str->len = len; } } ptr_str->show = show_string; return ptr_str; }
The main() function is given below:
int main(void) {
  string *ptr_str = new_string("Example of high-level object");
  ptr_str->show(ptr_str);
}
The complete program is shown below: $ cat function_pointer7.c #include #include #include typedef struct string string; struct string { char *s; int len; void (*show)(string *); }; void show_string(string *ptr_str) { if ( ptr_str == NULL ) return ; printf(“%s\n”, ptr_str->s); } string *new_string(char *s) { string *ptr_str = malloc( sizeof *ptr_str );
if ( ptr_str == NULL ) { printf(“Cannot allocate memory\n”); return NULL; } if ( s == NULL ) { ptr_str->s = NULL; ptr_str->len = 0; } else { int len = strlen(s); ptr_str->s = malloc( len + 1 ); /* + 1 for the \0 character */ if ( ptr_str->s == NULL ) { printf(“Cannot allocate memory\n”); free( ptr_str ); return NULL; } else { strcpy(ptr_str->s, s); ptr_str->len = len; } } ptr_str->show = show_string; return ptr_str; } int main(void) { string *ptr_str = new_string(“Example of high-level object”); ptr_str->show(ptr_str); } $ gcc -o function_pointer7 -std=c99 -pedantic function_pointer7.c $ ./function_pointer7 Example of high-level object
VII.23 functions and void * VII.24 Parameters declared as void * Function parameters can be declared as void *. Within the function, if the pointers declared as void * are accessed, you have to cast them to the appropriate type. In the following
example, the function display_num() prints the elements of an array passed as an argument. The array can have elements of type int or float. $ cat func_void.c #include #include enum type_list { INT, FLOAT }; /* Function display_num() displays the numbers stored in the array list_num - type is INT or FLOAT. Indicates the type of objects stored in list_num - size is the size of the array list_num */ void display_num(void *list_num, int type, size_t size) { int *p1; float *p2; int i, nb_elt; switch ( type ) { case INT: p1 = list_num; nb_elt = size / sizeof *p1; for ( i = 0; i < nb_elt; i++ ) printf(“list_num[%d]=%d \n”, i, p1[i] ); break; case FLOAT: p2 = list_num; nb_elt = size / sizeof *p2; for ( i = 0; i < nb_elt; i++ ) printf(“list_num[%d]=%f \n”, i, p2[i] ); break; default: printf(“Type %d not supported\n”, type ); } } int main(void) { int a1[5] = {1, 2, 3, 4, 5};
float a2[4] = {1.1, 1.2, 3.3, 4.8}; display_num( a1, INT, sizeof a1 ); printf(“\n”); display_num( a2, FLOAT, sizeof a2 ); return EXIT_SUCCESS; } $ gcc -o func_void -std=c99 -pedantic func_void.c $ ./func_void list_num[0]=1 list_num[1]=2 list_num[2]=3 list_num[3]=4 list_num[4]=5 list_num[0]=1.100000 list_num[1]=1.200000 list_num[2]=3.300000 list_num[3]=4.800000
VII.24.1 Function pointers and object pointers Consider the following piece of code: $ cat func_obj_ptr1.c #include #include float f(void) { return 3.14; } int main(void) { float (*ptr1)(void) = f; printf(“%f\n”, ptr1()); return EXIT_SUCCESS; } $ gcc -o func_obj_ptr1 -std=c99 -pedantic func_obj_ptr1.c $ ./func_obj_ptr1 3.140000
Now, what do you think about the following example? $ cat func_obj_ptr2.c #include #include float f(void) { return 3.14; } int main(void) { void *ptr3 = f; printf(“%f\n”, ptr3()); return EXIT_SUCCESS; } $ gcc -o func_obj_ptr2 -std=c99 -pedantic func_obj_ptr2.c func_obj_ptr2.c: In function ‘main’: func_obj_ptr2.c:9:16: warning: ISO C forbids initialization between function pointer and ‘void *’ func_obj_ptr2.c:11:22: error: called object ‘ptr3’ is not a function
Such code does not comply with the C standard and is therefore not portable: ptr3 is a pointer to void, that is, an object pointer, not a pointer to a function. Such code may work on some systems, but the C standard does not say that such a conversion is allowed: it describes conversions between object pointers and conversions between function pointers, but not conversions between function pointers and object pointers.
The compiler explains why the code is not compliant. Though it is tempting to assign a function pointer to a pointer to void, and may make sense and work on some systems, it must not be done if you wish to write portable programs. The rationale is a function pointer may have a representation different from a pointer to an object.
VII.25 Side effects A side effect changes something within the program or in the computer. When a function writes data to a file, it has a side effect: the environment of the computer is changed. For a programmer, side effects to watch out for are changes within the program. When an object is altered, there is a side effect. For example, the assignment operations have side effects: they modify the value of objects. For example, the expressions x = 1 and x++ have side effects. A function that alters objects with static storage duration or interacts with other elements
of the computer (such as files) has side effects. When you call such a function, the state of the program changes. A function is said to be pure when it has no side effects. Side effects are common, but you have to watch out for them in some circumstances:
o Within an expression, you should avoid modifying a variable if it is also accessed. For example, x[i] = i++ has undefined behavior because, depending on the compiler, the variable i may be altered by the postfix operator (i++) before or after the subscript of the array x is evaluated. Thus, if the variable i holds the value 0, either of the following results may be obtained depending on the compiler:
▪ x[0]=0 and i assigned the value 1
▪ x[1]=0 and i assigned the value 1
Do not alter and access an object within the same expression: it leads to undefined behavior.
o Calling a function with arguments that are expressions with side effects. If you call the function f() like f(++x, x = 4), you cannot predict the evaluation order of the arguments since it is not specified by the C standard: the compiler is allowed to evaluate the arguments in any order. Of course, this must be avoided. A function call should behave the same way whatever the order of evaluation of its arguments. Here is an example of function calls that must be avoided:
$ cat function_side_effects.c
#include <stdio.h>
#include <stdlib.h>

void f(int a, int b) {
  printf("a=%d b=%d\n",a ,b);
}

int main(void) {
  int x = 10;
  f( ++x, x = 4 );
  f( x = 4, ++x );
  return EXIT_SUCCESS;
}
$ gcc -o function_side_effects -std=c99 -pedantic function_side_effects.c
$ ./function_side_effects
a=5 b=5 a=4 b=4
The gcc compiler has the option -Wall that warns you:
$ gcc -o function_side_effects -std=c99 -Wall -pedantic function_side_effects.c
function_side_effects.c: In function ‘main’:
function_side_effects.c:10:14: warning: operation on ‘x’ may be undefined
function_side_effects.c:11:14: warning: operation on ‘x’ may be undefined
VII.26 Compound statements
A compound statement is just a block, that is, a set of statements enclosed between braces. A loop body is often a compound statement, a function body is a compound statement… You can also use a compound statement anywhere within a function as in the following example:
$ cat function_compound_statement.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x = 10;
  int y = 20;

  printf("x=%d, y=%d\n", x, y);

  /* swap x and y */
  {
    int c = x;
    x = y;
    y = c;
  }

  printf("x=%d, y=%d\n", x, y);
  return EXIT_SUCCESS;
}
The variable c within the compound statement is local (block scope): it is visible only
within that block. Inside the block, the variables x and y are swapped.
VII.27 Inline functions and macros
VII.27.1 Preprocessor
Before talking about macros, we have to introduce the C preprocessor (described in Chapter XIII). The compiler is actually composed of several tools invoked implicitly in sequence: the preprocessor is one of them. It is called before the C program is actually compiled. The preprocessor has its own “language” composed of directives telling it what to do. A directive starts with the symbol # followed by a keyword. For example, the #include “myfile” directive includes the file myfile.
VII.27.2 Macros VII.27.2.1 Defining macros The second most relevant directive of the C preprocessor is #define that creates a macro. It has two forms. Let us start with the simplest syntax: #define macro_name rep_text
Where o macro_name is the identifier of the macro composed of letters, digits and underscores (starting with a letter or an underscore). By convention, a macro name is written in capital letters indicating it is a macro (it is permitted to use lower-case letters to define your macros). o rep_text is a series of characters. When the preprocessor reads the input file, it replaces the string of characters macro_name with the replacement text rep_text. It is used to define real constants. It is visible within the file in which it is defined after its definition. Traditionally, so that they could be seen throughout the whole source file, they are defined after including the header files (with #include). There are several predefined macros. For example, in the header file stdlib.h, the macros EXIT_SUCCESS and EXIT_FAILURE are defined as follows: #define EXIT_FAILURE 1 #define EXIT_SUCCESS 0
Another predefined macro is NULL. Its exact definition depends on the implementation; it may be, for example:
#define NULL 0
In the following example, we define the macro MAX_LEN: $ cat cpp1.c #include #include #define MAX_LEN 10 int main(void) { printf(“MAX_LEN=%d\n”, MAX_LEN); return EXIT_SUCCESS; } $ gcc -o cpp1 -std=c99 -pedantic cpp1.c $ ./cpp1 MAX_LEN=10
Compilers allow you to invoke the preprocessor alone. With gcc, the –E option invokes the preprocessor only: $ gcc -E cpp1.c … int main(void) { printf(“MAX_LEN=%d\n”, 10); return 0; }
For your macro, you can use any replacement text you wish as shown below: $ cat cpp2.c #include #include #define MSG “Hello world” int main(void) { printf(“MSG=%s\n”, MSG); return EXIT_SUCCESS; } $ gcc -o cpp2 -std=c99 -pedantic cpp2.c $ ./cpp2 MSG=Hello world
If we invoke the preprocessor alone, we get this: $ gcc -E cpp2.c … int main(void) { printf(“MSG=%s\n”, “Hello world”); return 0; }
Watch out for the replacement text: $ cat cpp3.c #include #include #define MSG “Hello world”, “This is a macro” int main(void) { printf(“MSG=%s. %s\n”, MSG); return EXIT_SUCCESS; } $ gcc -o cpp3 -std=c99 -pedantic cpp3.c $ ./cpp3 MSG=Hello world. This is a macro
Since the macro is replaced by its replacement text exactly as written, it is wise to use parentheses in some circumstances. The following example does not work as expected; guess why:
$ cat cpp4.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 #define MAX_LEN 10
5 #define STRING_SIZE MAX_LEN + 1
6
7 int main(void) {
9   int new_size = STRING_SIZE * 2;
10   printf("STRING_SIZE=%d\n", STRING_SIZE);
11   printf("new_size=%d\n", new_size);
12
13   return EXIT_SUCCESS;
14 }
$ gcc -o cpp4 -std=c99 -pedantic cpp4.c
$ ./cpp4
STRING_SIZE=11
new_size=12
Explanation: o Line 4: we define the macro MAX_LEN as the constant integer 10. o Line 5: we define the macro STRING_SIZE as MAX_LEN + 1, namely 10 + 1. o Line 9: the preprocessor will replace the statement int new_size = STRING_SIZE * 2 by int new_size = 10 + 1 * 2. That is, the variable new_size will hold the value 12. o Line 10: the statement printf(“STRING_SIZE=%d\n”, STRING_SIZE) will be replaced by printf(“STRING_SIZE=%d\n”, 10 + 1), which will output the text STRING_SIZE=11 after the evaluation of the expression 10 + 1. o Line 11: the statement printf(“new_size=%d\n”, new_size) will output the text new_size=12. Now, if we surround the replacement text by parentheses, we will get the expected behavior: $ cat cpp5.c #include #include #define MAX_LEN 10 #define STRING_SIZE (MAX_LEN + 1) int main(void) { int new_size = STRING_SIZE * 2; printf(“STRING_SIZE=%d\n”, STRING_SIZE); printf(“new_size=%d\n”, new_size); return EXIT_SUCCESS; } $ gcc -o cpp5 -std=c99 -pedantic cpp5.c $ ./cpp5 STRING_SIZE=11 new_size=22
The preprocessor replaced the statement int new_size = STRING_SIZE * 2 by int new_size = (10 + 1) * 2. Thus, new_size was assigned the expected value 22.
The second form allows imitating functions: #define macro_name(param_list) rep_text
Under this form, you can pass the arguments param_list to the macro, imitating a function. The arguments can then be used in the replacement text rep_text. param_list is a list of identifiers separated by commas. Do not insert blanks (spaces or tabs) between the macro name and the left parenthesis; otherwise, you define a macro using the first form described earlier. For example:
$ cat cpp6.c
#include <stdio.h>
#include <stdlib.h>

#define MAX(a , b) ( (a) > (b) ? (a) : (b) )

int main(void) {
  printf("max(2,4)=%d\n", MAX(2,4));
  printf("max(1+1,2+2)=%d\n", MAX(1+1,2+2));
  printf("max(1+1,2+2)*2=%d\n", MAX(1+1,2+2) * 2);
  return EXIT_SUCCESS;
}
$ gcc -o cpp6 -std=c99 -pedantic cpp6.c
$ ./cpp6
max(2,4)=4
max(1+1,2+2)=4
max(1+1,2+2)*2=8
The preprocessor replaces: o MAX(2,4) by ( (2) > (4) ? (2) : (4) ) o MAX(1+1,2+2) by ( (1+1) > (2+2) ? (1+1) : (2+2) ) o MAX(1+1,2+2)*2 by ( (1+1) > (2+2) ? (1+1) : (2+2) ) * 2 For the reasons already explained, do not forget the parentheses. In the following example, we have forgotten, purposely, the parentheses: $ cat cpp7.c
#include <stdio.h>
#include <stdlib.h>

#define MAX(a , b) a > b ? a : b

int main(void) {
  printf("max(2,4)=%d\n", MAX(2,4));
  printf("max(1+1,2+2)=%d\n", MAX(1+1,2+2));
  printf("max(1+1,2+2)*2=%d\n", MAX(1+1,2+2) * 2);
  return EXIT_SUCCESS;
}
$ gcc -o cpp7 -std=c99 -pedantic cpp7.c
$ ./cpp7
max(2,4)=4
max(1+1,2+2)=4
max(1+1,2+2)*2=6
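The value 6 can be traced by expanding the unparenthesized macro by hand. The sketch below is only an illustration: the names MAX_BAD, MAX_OK, bad_result() and ok_result() are hypothetical and do not appear in the listings above.

```c
/* Unparenthesized macro, written like the one in cpp7.c. */
#define MAX_BAD(a, b) a > b ? a : b

/* Fully parenthesized macro, written like the one in cpp6.c. */
#define MAX_OK(a, b) ( (a) > (b) ? (a) : (b) )

/* Expands to: 1+1 > 2+2 ? 1+1 : 2+2 * 2
   The multiplication applies only to the last 2, so the expression
   reads (2 > 4) ? 2 : (2 + 4), which is 6. */
int bad_result(void) {
  return MAX_BAD(1+1, 2+2) * 2;
}

/* Expands to: ( (1+1) > (2+2) ? (1+1) : (2+2) ) * 2, which is 8. */
int ok_result(void) {
  return MAX_OK(1+1, 2+2) * 2;
}
```

Calling bad_result() yields 6 while ok_result() yields 8, which matches the outputs of cpp7 and cpp6.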
Macros are easy to use, and just as easy to get wrong. A function-like macro looks like a function, but it is not one:
o There is no call. A macro invocation is simply replaced by its replacement text.
o Unlike function arguments, macro arguments are not checked against parameter types.
o In a function call, each argument is evaluated exactly once, before the call. In a macro, the arguments are not evaluated beforehand: they are substituted as text, so an argument may end up being evaluated several times, or not at all.
o A function returns a value. A macro is only subject to textual substitution.
Therefore, finding a bug involving a macro may turn out to be very tricky. For all those reasons, macros are often considered dangerous. Test them conscientiously. Do not use complex macros: the code of your macros should be small and simple. The major issue with macros is that their arguments are not evaluated before substitution: if you pass expressions with side effects to a macro, you may face trouble. In the following example, we create a function abs() and a macro ABS. Compare their output:
$ cat cpp8.c
#include <stdio.h>
#include <stdlib.h>

#define ABS(a) ( (a) < 0 ? -(a) : (a) )

int abs(int a) {
  if (a < 0)
    return -a;
  else
    return a;
}

int main(void) {
  int p;

  p = 1;
  printf("abs(p++)=%d\n", abs(p++));
  printf("p=%d\n", p);

  p = 1;
  printf("\nABS(p++)=%d\n", ABS(p++));
  printf("p=%d\n", p);
  return EXIT_SUCCESS;
}
$ gcc -o cpp8 -std=c99 -pedantic cpp8.c
$ ./cpp8
abs(p++)=1
p=2

ABS(p++)=2
p=3
The macro ABS did not produce the right value: ABS(p++) expands to ( (p++) < 0 ? -(p++) : (p++) ), so p is incremented twice. If you place the # symbol before a parameter in the replacement text, the corresponding argument is turned into a string literal (it is surrounded by double quotes). In the following example, the macro LITERAL2STRING turns a literal into a string:
$ cat cpp9.c
#include <stdio.h>
#include <stdlib.h>

#define LITERAL2STRING(x) #x

int main(void) {
  printf("%s\n", LITERAL2STRING(10));
  return EXIT_SUCCESS;
}
$ gcc -o cpp9 -std=c99 -pedantic cpp9.c
$ ./cpp9
10
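One subtlety of the # operator is that it is applied to the argument exactly as written, before any macro expansion of that argument. A common idiom is to add a second macro level so that the argument is expanded first. The sketch below assumes hypothetical names (STR, XSTR, unexpanded(), expanded()) chosen for illustration.

```c
#define MAX_LEN 10

/* # is applied to the argument as written: no expansion takes place. */
#define STR(x) #x

/* The extra level lets the argument be macro-expanded before STR sees it. */
#define XSTR(x) STR(x)

/* Returns the string "MAX_LEN". */
const char *unexpanded(void) {
  return STR(MAX_LEN);
}

/* Returns the string "10". */
const char *expanded(void) {
  return XSTR(MAX_LEN);
}
```

With these definitions, unexpanded() returns "MAX_LEN" while expanded() returns "10".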
Another feature of macros is the concatenation of the arguments by using the symbol ##:
$ cat cpp10.c
#include <stdio.h>
#include <stdlib.h>

#define CONCAT(a, b) a ## b

int main(void) {
  int p = 10;
  int q = 20;
  int pq = 30;

  printf("%d\n", CONCAT( p, q ) );
  return EXIT_SUCCESS;
}
$ gcc -o cpp10 -std=c99 -pedantic cpp10.c
$ ./cpp10
30
The macro CONCAT(p, q) is replaced by pq. To finish with macros, it is worth noting you can pass a variable number of arguments to a macro as shown below.
$ cat cpp11.c
#include <stdio.h>
#include <stdlib.h>

#define PRINT(fmt, ...) printf("VALUES: " fmt "\n", __VA_ARGS__)

int main(void) {
  int x = 10;
  int y = 20;

  PRINT("%d, %d", x, y);
  return EXIT_SUCCESS;
}
$ gcc -o cpp11 -std=c99 -pedantic cpp11.c
$ ./cpp11
VALUES: 10, 20
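In C99, a macro declared with parameters (fmt, ...) must be invoked with at least one argument in place of the ellipsis, so an invocation with a format string alone would not be valid. One possible workaround is to let the format string itself be part of the variable arguments. The following is a minimal sketch under that assumption; FORMAT_INTO, format_plain() and format_pair() are hypothetical names, and snprintf() is the standard C99 function declared in stdio.h.

```c
#include <stdio.h>

/* The format string is part of __VA_ARGS__, so the macro can be invoked
   with a format alone or with a format followed by values. */
#define FORMAT_INTO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Invocation with a format only. */
int format_plain(char *buf, size_t size) {
  return FORMAT_INTO(buf, size, "no values");
}

/* Invocation with a format and two values. */
int format_pair(char *buf, size_t size, int x, int y) {
  return FORMAT_INTO(buf, size, "%d, %d", x, y);
}
```

Both helpers return the number of characters that snprintf() reports; with x=10 and y=20, format_pair() writes the text 10, 20 into buf.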
The ellipsis (...) used as a parameter indicates a variable number of arguments. Within the replacement text of the macro, the identifier __VA_ARGS__ is replaced by those arguments.
VII.27.2.2 Removing macros It happens that programmers need to remove macros. This can be done thanks to the directive #undef:
#undef macro_name
If macro_name does not exist, the directive is just ignored.
VII.27.3 Inline functions To overcome the issues caused by macros, inline functions can be used as of the C99 standard. An inline function is a function whose calls may be replaced by its body by the compiler (not by the preprocessor). The goal is to make function calls faster. The inline specifier introduces an inline function. The following example defines an inline function called abs_val():
$ cat function_inline1.c
#include <stdio.h>
#include <stdlib.h>

static inline double abs_val(double a) {
  return a < 0 ? -a : a ;
}

int main(void) {
  int p;

  printf("abs_val(-10)=%f\n", abs_val(-10) );

  p = 1;
  printf("abs_val(p++)=%f\n", abs_val(p++));
  printf("p=%d\n", p);
  return EXIT_SUCCESS;
}
$ gcc -o function_inline1 -std=c99 -pedantic function_inline1.c
$ ./function_inline1
abs_val(-10)=10.000000
abs_val(p++)=1.000000
p=2
It is worth noting that the specifier inline is only a hint to the compiler. It does not guarantee the compiler will optimize the calls. Therefore, you cannot know in advance whether a function will actually be inlined. According to the C99 standard, the inline specifier just suggests that calls to the function be as fast as possible. That's all. As a consequence, a compiler may ignore the inline specifier or perform the optimization. How the optimization is actually performed is not specified by the standard. In practice, compilers replace the function calls by the body of the function. Inline functions are similar to macros but they differ in several ways:
o Inline functions are processed by the compiler while macros are processed by the preprocessor.
o Inline functions are real functions: the arguments are checked and there may be a return value. The arguments are evaluated before they are passed to the function.
o Macros are not functions but a substitution of text. They have no prototypes, so their arguments cannot be checked. They do not return a value. The arguments are substituted as text instead of being evaluated before being passed to the macro.
Inline functions may be faster than traditional functions but they can lead to bigger programs. If an inline function is called one hundred times, its code may be copied one hundred times! This implies that the body of inline functions should be small. You may have noticed we used the specifier static, which makes the function visible only inside the file in which it is defined. We will say more about inline functions in the next chapter…
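The difference in argument evaluation can be made visible by counting how many times an argument with a side effect is evaluated. The sketch below is illustrative only; SQUARE_MACRO, square_inline(), next_value(), macro_calls() and inline_calls() are hypothetical names.

```c
/* Macro version: the argument text is pasted twice. */
#define SQUARE_MACRO(x) ( (x) * (x) )

/* Inline version: the argument is evaluated once, as for any function. */
static inline int square_inline(int x) {
  return x * x;
}

/* Side effect: increments *counter each time it is evaluated. */
static int next_value(int *counter) {
  return ++*counter;
}

/* Returns how many times the argument was evaluated by the macro. */
int macro_calls(void) {
  int c = 0;
  (void)SQUARE_MACRO(next_value(&c));
  return c;
}

/* Returns how many times the argument was evaluated by the inline function. */
int inline_calls(void) {
  int c = 0;
  (void)square_inline(next_value(&c));
  return c;
}
```

Here macro_calls() returns 2 and inline_calls() returns 1: the macro evaluates its argument twice, the inline function only once.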
VII.28 Variable number of parameters The C language has an interesting feature that allows creating functions with a variable number of parameters such as the printf() function: they are sometimes called variadic functions. A function with a variable number of parameters is declared with a number of fixed parameters followed by an ellipsis (...) denoting a variable number of parameters. For example, a function declared as
int *allocate_array(int nb_elt, ...);
has one fixed parameter called nb_elt and a variable number of parameters. The function must have at least one fixed parameter. To define such a function, you have to include the header file stdarg.h. Three macros will be called and one special object must be declared in your program:
o The object ap of type va_list holds the information needed to retrieve the variable arguments. You can use any name but programmers often use the name ap (argument pointer). You have to declare it first as follows:
va_list ap;
o The macro va_start(ap, last_param) initializes the object ap. The second parameter of the macro, last_param, must be the identifier of the last parameter preceding the ellipsis in the declaration of the function.
o The macro va_arg(ap, type) takes from the object ap the next argument of type type.
o The macro va_end(ap) cleans up the object ap once the arguments have been processed.
In the following example, the function allocate_array() has one fixed parameter nb_elt (giving the number of variable parameters) and a list of variable parameters. It allocates a memory area that stores the passed arguments and returns a pointer to that object.
$ cat function_var_params.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

int *allocate_array(int nb_elt, ...) {
  int i;
  int *array = malloc(nb_elt * sizeof *array); /* memory allocation */

  if ( array == NULL ) {
    printf("Cannot allocate memory\n");
    return NULL;
  }

  va_list ap; /* ap gives access to the variable arguments */
  va_start(ap, nb_elt); /* initialize ap so that it refers to the first variable argument */

  for( i = 0; i < nb_elt ; i++)
    array[i] = va_arg(ap, int); /* retrieve and store the next passed argument */

  va_end(ap); /* clean up */
  return array;
}

int main(void) {
  int *int_list;
  int nb_item, i;

  nb_item = 4;
  int_list = allocate_array( nb_item, 10, 20, 30, 40 );
  if ( int_list == NULL ) /* do not use the list if the allocation failed */
    return EXIT_FAILURE;

  for (i=0; i < nb_item; i++)
    printf("int_list[%d]=%d\n", i, int_list[i] );

  free( int_list );
  return EXIT_SUCCESS;
}
$ gcc -o function_var_params -std=c99 -pedantic function_var_params.c
$ ./function_var_params
int_list[0]=10
int_list[1]=20
int_list[2]=30
int_list[3]=40
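An object of type va_list can also be handed over to another function that accepts one. The sketch below assumes a hypothetical wrapper format_values() that forwards its variable arguments to vsnprintf(), the standard C99 function declared in stdio.h that takes a va_list instead of an ellipsis.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats the variable arguments into buf (at most size bytes, including
   the terminating null character) and returns what vsnprintf() returns. */
int format_values(char *buf, size_t size, const char *fmt, ...) {
  va_list ap;
  int n;

  va_start(ap, fmt);                 /* fmt is the last fixed parameter */
  n = vsnprintf(buf, size, fmt, ap); /* forward the argument list */
  va_end(ap);                        /* clean up */
  return n;
}
```

For instance, format_values(buf, sizeof buf, "%d, %d", 10, 20) stores the text 10, 20 in buf.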
You have noticed that the parameters of a variadic function represented by the ellipsis are not declared: we do not know their types, which can lead to issues that you have to watch for. Consider the following variadic function print_float():
$ cat func_var_parms_promot1.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

void print_float(int nb_float, ...) {
  int i;
  va_list ap; /* ap gives access to the variable arguments */

  va_start(ap, nb_float); /* initialize the object ap */

  for( i = 0; i < nb_float ; i++)
    printf("float nb %d=%f\n", i,
           va_arg(ap, float) ); /* retrieve and print the next passed argument */

  va_end(ap); /* clean up */
}

int main(void) {
  int nb_float = 4;

  print_float( nb_float, 1.1, 2.2, 3.3, 4.4 );
  return EXIT_SUCCESS;
}
$ gcc -o func_var_parms_promot1 -std=c99 -pedantic func_var_parms_promot1.c
func_var_parms_promot1.c: In function ‘print_float’:
func_var_parms_promot1.c:13:35: warning: ‘float’ is promoted to ‘double’ when passed through ‘...’
func_var_parms_promot1.c:13:35: note: (so you should pass ‘double’ not ‘float’ to ‘va_arg’)
func_var_parms_promot1.c:13:35: note: if this code is reached, the program will abort
$ ./func_var_parms_promot1
Illegal Instruction (core dumped)
The program failed. The compiler explained the cause: the type float is promoted to double. Why did such a conversion occur? In C, the default argument promotions apply to the arguments passed to a function when the corresponding parameters are not declared. In variadic functions, the variable arguments are not declared in the function prototype (their types and number are unknown at declaration time), which implies they cannot be checked and converted to the appropriate types. The default argument promotions convert arguments of integer type smaller than int to int or unsigned int as ruled by the integer promotions (see Chapter IV Section IV.14.2) and convert arguments of type float to double. Other arguments are not converted. Therefore, arguments of type float passed to variadic functions are converted to double. In our function print_float(), we asked va_arg() for the type float, which differs from the type actually passed (double), causing the program to fail. Our function must use the type double, and then has to be rewritten as follows:
$ cat func_var_parms_promot2.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

void print_float(int nb_float, ...) {
  int i;
  va_list ap;

  va_start(ap, nb_float);

  for( i = 0; i < nb_float ; i++)
    printf("float nb %d=%f\n", i, va_arg(ap, double) );

  va_end(ap); /* clean up */
}

int main(void) {
  int nb_float = 4;

  print_float( nb_float, 1.1, 2.2, 3.3, 4.4 );
  return EXIT_SUCCESS;
}
$ gcc -o func_var_parms_promot2 -std=c99 -pedantic func_var_parms_promot2.c
$ ./func_var_parms_promot2
float nb 0=1.100000
float nb 1=2.200000
float nb 2=3.300000
float nb 3=4.400000
This explains why the function printf() does not take arguments of type float but of type double (conversion specifier %f). When you pass an argument of type float to printf(), it is converted to double.
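The same rule applies to small integer types: an argument of type char or short passed through the ellipsis arrives as an int, so it has to be read with va_arg(ap, int). The following sketch illustrates this with a hypothetical function join_chars(); it assumes the caller provides a buffer large enough for nb characters plus the terminating null character.

```c
#include <stdarg.h>

/* Stores nb characters, passed as variable arguments, into buf and
   terminates the string. buf must have room for nb + 1 characters. */
void join_chars(char *buf, int nb, ...) {
  va_list ap;
  int i;

  va_start(ap, nb);
  for (i = 0; i < nb; i++)
    buf[i] = (char)va_arg(ap, int); /* char arguments are promoted to int */
  va_end(ap);

  buf[nb] = '\0';
}
```

With two variables of type char holding 'o' and 'k', the call join_chars(buf, 2, a, b) leaves the string "ok" in buf.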
VII.29 Some useful macros In your program, you can use three useful predefined names:
o __FILE__: a macro that expands to the name of the source file containing it.
o __LINE__: a macro that expands to the line number in which it appears.
o __func__: holds the name of the function containing it. It was introduced in C99 and, strictly speaking, is a predefined identifier rather than a macro.
For example:
$ cat function_useful_macros1.c
#include <stdio.h>
#include <stdlib.h>

void f(void) {
  printf("File %s, function %s, line %d\n", __FILE__, __func__, __LINE__);
}

int main(void) {
  f();

  printf("File %s, function %s, line %d\n", __FILE__, __func__, __LINE__);
  return EXIT_SUCCESS;
}
$ gcc -o function_useful_macros1 -std=c99 -pedantic function_useful_macros1.c
$ ./function_useful_macros1
File function_useful_macros1.c, function f, line 5
File function_useful_macros1.c, function main, line 11
Instead of calling each time those macros, you could create a macro that calls them as in the following example: $ cat function_useful_macros2.c #include #include #include #define PRINTERR(msg) ( disp_error((msg), __FILE__, __func__, __LINE__) ) void disp_error(const char *msg, const char *filename, const char *funcname, int line) { printf(“%s. From file %s, function %s, line %d\n”, msg, filename, funcname, line); } int main(int argc, char **argv) { float f;
if (argc < 2) { PRINTERR(“Argument missing”); return EXIT_FAILURE; } f =atof(argv[1]); if (f < 0 || f > 9 ) { PRINTERR(“Argument must range from 0 to 9”); return EXIT_FAILURE; } return EXIT_SUCCESS; } $ gcc -o function_useful_macros2 -std=c99 -pedantic function_useful_macros2.c $ ./function_useful_macros2 Argument missing. From file function_useful_macros2.c, function main, line 15 $ ./function_useful_macros2 10 Argument must range from 0 to 9. From file function_useful_macros2.c, function main, line 21
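Unlike __FILE__ and __LINE__, the name __func__ is defined by C99 as a predefined identifier: within each function, it behaves like a static array of char holding the function name. A small sketch with a hypothetical function who_am_i() shows one consequence, namely that the name can safely be returned to the caller.

```c
/* __func__ behaves like a static array of char declared at the start of
   the function body, so the pointer remains valid after the return. */
const char *who_am_i(void) {
  return __func__;
}
```

The call who_am_i() returns the string "who_am_i".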
VII.30 main() function Any C program must contain one main() function that is the entry point of the program. When you launch a C program, the system will branch to the main() function that will actually start the program. You cannot build an executable from a C program that does not define the main() function.
VII.30.1 Parameters The declaration of the main() function can take two forms. In the first one, the function accepts no argument: int main(void) { … }
In its second form, it takes two parameters that are traditionally named argc and argv (you can give them any name). The parameter argc holds the number of arguments. The parameter argv is a pointer to character strings denoting the arguments themselves. int main(int argc, char **argv) { … } Or
int main(int argc, char *argv[]) { … }
Take note the parameter argc counts the program name along with its arguments. That is, if you call your program with two arguments, argc will hold the value 3. The parameter argv [54] contains the list of passed arguments: argv[0] holds the program name , argv[1] the first argument, argv[2] the second argument… The following example displays the arguments passed to the program: $ cat display_args.c #include #include int main(int argc, char **argv) { int i; printf(“Nb of arguments=%d\n”, argc); for (i = 0; i < argc; i++) printf(“argv[%i]=%s\n”, i, argv[i]); return EXIT_SUCCESS; } $ gcc -o display_args -std=c99 -pedantic display_args.c $ ./display_args Hello World Nb of arguments=3 argv[0]=./display_args argv[1]=Hello argv[2]=World
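Since the arguments are received as character strings, a program that expects numbers has to convert them. The following sketch assumes a hypothetical helper sum_args() that receives argc and argv from main() and relies on the standard function strtol() declared in stdlib.h.

```c
#include <stdlib.h>

/* Adds up the arguments that follow the program name, read as base-10
   integers. argv[0] (the program name) is skipped. */
long sum_args(int argc, char **argv) {
  long total = 0;
  int i;

  for (i = 1; i < argc; i++)
    total += strtol(argv[i], NULL, 10);
  return total;
}
```

If the program were launched with the arguments 10 and 20, argc would hold 3 and sum_args(argc, argv) would return 30.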
There is a third form that you may meet on UNIX systems and UNIX-like systems (such as Linux and BSD systems) depicted in the following example: $ cat display_env1.c #include #include int main(int argc, char **argv, char **envp) { char **p; for (p = envp; *p; p++ ) printf(“%s\n”, *p);
return EXIT_SUCCESS; }
The third parameter envp is a pointer to the environment variables. In this example, we just displayed the environment variables. Being not specified by the C standard or the Single UNIX Specification (SUS), this form must be avoided if you want your program to be portable. Instead, write something like this: $ cat display_env2.c #include #include #include extern char **environ; int main(int argc, char **argv) { char **p; for (p = environ; *p; p++ ) printf(“%s\n”, *p); return EXIT_SUCCESS; }
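When only the value of one given environment variable is needed, the standard function getenv(), declared in stdlib.h, can be used instead of walking through the whole list. The helper env_or_default() below is a hypothetical name used for this sketch.

```c
#include <stdlib.h>

/* Returns the value of the environment variable called name, or fallback
   if the variable is not set. */
const char *env_or_default(const char *name, const char *fallback) {
  const char *value = getenv(name);

  return value != NULL ? value : fallback;
}
```

For example, env_or_default("HOME", "/") returns the value of HOME when it is set and "/" otherwise.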
VII.30.2 Return value The main() function returns a value of type int. One could wonder why the main() function returns a value that seemingly cannot be retrieved. As a matter of fact, the value can be retrieved by the calling program. In our example below, the terminal gets the return value of the main() function:
$ cat main_ret1.c
int main(void) {
  return 10;
}
$ gcc -o main_ret1 -std=c99 -pedantic main_ret1.c
$ ./main_ret1
$ echo $?
10
$ cat main_ret2.c
int main(void) {
  return 20;
}
$ gcc -o main_ret2 -std=c99 -pedantic main_ret2.c
$ ./main_ret2
$ echo $?
20
On UNIX and UNIX-like systems, the shells (command line interfaces similar to Microsoft DOS or PowerShell) can get the return value of main(). For example, in POSIX shell, Bash, Korn shell, Bourne shell, the variable $? holds the return value of the last executed command. It is called an exit status or return code. In the following example, the program main_ret1 is called from an awk script:
$ echo | nawk '{n=system("./main_ret1"); printf "Exit status=%d\n", n}'
Exit status=10
In the following example, the program main_ret1 is called from a perl script:
$ perl -e '{$n=system("./main_ret1"); printf "Exit status=%d\n", $n >> 8}'
Exit status=10
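A C program can retrieve the exit status of a command in a similar way. The sketch below assumes a POSIX system: it calls the standard function system() and decodes the returned status with the macros WIFEXITED and WEXITSTATUS from the POSIX header sys/wait.h, which play the role of the shift by 8 in the perl script. The name run_and_get_status() is hypothetical.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Runs cmd through the shell and returns its exit status, or -1 if the
   command could not be run or did not terminate normally. */
int run_and_get_status(const char *cmd) {
  int status = system(cmd);

  if (status == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

On such a system, run_and_get_status("./main_ret1") would be expected to return 10.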
VII.31 exit() function At any point of your program, you can terminate it by calling the function exit(), declared in the header file stdlib.h: void exit(int exit_status);
For example: $ cat main_ret3.c #include void f(void) { exit(30); } int main(void) { f(); return 0; } $ gcc -o main_ret3 -std=c99 -pedantic main_ret3.c $ ./main_ret3 $ echo $? 30
The parameter of the exit() function holds the return code of the program.
VII.32 Exercises Exercise 1. Write a program composed of a function that returns a pointer to an object having allocated storage duration holding a list of numbers passed as arguments (the number of elements may vary). The values of a list can be of type int or double. The program will also display the contents of the memory area allocated by the function. As an example, two lists will be used: a list of objects of type int that is 1, 2, 3, 4, 5 (5 items) and a list of objects of type double (4 items) that is 1.1, 1.2, 3.3, 4.8. That is, we would pass a list to an allocation function that would return a pointer to a memory area containing the numbers. Then, the newly allocated object will be displayed to check our allocation function. Exercise 2. Explain why the following program does not work properly and correct it: #include #include int alloc_long(int nb_elt, long *p) { p = malloc(nb_elt * sizeof *p ); printf(“Allocated at address %p\n”, p); if (p != NULL) return 1; else return 0; } int main(void) { long *list_long = NULL; int n; if ( n = alloc_long(5, list_long) ) { printf(“Allocation OK: list_long=%p\n”, list_long); } else { printf(“Allocation Not OK: list_long=%p\n”, list_long); } return EXIT_SUCCESS; }
Exercise 4. Write a function get_string1() that returns a pointer to an array of 20 char. Write another function get_string2() that returns a pointer to a memory area containing 20 characters. What is the difference between them?
Exercise 5. Why must a structure with a flexible array member be created through pointers?
Exercise 6. Why must a structure with a flexible array member not be assigned?
Exercise 7. Consider the following structures:
struct string1 {
  int nb_element;
  char s[256];
};
struct string2 {
  int nb_element;
  int len; // capacity. Maximum number of elements
  char *s;
};

struct string3 {
  int nb_element;
  // capacity. Maximum number of elements
  int len;
  char s[];
};
For each structure, propose a function that duplicates it and returns it.
Exercise 8. Write a macro that swaps two numbers.
Exercise 9. Write a function get_index() that returns an integer value incremented at each call (counting from 0). For example, the first call returns 0, the second returns 1, the third returns 2…
Exercise 10. Explain why the statement ABS(get_index()) is wrong. ABS is a macro defined as:
#define ABS(x) ( (x) < 0 ? -(x) : (x) )
Exercise 10. Write a macro, that we will call PRINT_VAR, that prints the value of a variable preceded by its name. For example, PRINT_VAR("%d", p) would produce p holds value "10".
Exercise 11. Write a function addvar() that takes a variable number of parameters and returns their sum.
Exercise 12. Write a program that stores in an array the functions
- double add(double a, double b) that returns a+b
- double mult(double a, double b) that returns a*b
Exercise 13. Recode the following program (seen in Chapter VII Section VII.10.2). Instead of returning a pointer to int, the function will return a pointer to an array of 10 objects of type int.
$ cat function_return4.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
  int len = 10;
  int *s = malloc(len * sizeof(*s) );

  s[0] = 10;
  s[1] = 18;
  s[2]= 20;
  return s;
}

int main(void) {
  int *p;
  int *q;

  p = f();
  p[0] = 200;
  printf ("p[0]=%d sizeof *p=%zu\n", p[0], sizeof *p);
  return EXIT_SUCCESS;
}
CHAPTER VIII C MODULES VIII.1 Introduction So far, our programs consisted of a single file. In this chapter, we will learn how to build a program composed of several files.
Figure VIII‑1 Simplified view of compilation steps
A program is composed of one or more files known as source files. They hold C code and preprocessor directives. The very first step of compilation is managed by the preprocessor, which reads each input source file, interprets the directives it contains and produces a translation unit that contains only C code. C statements cannot directly be executed by a machine. There must be a tool that translates C code to a language, known as machine code, that the machine can process. This is the role of a compiler.
Each translation unit becomes the input of the C compiler, which then translates the C code into a binary file called an object file. You cannot edit an object file; it can only be used to build executables or libraries (studied later in Chapter XIII). The final step consists in merging all the object files into a single file that can be an executable or a library (in this chapter, we will talk about executables only). The utility that puts the object files together to make an executable is known as a linker (see Figure VIII‑1). Fortunately, you do not have to worry about the compilation steps: they are managed by a single tool known as a compiler driver (see Chapter XIII). The utility gcc is the compiler driver we use throughout the book. The chapter in itself brings few new concepts about the C language. Mainly, in this chapter you will learn how to share objects and functions between the modules composing your program. Thus, you will learn how an identifier declared in several modules refers to the same object or function throughout the program. This chapter is also an opportunity to clarify some tricky notions and review some important concepts we studied earlier by putting what we have learned together.
VIII.2 Overview Let us start with a single source file that we will split into several source files: $ cat main.c #include #include float avg(float x, float y) { return ( (x + y)/2 ); } float square(float x) { return ( x * x ); } int main(void) { float z = 1.2; float w = 3.4; printf(“avg(%g,%g)=%g\n”, z, w, avg(z,w));
return EXIT_SUCCESS; }
Now, we would like to create another source file that will contain our mathematical functions. Let's call it calc.c:
$ cat calc.c
float avg(float x, float y) {
  return ( (x + y)/2 );
}

float square(float x) {
  return ( x * x );
}
In our main source file, we will then have something like: $ cat main.c #include #include int main(void) { float z = 1.2; float w = 3.4; printf(“avg(%g,%g)=%g\n”, z, w, avg(z,w)); return EXIT_SUCCESS; }
Our code, expressed like this, is incomplete because in our main() function, we invoke the avg() function while there is no declaration of it [55]. This means the compiler cannot check the arguments we pass to the function avg(). So, let us provide the declaration of the avg() function in the main.c file:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

float avg(float, float);

int main(void) {
  float z = 1.2;
  float w = 3.4;

  printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
  return EXIT_SUCCESS;
}
The next step consists in generating object files. This can be accomplished by gcc with the option -c:
$ gcc -c main.c
$ gcc -c calc.c
$ ls
calc.c calc.o main.c main.o
The object files main.o and calc.o have been produced. Next, we invoke the linker to produce an executable. This can be done with the option -o:
$ gcc -o disp_avg1 main.o calc.o
We called our executable disp_avg1. The name following the -o option is the name of the executable. Finally, we can run our executable:
$ ./disp_avg1
avg(1.2,3.4)=2.3
Take note that the object files and source files are not removed:
$ ls
calc.c calc.o disp_avg1 main.c main.o
It is just as simple as that. To tell the compiler to work in C99 mode (conforming to the C99 standard), specify the option -std=c99. To tell the compiler to show warnings, use the option -pedantic (and -Wall for further warnings):
$ gcc -c -std=c99 -pedantic main.c
$ gcc -c -std=c99 -pedantic calc.c
Once you have compiled a source file to create an object file, you do not have to recompile it unless you change something in the source file. You can use the object file for other projects. You can also provide object files to other programmers who will be able to call the functions you have coded. Your object files can be linked with other object files to build other executables. Each time a function is called, it should be declared in the file in which it is called. The problem is an object file is a binary file meant for being processed
by a machine: it contains no information about how functions should be invoked. In other words, object files do not provide the declarations of functions. For this reason, the programmer who provides object files also provides additional files, called header files, containing the declarations of the functions. Traditionally, every source file has a corresponding header file. Suppose we wish to provide the object file calc.o to other programmers. To allow them to work with the functions defined in our object file, we will also provide the header file calc.h:
$ cat calc.h
float avg(float x, float y);
float square(float x);
Programmers could then use our module to call our functions. To do that, they just have to link our object file with their object files and include our header file within the source files calling our functions. For example: $ cat disp_avg2.c #include #include #include “calc.h” int main(void) { float z = 1.2; float w = 3.4; printf(“avg(%g,%g)=%g\n”, z, w, avg(z,w)); return EXIT_SUCCESS; } $ gcc -c -std=c99 -pedantic disp_avg2.c $ gcc -o disp_avg2 disp_avg2.o calc.o $ ./disp_avg2 avg(1.2,3.4)=2.3
Another programmer could link it with her object files to generate her own executable: $ cat disp_square.c #include #include #include “calc.h” int main(int argc, char *argv[]) {
float x; if ( argc == 2 ) { x = atof( argv[1] ); } else { printf(“USAGE: %s x\n”, argv[0]); return EXIT_FAILURE; } printf(“%g^2=%g\n”, x, square(x)); return EXIT_SUCCESS; } $ gcc -c -std=c99 -pedantic disp_square.c $ gcc -o square disp_square.o calc.o $ ./square 4 4^2=16
Take note the calc.h header file has been included in the source file calling functions defined in the object file calc.o.
VIII.3 Writing Source Files Consider the following C program: $ cat main.c #include #include float avg(float x, float y) { return ( (x + y)/2 ); } float square(float x) { return ( x * x ); } int main(void) { float z = 1.2; float w = 3.4; printf(“avg(%g,%g)=%g”, z, w, avg(z,w));
return EXIT_SUCCESS; }
Source files are text files written in the C language, with the .c suffix. Your machine cannot execute them, because it does not understand the C language: they must be translated into machine code. If we call prog the executable that we wish to build, the main.c source file is compiled by gcc with the option -o as follows:
$ gcc -o prog main.c
$ ./prog
avg(1.2,3.4)=2.3
Writing an entire C program in one file imposes various limitations: o It is very difficult for several programmers to work together on the same project o Maintaining a small source file is quite easy, but it gets tough when it contains several thousands of lines o If you wish to reuse functions in another project, you have to copy their definitions and then insert them into your source files. It is prone to errors and therefore does not constitute a good way to manage a project. For this reason, programmers prefer modular programming: C code is split into several files called modules. This approach provides the following benefits: o Source files can be developed and tested separately. This allows several programmers to work together. o It facilitates the maintenance, which means programmers can easily alter and test their programs. o Modules can be reused. o It allows separate compilation. o It provides a better design for building programs: encapsulation techniques can be used.
VIII.3.1 Modules Programmers break large programs into several more maintainable units called source files (with the .c extension). Related functions are put into the same source file. Functions and objects can be visible only within a source file or shared. To enable the compiler to check that shared objects and functions are correctly used and to make the right conversions, the programmer provides an interface called a header file. Remember that source files contain the code written by programmers while object files are generated by the compiler from source files. Both contain the same information but expressed in different languages: one understandable by human beings and the other one by the computer. Modular programming allows using object files without providing their corresponding source files. Programmers could then supply only header and object files. This means that you do not require the source files developed by someone else: to use functions or objects, you just need to be provided with the object files implementing them and the header files providing the declarations. A module consists of a header file acting as an interface and an object file implementing the “services” declared by the module interface. A source module is then composed of a header file and a source file. An object module is composed of the header file and an object file generated by the compiler from the source file. Thus, an object module could be used by anyone without having to rewrite it or even compile it. For example, if you write a C source file that calls a function defined in another module that someone else has written, you simply include the header file in your source file and then specify the object module name at link stage. You do not need to know how a function is coded but only the types of the arguments that you have to pass to it and the value it returns, as specified in the header file. This also implies that the implementation of objects can be hidden. Programmers do not need to know how objects are actually designed: they only have access to the pieces of information in the header files. This technique is known as encapsulation. For us, throughout the chapter, unless otherwise expressed, the word module is a synonym for file. Thus, the word module with no qualifier means object module or source module depending on the context. Now, suppose that you wish to put the avg() and square() functions in a separate file called calc.c.
The source file calc.c contains the definitions of the avg() and square() functions: $ cat calc.c #include “calc.h” float avg(float x, float y) { return ( (x + y)/2 ); } float square(float x) { return ( x * x );
}
The very first line includes the header file calc.h into calc.c to avoid any mismatch between the declarations in the header file and the definitions in the source file. The header file calc.h contains the prototypes of the functions avg() and square() defined in calc.c:
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__

extern float avg(float , float);
extern float square(float);

#endif /* __CALC_H__ */
By default, a function has external linkage (it is visible from the other files of the program), so the storage-class specifier extern can be omitted in function declarations: extern means the identifier is defined elsewhere. Header files end with the .h suffix by convention. They contain the declarations of functions and objects that will be seen by the modules that include the header file. As explained earlier, to tell the preprocessor to include header files in a source file, C programmers use the preprocessor directive #include. To prevent header files from being included several times, programmers use the #ifndef, #define and #endif directives. Therefore, the preprocessor will include the header file only once. Header files look like this:
#ifndef NAME
#define NAME
declarations
#endif
Where NAME is a combination of letters, underscores and digits defining a macro called NAME. The preprocessor directives mean:
o #ifndef NAME: if the macro NAME is not defined, all directives and C declarations are processed by the C preprocessor until the #endif directive is met.
o #define NAME: the macro is defined. Thus, the contents of the header file will be skipped if it is included again. This ensures the header is processed only once.
o declarations are C declarations that will be inserted in the source file including the header file.
o #endif terminates the #ifndef directive.
You can use any identifier for the macro NAME provided it is unique. Traditionally, the macro is the name of the header file in capital letters, surrounded by underscores.
In order to create an executable, there must be a single module defining the main() function. The system gives control of the processor to the program by calling the function main(). The main source file, containing the main() function that calls the function avg(), could be written as follows:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g", z, w, avg(z,w));
    return EXIT_SUCCESS;
}
This is equivalent to the following code:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

extern float avg(float , float);
extern float square(float);

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g", z, w, avg(z,w));
    return EXIT_SUCCESS;
}
Every identifier must be declared before it is used, and defined somewhere in the program. Since the function avg() (defined in the module calc.c) is referenced in the main source file main.c, you have to provide its declaration. Instead of writing the declaration float avg(float, float) in the source file, a programmer would use the preprocessor directive #include "calc.h".
In the following example, the executable prog is built from the source files calc.c and main.c as follows:
$ gcc -c -std=c99 -pedantic main.c
$ gcc -c -std=c99 -pedantic calc.c
$ gcc -o prog main.o calc.o
$ ./prog
avg(1.2,3.4)=2.3
The utility gcc saves you time by allowing you to generate a binary file directly from source files without keeping intermediate object files:
$ gcc -o prog main.c calc.c
$ ./prog
avg(1.2,3.4)=2.3
The second method for compiling works perfectly but if you alter a source file, you have to recompile all the source files. Compiling two small source files does not take a long time, but if you have to compile a great number of source files, it may take hours. Separate compilation overcomes this issue: each source file is compiled independently so that only modified source files will be recompiled as we did in the first method.
VIII.4 Header Files
In modular programming, programmers develop several source files that are compiled individually. Global identifiers of functions and variables, defined in a source file, can be referenced (accessed) in other modules as if they actually were defined in them. Header files are used in modular programming as interfaces to modules. Typically, header files contain:
o Structures and unions. For example:
struct string {
    char *s;
    int len;
};
o Function prototype. For example: float avg(float, float);
o New user-defined data types. For example: typedef struct string string;
o Enumerations. enum task_status { KO, OK };
o Declarations of objects. For example:
extern int max_retry;
o Macros (that will be expanded by the preprocessor). They start with the #define directive. For example: #define ABS(x) ( (x) > 0 ? (x):-(x) )
Thus, declarations of identifiers stored in header files are separated from their implementations (located in source files). Each source file should be accompanied by its header file. There are two kinds of header files:
o Standard header files, such as stdio.h, provided by the system or the compiler software [56].
o User-defined header files
Header files are inserted into source files using the #include preprocessor directive. There are two ways to include header files in source files (the way they are interpreted depends on the compiler):
o A header file is surrounded by quotation marks: #include "filename"
When you compile source files containing a line with this format, the compiler will include the file called filename. The gcc compiler driver will look for filename in the directories listed below in sequential order:
▪ The current directory
▪ The directory list appearing as an argument of the -I option.
▪ Default search directories (for UNIX and UNIX-like systems, it is /usr/include)
Programmers tend to use this method to include non-standard header files, because the working directory is normally searched for header files during the compilation phase. For example:
▪ #include "calc.h"
▪ #include "../include/calc.h"
o The header file is enclosed between chevrons ( < and >): #include <filename>
When you compile source files containing a line with this format, the compiler will insert the file filename. The gcc compiler driver will look for filename in the directories listed below in the following order:
▪ The directory list appearing as an argument of the -I option.
▪ Default search directories (on UNIX and UNIX-like systems, the default directory is /usr/include)
Programmers tend to use the latter method to include standard header files.
With gcc, you can use the -I option to add a directory to the list of directories that will be searched for header files:
gcc -c source_file_list -Iinc_dir1 -Iinc_dir2…
Where:
o source_file_list is the list of source files (with the .c suffix) separated by blanks
o inc_dir1, inc_dir2… are the directories that will be searched for the header files named in the source files (by using #include)
In the following example, the header files are located in the directory ../include:
$ gcc -c main.c calc.c -I../include
VIII.5 Separate Compilation
Separate compilation consists in compiling source files individually, which produces one object file per source file. In our example, we have two source files, main.c and calc.c. First, we compiled them to produce object files and then we invoked the link-editor, also called linker, (gcc -o) to combine them and generate a binary file as explained below (see Figure VIII‑1):
o Step 1. Building object files: The following example builds the main.o and calc.o object files from the main.c and calc.c source files:
$ gcc -c -std=c99 -pedantic main.c
$ gcc -c -std=c99 -pedantic calc.c
o Step 2. Linking: After building the object modules main.o and calc.o, we tell gcc to combine them to generate the executable file called prog as follows:
$ gcc -o prog main.o calc.o
Finally, we can run it:
$ ./prog
Now, suppose we alter the main.c file as follows:
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
    float z = 5;
    float w = 5.2;

    printf("avg(%g,%g)=%g", z, w, avg(z,w));
    return EXIT_SUCCESS;
}
We just need to recompile the main.c source file and then call the link-editor to generate a new executable:
$ gcc -c -std=c99 -pedantic main.c
$ gcc -o prog main.o calc.o
VIII.6 Declaration, definition, initialization and prototype
At this stage of the book, we are going to review some concepts and complete them in the context of modular programming. A variable is a memory location, containing a value, identified by a name called an identifier. The size of the value in the computer's memory is indicated by the type of the variable. The value of a variable is dynamic: it may change over time but its size remains unchanged. More generally, in a C program, we use identifiers to work with objects and functions. An identifier is a series of letters, underscores and digits starting with a letter or an underscore. An object can be of a C-predefined type or a user-defined type, and the memory allocated for it depends on its type.
It is important to distinguish between a definition and a simple declaration. A definition allocates memory for a function or an object, while a simple declaration just states that we are going to use an identifier with a specific type or a function with a specific prototype. A definition includes a declaration, while a simple declaration supposes the definition is somewhere in a translation unit. Of course, you cannot use an identifier that is only declared: it must be defined somewhere. We will discuss those important C concepts at length.
VIII.6.1 Identifiers
An identifier is a sequence of letters (lowercase or uppercase), underscores and digits starting with an underscore or a letter. In C, programmers do not work directly with registers and memory addresses of the computer but with identifiers. There are several kinds of identifiers:
o Macro name such as #define LEN 10
o typedef name (defined with typedef) such as typedef long myinteger;
o Object name such as int x;
o Tag:
▪ Structure tag such as struct string;
▪ Union tag such as union int_val;
▪ Enumeration tag such as enum color { red, green, blue };
o Name of a member of an enumeration, a union or a structure such as struct string { char *s; int len; };
o Label (used by the goto statement)
o And function name such as double add(double x, double y);
VIII.6.2 Name spaces
We recall that identifiers are grouped into four name spaces:
o Identifiers for functions, macros, objects, typedef names and enumeration constants
o Labels (used by the goto statement)
o Identifiers for members of structures and unions
o Tags for structures, unions and enumerations
Two identifiers can be identical, whatever their scope, if they belong to different name spaces.
VIII.6.3 C type specifiers
VIII.6.3.1 Type hierarchy
In this section, we will not describe the C predefined types again, since we have discussed them at length so far. We are just going to complete what we said with some definitions you might meet in C materials.
The C language types are listed in Table VIII‑1. Here is how to read it:
o Type specifiers (i.e. identifier types) are composed of object types and function types.
o Object types are composed of scalar types, aggregate types and union types.
o Scalar types are composed of arithmetic types and pointer types.
o Arithmetic types are composed of integer types and floating types.
o And so on.
Table VIII‑1 C Types
Take note that an object of scalar type holds a single value while an object of aggregate type (arrays and structures) holds several values.
We finish with types by talking about derived types. In C materials, you might see this term: it just means a type built from other types. So, derived types consist of aggregate types, union types, pointer types, and function types.
VIII.6.3.2 Incomplete type
An object can be used only if it has a complete type, so that storage can be allocated for it and its value can be interpreted. A type is said to be incomplete when its size cannot be determined. That is, some pieces of the type are missing, which prevents the compiler from determining its size. According to the C standard, there are three kinds of types: object types, function types and incomplete types.
A type is considered incomplete in three situations:
o A structure or union that does not specify its members.
o An array declared without specifying the number of elements it contains.
o void is an incomplete type.
Incomplete types allow declaring identifiers that will be defined later. An incomplete type must be completed before being used.
VIII.6.3.2.1 Structures and unions
In the following example, we declare the structure string without specifying its members:
$ cat incomplete_struct1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct string;

    return EXIT_SUCCESS;
}
The structure string is incomplete and therefore cannot be used to create objects of this type as long as its members are not defined. In the following snippet of code, we complete it before using it:
$ cat incomplete_struct2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct string;
    char *msg;

    struct string {
        char *s;
        int len;
    };

    struct string str;

    return EXIT_SUCCESS;
}
Once the structure string has been completed by specifying its members, its size can be computed and objects of that type can be created, but not before. In the following example, we declare the pointer p with an incomplete type:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct string *p;

    return EXIT_SUCCESS;
}
In the example above, storage can be allocated for the pointer p, but no object of type struct string can be allocated by malloc() until the structure is completed. If we attempt to do it, we get an error:
$ cat incomplete_struct3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct string *p;
    p = malloc( sizeof(struct string) );

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_struct3 -std=c99 -pedantic incomplete_struct3.c
incomplete_struct3.c: In function ‘main’:
incomplete_struct3:6:22: error: invalid application of ‘sizeof’ to incomplete type ‘struct string’
As you have noticed, you cannot declare a variable of incomplete type but you can declare a pointer to an incomplete type: the compiler cannot know how many bytes it has to allocate for the variable, but it can do so for a pointer because the pointer size is always known. Such a pointer is a variable referencing an object of unknown type.
Things happen in the same manner for user-defined types created with typedef. In the following example, we create a new type called string but we will not be able to use it until we define the structure string:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    typedef struct string string;

    return EXIT_SUCCESS;
}
Is it actually useful? Isn't it easier to declare a full type? When you can, of course, you will define a full type, but it is not always possible. Incomplete types are very useful since they make it possible to create recursive data structures. For example, they allow you to create high-level data structures in which members refer to a structure of the same type as the enclosing structure, as in the following example:
struct list {
    char s[200];
    struct list *next;
    struct list *prev;
};
The pointers next and prev refer to a type that does not exist yet. If the C language did not permit incomplete types, you could not do such things. The C language allows declaring explicitly an incomplete structure or union type like this:
struct list;
This may appear to be a silly declaration but it can be of great help in some circumstances. Imagine two structures A and B with file scope (i.e. declared outside functions) have been declared and you want to define new structures, within a block, using the same identifiers (local structures) as in the following snippet of code.
$ cat incomplete_struct4.c
#include <stdio.h>
#include <stdlib.h>

// global structure A (file scope)
struct A {
    char s[200];
    struct B *ptr_b;
};

// global structure B (file scope)
struct B {
    char s[100];
    struct A *ptr_a;
};

void f(void) {
    // local structure A (block scope)
    struct A {
        char s[20];
        struct B *ptr_b; // ptr_b references the global structure B
    };

    // local structure B (block scope)
    struct B {
        char s[10];
        struct A *ptr_a; // ptr_a references the local structure A
    };

    struct A lst_a;
    lst_a.ptr_b = malloc(sizeof *(lst_a.ptr_b) );

    printf("sizeof lst_a.ptr_b->s=%zu\n", sizeof lst_a.ptr_b->s );
}

int main(void) {
    f();
    return EXIT_SUCCESS;
}
$ gcc -o incomplete_struct4 -std=c99 -pedantic incomplete_struct4.c
$ ./incomplete_struct4
sizeof lst_a.ptr_b->s=100
As shown by the program incomplete_struct4.c, the member ptr_b of the local structure A, declared in the function f(), points to the global structure B. That is, it points to a complete type. If we declare an incomplete structure type B within the body of the function f(), the global structure B will be hidden by the local incomplete structure B as shown below:
$ cat incomplete_struct5.c
#include <stdio.h>
#include <stdlib.h>

// global structures
struct A {
    char s[200];
    struct B *ptr_b;
};

struct B {
    char s[100];
    struct A *ptr_a;
};

void f(void) {
    struct B ; /* new structure B having block scope
                  Incomplete type
                  This declaration hides the global structure B */

    // new structure A having block scope
    struct A {
        char s[20];
        struct B *ptr_b; // ptr_b references the local structure B
    };

    struct B {
        char s[10];
        struct A *ptr_a; // ptr_a references the local structure A
    };

    struct A lst_a;
    lst_a.ptr_b = malloc(sizeof *lst_a.ptr_b );

    printf("sizeof s.s=%zu\n", sizeof lst_a.ptr_b->s );
}

int main(void) {
    f();
    return EXIT_SUCCESS;
}
$ gcc -o incomplete_struct5 -std=c99 -pedantic incomplete_struct5.c
$ ./incomplete_struct5
sizeof s.s=10
Pointers to incomplete structures and typedef names of incomplete structure types allow hiding the implementation of your types (encapsulation), as we will see at the end of the chapter.
VIII.6.3.2.2 Array
An array declared without dimension is considered incomplete. Storage will be allocated only when its size is specified somewhere with a new declaration as in the following example:
$ cat incomplete_type5.c
#include <stdio.h>
#include <stdlib.h>

extern int list_int[]; /* incomplete type. Supposed to be completed elsewhere */

int main(void) {
    int j;
    char *s;

    return EXIT_SUCCESS;
}
$ cat incomplete_type5_ext.c
int list_int[10]; /* array list_int has complete type */
In our example, the array list_int had incomplete type in the source file incomplete_type5.c. In the source file incomplete_type5_ext.c, it was fully declared. We will say more about the definition of identifiers and the keyword extern later. As far as multidimensional arrays are concerned, only the first dimension is permitted to be incomplete. The following declaration is allowed: extern int list_int[][255];
But the following is invalid: extern int list_int[][];
Why use an array of incomplete type? Suppose you have an array shared among your modules. You specify the array size in only one module; in the other modules, you can just give an incomplete declaration of the array. Thus, the array is fully declared in only one place.
VIII.6.3.2.3 Void
The type specifier void can never be completed. As stated by the C standard, it is neither an object type nor a function type, which implies that an object cannot be of that type. It has different meanings when used with functions or pointers. Used with a function, it means the function returns nothing or takes no parameter. Used with a pointer (i.e. void *), it means the pointer refers to an object whose type is not specified yet. An implicit conversion or an explicit cast to a pointer to an object type gives the pointed-to object its true type. You cannot access objects pointed to by pointers to void until you convert those pointers to the correct object pointer type. Here are some examples. Below, the malloc() function allocates memory and returns a void pointer that is assigned to the pointer p. The implicit conversion gives the returned pointer the type int *:
int *p = malloc(10 * sizeof(int));
In the following example, the pointer p can point to any object: void *p;
Thinking of void as a generic type may be misleading. A programmer who wishes to create a memory area of type void in which to put objects of different types is making a mistake. The following example is wrong:
$ cat incomplete_type6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array_size = 10;

    void *p= malloc(array_size * sizeof *p);

    p[0] = 10;
    p[1] = 10.10;

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type6 -std=c99 -pedantic incomplete_type6.c
incomplete_type6.c: In function ‘main’:
incomplete_type6.c:7:38: warning: invalid application of ‘sizeof’ to a void type
incomplete_type6.c:9:4: warning: pointer of type ‘void *’ used in arithmetic
incomplete_type6.c:9:4: warning: dereferencing ‘void *’ pointer
incomplete_type6.c:9:3: error: invalid use of void expression
incomplete_type6.c:10:4: warning: pointer of type ‘void *’ used in arithmetic
incomplete_type6.c:10:4: warning: dereferencing ‘void *’ pointer
incomplete_type6.c:10:3: error: invalid use of void expression
The size to allocate cannot be computed because sizeof(void) is not allowed, and p[0] cannot be accessed because a pointer to void cannot be dereferenced. As stated earlier, void is not an object type. The sizeof operator can be used only with an object type or an object. The following example shows that the pointer p of type void * can refer to any object:
$ cat incomplete_type7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p;
    char *msg = "Hello";
    int i = 10;
    float f = 12.4;

    p = msg;
    printf("%s\n", (char *)p );

    p = &i;
    printf("%d\n", *(int *)p );

    p = &f;
    printf("%f\n", *(float *)p );

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type7 -std=c99 -pedantic incomplete_type7.c
$ ./incomplete_type7
Hello
10
12.400000
This shows that, before getting the value of the object pointed to by a pointer to void, you have to cast the pointer to the right object pointer type.
Unlike pointers to object types, pointers to void cannot be used in additions and subtractions (pointer arithmetic):
$ cat incomplete_type8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p;
    int a[5] = {1, 2, 3, 4, 5};

    p = a;
    printf("%d\n", p[0] );

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type8 -std=c99 -pedantic incomplete_type8.c
incomplete_type8.c: In function ‘main’:
incomplete_type8.c:9:19: warning: pointer of type ‘void *’ used in arithmetic
incomplete_type8.c:9:19: warning: dereferencing ‘void *’ pointer
incomplete_type8.c:9:3: error: invalid use of void expression
Remember what we said when we described pointers: p[j] means *(p + j), and adding j to the pointer p advances the address it holds by j * sizeof *p bytes. Since sizeof *p means sizeof(void) here, you understand why it does not work. For the same reason, the following example will not work:
$ cat incomplete_type9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p;
    int a[5] = {1, 2, 3, 4, 5};

    p = a;
    p = p + 1;

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type9 -std=c99 -pedantic incomplete_type9.c
incomplete_type9.c: In function ‘main’:
incomplete_type9.c:9:9: warning: pointer of type ‘void *’ used in arithmetic
In summary, for a pointer to void to be used like any other pointer, it must be converted to the right pointer type as shown below:
$ cat incomplete_type10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p;
    int a[5] = {1, 2, 3, 4, 5};
    int *q;
    int i;

    p = a; /* p points to void. Objects cannot be accessed */
    q = p; /* q points to int. Objects can be accessed */

    for ( i = 0; i < sizeof a / sizeof a[0]; i++ )
        printf("q[%d]=%d \n", i, q[i] );

    printf("\n");
    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type10 -std=c99 -pedantic incomplete_type10.c
$ ./incomplete_type10
q[0]=1
q[1]=2
q[2]=3
q[3]=4
q[4]=5
Here is a last example:
$ cat incomplete_type11.c
#include <stdio.h>
#include <stdlib.h>

enum type_list { INT, FLOAT };

/* Function display_num() displays the numbers stored in the array list_num
   - type is INT or FLOAT. Indicates the type of objects stored in list_num
   - size is the size of the array list_num */
void display_num(void *list_num, int type, size_t size) {
    int *p1;
    float *p2;
    int i, nb_elt;

    switch ( type ) {
    case INT:
        p1 = list_num;
        nb_elt = size / sizeof *p1;
        for ( i = 0; i < nb_elt; i++ )
            printf("list_num[%d]=%d \n", i, p1[i] );
        break;
    case FLOAT:
        p2 = list_num;
        nb_elt = size / sizeof *p2;
        for ( i = 0; i < nb_elt; i++ )
            printf("list_num[%d]=%f \n", i, p2[i] );
        break;
    default:
        printf("Type %d not supported\n", type );
    }
}

int main(void) {
    int a1[5] = {1, 2, 3, 4, 5};
    float a2[4] = {1.1, 1.2, 3.3, 4.8};

    display_num( a1, INT, sizeof a1 );
    printf("\n");
    display_num( a2, FLOAT, sizeof a2 );

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type11 -std=c99 -pedantic incomplete_type11.c
$ ./incomplete_type11
list_num[0]=1
list_num[1]=2
list_num[2]=3
list_num[3]=4
list_num[4]=5

list_num[0]=1.100000
list_num[1]=1.200000
list_num[2]=3.300000
list_num[3]=4.800000
VIII.6.4 External identifiers
Identifiers declared outside functions (file scope) are also called external identifiers. External declarations are declarations placed outside functions and external definitions are definitions appearing outside functions [57].
VIII.6.5 Functions
The definition of a function is a declaration accompanied by a block (the function body) containing the C code of the function. Calling a function supposes it is defined somewhere. It is nonsense to call a function defined nowhere! The called function is defined either in a module you have written (or written by someone else) or in a library (this topic will be covered later in the book). Before calling a function defined in another module, programmers provide a prototype [58] of the function in the module calling it: a declaration that specifies the type of each parameter and the return type.
A function has, by design, file scope [59]: it is global and exists as long as the program is running. File scope means defined outside functions. A function defined with no storage-class specifier [60] or with the storage-class specifier extern is shared amongst all the modules, which means it can be seen everywhere throughout all the modules composing the program. A function defined with the storage-class specifier static is visible only within the translation unit in which it is defined.
VIII.6.5.1 Shared functions
In our previous example, the functions avg() and square() are shared amongst all modules. We express this by preceding the declarations of the functions with the storage-class specifier extern (which can be omitted), which means the identifiers avg and square are shared between modules and defined elsewhere (in our example in calc.c):
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__
extern float avg(float , float);
extern float square(float);
#endif /* __CALC_H__ */
For functions, the storage-class specifier extern can be omitted; you could also write:
#ifndef __CALC_H__
#define __CALC_H__
float avg(float , float);
float square(float);
#endif /* __CALC_H__ */
Traditionally, in header files, programmers keep the keyword extern to point out that the function is shared and defined elsewhere. The definitions of the functions declared in calc.h are stored in the source file calc.c:
$ cat calc.c
#include "calc.h"

float avg(float x, float y) {
    return ( (x + y)/2 );
}

float square(float x) {
    return ( x * x );
}
Though it is not done traditionally, the extern keyword can also be used when defining a function. The above example can also be written:
#include "calc.h"

extern float avg(float x, float y) {
    return ( (x + y)/2 );
}

extern float square(float x) {
    return ( x * x );
}
In the main.c source file, we just have to include the header file calc.h, and call the function avg() or square():
$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
    return EXIT_SUCCESS;
}
Suppose now we define another function called sum() in the source file calc.c. Let us call the new source file calc2.c. Assume we want to hide this function so that it cannot be used by other modules. One may think that if the declaration is omitted in the header file calc2.h, the function will be hidden. This is not the case. It is enough to declare it correctly in the file calling it, as shown in the following example:
$ cat calc2.h
#ifndef __CALC_H__
#define __CALC_H__
extern float avg(float , float);
extern float square(float);
#endif /* __CALC_H__ */
$ cat calc2.c
#include "calc2.h"

float sum(float x, float y) {
    return x + y;
}

float avg(float x, float y) {
    return ( sum(x,y)/2 );
}

float square(float x) {
    return ( x * x );
}
$ gcc -c -std=c99 -pedantic calc2.c
In the source file main2.c, we declare the function sum() and we call it:
$ cat main2.c
#include <stdio.h>
#include <stdlib.h>
#include "calc2.h"

extern float sum(float, float); /* defined in calc2.o */

int main(void) {
    float x = 1.2;
    float y = 3.4;

    printf("avg(%g,%g)=%g\n", x, y, avg(x,y));
    printf("sum(%g,%g)=%g\n", x, y, sum(x,y));
    return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic main2.c
$ gcc -o prog2 main2.o calc2.o
$ ./prog2
avg(1.2,3.4)=2.3
sum(1.2,3.4)=4.6
Not giving a declaration of a function does not actually hide it. To make a function unavailable outside of a module, programmers make it static.
VIII.6.5.2 Static functions
C programmers can make a function "private" by using the storage-class specifier static. That is, a function, though global, can be made visible only within the source file in which it is defined. In the following example, the function sum() is static, and is therefore visible only within the source file calc3.c:
$ cat calc3.c
#include "calc3.h"

static float sum(float x, float y) {
    return x + y;
}

float avg(float x, float y) {
    return ( sum(x,y)/2 );
}

float square(float x) {
    return ( x * x );
}
The header file calc3.h holds only the functions we want to export (without the storage-class specifier static):
$ cat calc3.h
#ifndef __CALC_H__
#define __CALC_H__
extern float avg(float , float);
extern float square(float);
#endif /* __CALC_H__ */
In the main3.c source file, we can call the functions avg() and square() but we do not have access to the sum() function:
$ cat main3.c
#include <stdio.h>
#include <stdlib.h>
#include "calc3.h"

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
    return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic main3.c
$ gcc -c -std=c99 -pedantic calc3.c
$ gcc -o prog3 main3.o calc3.o
$ ./prog3
avg(1.2,3.4)=2.3
If we try to access the static function sum() in the module main4.c, even after declaring it, we get an error:
$ cat main4.c
#include <stdio.h>
#include <stdlib.h>
#include "calc3.h"

extern float sum(float, float);

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("sum(%g,%g)=%g\n", z, w, sum(z,w));
    return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic main4.c
$ gcc -o prog4 main4.c calc3.c
Undefined                       first referenced
 symbol                             in file
sum                                 /var/tmp//ccE8aiBe.o
ld: fatal: symbol referencing errors. No output written to prog4
collect2: ld returned 1 exit status
VIII.6.5.3 Inline functions
In this section, we are going to elaborate on inline functions, introduced in Chapter VII. According to C99, the function specifier inline is just a hint to the compiler telling it to optimize calls to the function, making them as fast as possible. The standard does not specify the nature of the optimizations but, in practice, the compiler replaces function calls with the body of the function. The compiler may do so or not. The inline function specifier does not change the linkage of the function (section VIII.7.4).
Inline functions are different from ordinary functions. They are not used in the same way. They are supposed to contain only a few statements and they are subject to some constraints.
There are three ways to declare an inline function: with no storage-class specifier, with the storage-class specifier static or with the storage-class specifier extern.
The easiest way is to define inline functions with the storage-class specifier static (the function is said to have internal linkage) as in the following example.
$ cat function_inline1.c
#include <stdio.h>
#include <stdlib.h>

static inline double add(double a, double b) {
    return a + b;
}

int main(void) {
    double x = add(4, 2.0);

    printf("x=%f\n", x);
    return EXIT_SUCCESS;
}
$ gcc -o function_inline1 -std=c99 -pedantic function_inline1.c
$ ./function_inline1
x=6.000000
The inline function add() has internal linkage. That is, it is visible only within the source file function_inline1.c.
In a translation unit, you can declare a function with the function specifier inline as many times as you wish, but there must be only one definition of an inline function in each translation unit. An inline function has internal linkage if it is declared with the storage-class specifier static, and external linkage (i.e. shared between modules) otherwise.
An inline function with external linkage can have two kinds of definitions, which determine whether it is visible to other modules: an inline definition or an external definition. In a translation unit, a definition of a function is called an inline definition if every declaration of the function in the translation unit appears with the inline function specifier and without the storage-class specifier extern. An inline definition is not an external definition. It can be viewed as a local definition. Therefore, an inline definition is not available to other translation units, and an external definition of the same function is allowed in another translation unit (i.e. you can provide another definition of that function in another module without getting an error for duplicate definitions).
In the following example, the definition of the function add() is an inline definition. It cannot be called from other translation units:
inline double add(double a, double b); /* useless declaration. Can be removed */
inline double add(double a, double b) {
    return a + b;
}
In the following example, the definition of the function add() is not an inline definition but an external definition (there is a declaration that specifies extern). The function can be called from other translation units: extern inline double add(double a, double b); inline double add(double a, double b) { return a + b; }
The same goes for the next example (one declaration does not mention inline): double add(double a, double b); inline double add(double a, double b) { return a + b; }
Table VIII‑2 Type of definition and linkage of inline functions
Table VIII‑2 helps you distinguish the possible cases you may meet:
o The function is declared with the inline specifier and no storage-class specifier → the function has an inline definition and external linkage (shared amongst modules).
o The function is declared with the inline specifier and the extern storage-class specifier → the function has an external definition and external linkage.
o The function is declared with the inline specifier and the static storage-class specifier → the function has an inline definition and internal linkage (not shared with other modules: it is visible only within the module in which it is defined).
As we saw, a function with internal linkage (declared with the static storage-class specifier) is an inline function if declared with the inline specifier. For an external function (i.e. declared without static) to be an inline function (otherwise, it is considered an ordinary function), it is subject to the following rules (things are not as simple as with a static inline function):
o Rule 1: the function has a declaration with the inline specifier, and is defined in the source file in which it is declared.
o Rule 2: for each call, the compiler may choose between the external and the inline definition.
This implies that, if you wish to work with an inline function that does not have internal linkage (i.e. you wish to share the function amongst modules), a single source file holds the external definition of the function, while the other source files hold inline definitions of it. Because of rule 2, one external definition should be provided. The second rule also implies that the identifier of an inline function with external linkage having only an inline definition is visible to the linker, but its definition is not sharable. That is, from the perspective of the link editor, the identifier is declared but may appear as undefined!
Now, let us see how we can share functions amongst modules and use them as inline functions. In the following example, the inline function foo() defined in the file function_inline1.1.c is called as a regular function from the file function_inline1.2.c.
$ cat function_inline1.1.c
#include <stdio.h>
#include <stdlib.h>

/* External definition */
/* Definition is accessible throughout the program */
extern inline void foo(void) {
  printf("foo\n");
}

extern void f(void);

int main(void) {
  f();
  return EXIT_SUCCESS;
}
$ cat function_inline1.2.c
#include <stdio.h>

/* not inline. Simple declaration. Function defined elsewhere */
extern void foo(void);

void f(void) {
  foo();
}
$ gcc -c -std=c99 -pedantic function_inline1.1.c
$ gcc -c -std=c99 -pedantic function_inline1.2.c
$ gcc -o function_inline1 function_inline1.1.o function_inline1.2.o
$ ./function_inline1
foo
In the source file function_inline1.2.c, the function foo() is not considered inline: we called it as an ordinary function with external linkage. The example worked because we used an external definition for the inline function foo(). If we had provided an inline definition, it would have failed:
$ cat function_inline_err1.1.c
#include <stdio.h>
#include <stdlib.h>

/* Inline definition */
/* Definition is not visible from other modules */
inline void foo(void) {
  printf("foo\n");
}

extern void f(void);

int main(void) {
  f();
  return EXIT_SUCCESS;
}
$ cat function_inline_err1.2.c
#include <stdio.h>

extern void foo(void);

void f(void) {
  foo(); /* called as any ordinary function */
}
$ gcc -c -std=c99 -pedantic function_inline_err1.1.c
$ gcc -c -std=c99 -pedantic function_inline_err1.2.c
$ gcc -o function_inline_err1 function_inline_err1.1.o function_inline_err1.2.o
Undefined                       first referenced
 symbol                             in file
foo                                 function_inline_err1.2.o
ld: fatal: symbol referencing errors. No output written to function_inline_err1
collect2: ld returned 1 exit status
Within a source file, if an inline function does not have an inline definition (i.e. it has an external definition), the function is visible within that translation unit: there is no ambiguity. Moreover, it can be visible outside (if the static keyword is not used). The issue arises when inline definitions are used. In the following program, gcc chooses the external definition (rule 2):
$ cat function_inline_issue1.1.c
#include <stdio.h>
#include <stdlib.h>

/* Inline definition */
inline void f(void) {
  printf("Inline Definition for f()\n");
}

int main(void) {
  f();
  return EXIT_SUCCESS;
}
$ cat function_inline_issue1.2.c
#include <stdio.h>

/* External definition */
extern inline void f(void) {
  printf("External definition for f()\n");
}
$ gcc -c -std=c99 -pedantic function_inline_issue1.1.c
$ gcc -c -std=c99 -pedantic function_inline_issue1.2.c
$ gcc -o function_inline_issue1 function_inline_issue1.1.o function_inline_issue1.2.o
$ ./function_inline_issue1
External definition for f()
Each compiler implements its own way of managing inline functions that have an inline definition and external linkage. So, you can use inline functions with an inline definition and internal linkage (i.e. declared with the keyword static), with an external definition, or with an inline definition and external linkage. In the latter case, read the compiler manual carefully to learn how it treats them. So, how can we work with inline functions so that our programs remain portable? We propose two simple methods:
o First method. Declare static inline functions as in the following example:
$ cat function_inline3.c
#include <stdio.h>
#include <stdlib.h>

static inline double add(double a, double b) {
  return a + b;
}

int main(void) {
  double x = add(4, 2.0);
  printf("x=%f\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o function_inline3 -std=c99 -pedantic function_inline3.c
$ ./function_inline3
x=6.000000
o Second method. Define inline functions in header files. Include the header file in every source file calling the inline functions and, in a single source file, turn each definition into an external definition by declaring the function with the storage-class specifier extern. The other source files calling the inline functions include the header file only: in those source files, the definitions are inline definitions, not visible outside. Said like this, the point is not easy to grasp. Let us clarify it with a simple example. Suppose we wish to use the function add() as an inline function and we wish to share it:
▪ Create a header file holding the definition of the function:
$ cat function_inline4.h
#ifndef __FUNCTION_INLINE4_H__
#define __FUNCTION_INLINE4_H__

inline double add(double a, double b) {
  return a + b;
}

#endif /* __FUNCTION_INLINE4_H__ */
Putting the inline function in a header file allows including the definition of the function in the source files calling it. In source files that will include this file, the definition of the function add() will be an inline definition: the definition will not be shared, it will remain local.
▪ Create a single source file declaring the inline function add() with an external definition:
$ cat function_inline4.c
#include "function_inline4.h"

/* In this file. Function add() has external definition */
extern inline double add(double a, double b); /* inline may be omitted */
Why create such a source file? This source file holds the external definition of the function. The storage-class specifier extern converts the definition of the inline function, placed in the header file, into an external definition. Thus, there is a single external definition of the inline function and several inline definitions in the other source files. This method works whether the compiler uses the external or the inline definition.
▪ In source files calling the inline function, just include the header file function_inline4.h:
$ cat function_inline4.1.c
#include <stdio.h>
#include <stdlib.h>
#include "function_inline4.h"

/* In this file. Function add() has inline definition */
extern void f(void);

int main(void) {
  double x, y = 4, z = 2.1;

  x = add(y, z);
  printf("In main(): x=%f+%f=%f\n", y, z, x);
  f();
  return EXIT_SUCCESS;
}
$ cat function_inline4.2.c
#include <stdio.h>
#include "function_inline4.h"

/* In this file. Function add() has inline definition */
void f(void) {
  double t, u = 3.14, v = 1.10;

  t = add(u, v);
  printf("In f(): t=%f+%f=%f\n", u, v, t);
}
$ gcc -c -std=c99 -pedantic function_inline4.c
$ gcc -c -std=c99 -pedantic function_inline4.1.c
$ gcc -c -std=c99 -pedantic function_inline4.2.c
$ gcc -o function_inline4 function_inline4.o function_inline4.1.o function_inline4.2.o
$ ./function_inline4
In main(): x=4.000000+2.100000=6.100000
In f(): t=3.140000+1.100000=4.240000
Those source files have an inline definition of the function add(). What if we did not link the object file function_inline4.o?
$ gcc -o function_inline4 function_inline4.1.o function_inline4.2.o
Undefined                       first referenced
 symbol                             in file
add                                 function_inline4.1.o
ld: fatal: symbol referencing errors. No output written to function_inline4
collect2: ld returned 1 exit status
The link failed with gcc because it searched for an external definition. Could we overcome the issue by declaring the function add() with extern in the source files function_inline4.1.c and function_inline4.2.c?
$ cat function_inline_err4.1.c
#include <stdio.h>
#include <stdlib.h>
#include "function_inline4.h"

extern double add(double, double);
extern void f(void);

int main(void) {
  double x, y = 4, z = 2.1;

  x = add(y, z);
  printf("In main(): x=%f+%f=%f\n", y, z, x);
  f();
  return EXIT_SUCCESS;
}
$ cat function_inline_err4.2.c
#include <stdio.h>
#include "function_inline4.h"

extern double add(double, double);

void f(void) {
  double t, u = 3.14, v = 1.10;

  t = add(u, v);
  printf("In f(): t=%f+%f=%f\n", u, v, t);
}
$ gcc -c -std=c99 -pedantic function_inline_err4.1.c
$ gcc -c -std=c99 -pedantic function_inline_err4.2.c
$ gcc -o function_inline_err4 function_inline_err4.1.o function_inline_err4.2.o
ld: fatal: symbol ‘add’ is multiply-defined:
        (file function_inline_err4.1.o type=FUNC; file function_inline_err4.2.o type=FUNC);
ld: fatal: file processing errors. No output written to function_inline_err4
collect2: ld returned 1 exit status
It failed again because the function add() had two external definitions. However, if we had declared the function with the storage-class specifier extern in only one of the two source files, it would have worked… To end with inline functions, note that two constraints remain on an inline definition of a function with external linkage:
o It must not define modifiable objects (declared without const) with the storage-class specifier static.
o It must not reference identifiers with file scope declared with the storage-class specifier static.
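These two constraints concern only inline definitions of functions with external linkage; a static inline function is not affected. As a minimal sketch (next_id() and last_id are hypothetical names, not taken from the examples above), the following function keeps a modifiable static counter, which is accepted because the function has internal linkage:

```c
/* Sketch: next_id() and last_id are hypothetical names used for illustration. */

/* Internal linkage: the constraints on inline definitions with external
   linkage do not apply, so a modifiable static object is allowed here. */
static inline int next_id(void) {
    static int last_id = 0; /* keeps its value between calls */
    return ++last_id;
}
```

If such a function were placed in a header file, each source file including it would get its own private copy of the counter, which may or may not be what you want.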
VIII.6.6 Objects
VIII.6.6.1 What is an object?
An object is a piece of memory allocated for storing data. An object is created when defined. That is, a definition allocates storage for an object. An object has a type determining how many bytes will be allocated for storing its value and how its bits will be interpreted. As we saw it, an object has several features defining how it can be used:
o The identifier allows manipulating the object. An identifier can be the name of the object itself (given at time of the definition of the variable) or the name of a pointer referencing the object. An anonymous object (allocated by malloc(), calloc()…) is accessed through pointers: indirect access. A variable can be accessed directly through its name.
o The type determines its size and how its contents will be interpreted.
o The value it holds. The way the value is interpreted depends on the type of the object.
o Storage duration defines when it is created and destroyed.
o The scope defines the places in the program where the object can be used.
There are two kinds of objects: objects that are given a name (called an identifier) through declarations (i.e. variables) and unnamed objects (anonymous) created by memory allocation functions (malloc(), calloc()…). Through an identifier, you can manipulate an object directly (variables) or indirectly (pointers). In the following example, the variable i denotes an object of type int holding the value 5:
int i = 5;
This definition creates a named object (i.e. variable) called i holding the value 5. The identifier i allows us to read or modify directly the value of the object of type int.
Figure VIII‑2 Objects
An object may be accessed through several identifiers; this mechanism is known as aliasing. In the following example, the same object is accessed through two different pointers p and q:
char *p = malloc(10);
char *q = p;
The function malloc() creates an anonymous object (whose size is 10 bytes) that is accessed through the identifiers p and q (indirect access). Why is it anonymous? Because it has no name: malloc() allocates a piece of memory and returns a pointer to it. It has not been given a name (see Figure VIII‑2) as we would do when declaring a variable. Anonymous objects are manipulated through pointers.
VIII.6.6.2 Scope
The portion of the C program in which an identifier is visible is known as the scope of the identifier. There are four kinds of scopes: file scope, block scope, function scope and function prototype scope. The scope of an identifier is determined by the point of its declaration within a file.
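The four kinds of scopes can be seen side by side in a short sketch. All the names below (nb_clamped, clamp(), too_big…) are hypothetical and serve only to illustrate where each identifier is visible:

```c
/* Sketch: every name below is hypothetical and only illustrates scopes. */

int nb_clamped = 0;                 /* file scope: visible until the end of the file */

int clamp(int value, int limit);    /* value, limit: function prototype scope,
                                       they vanish at the end of this declaration */

int clamp(int value, int limit) {   /* parameters of a definition: block scope */
    int result = value;             /* block scope: visible until the closing brace */

    if (result > limit)
        goto too_big;               /* a label has function scope: it can be used
                                       before the point where it appears */
    return result;

too_big:
    nb_clamped++;
    return limit;
}
```

Labels are the only identifiers with function scope, which is why the goto statement may precede the label it jumps to.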
Table VIII‑3 Scope and storage duration of identifiers
VIII.6.6.2.1 File scope: global identifiers
Identifiers declared outside functions have file scope: such identifiers are sometimes called global (or external). There are two kinds of global identifiers: “shared” identifiers [61] and static identifiers [62]. A global identifier declared with the storage-class specifier static is visible only within the file in which it is declared. It can be viewed as “private” in contrast with “shared”. A global identifier declared with no storage-class specifier or with the storage-class specifier extern is visible within all the files composing the program: it is shared among the modules. Since a function is always defined outside functions, it has file scope: it is global. Functions can also be shared or static. Let us consider the following program composed of two modules: calc4.c and main5.c:
$ cat calc4.c
#include <stdio.h>
#include <string.h>
#include "calc4.h"

#define ERROR_LEN 255

static int nb_calls = 0; /* static variable visible only inside that file */
char error_msg[ ERROR_LEN ]; /* shared array */

float sum(float x, float y) {
  nb_calls++;
  return x + y;
}

float avg(float x, float y) {
  nb_calls++;
  return ( sum(x,y)/2 );
}

float square(float x) {
  nb_calls++;
  return ( x * x );
}

long fact(long n) {
  nb_calls++;
  if (n < 0) {
    strncpy(error_msg, "ERROR in function fact(). Unexpected argument", ERROR_LEN);
    return -1;
  } else if ( n == 0 ) {
    return 1;
  }
  return n * fact( n - 1 );
}

int get_nb_calls(void) {
  return nb_calls;
}
In this module:
o The four functions have file scope. They are shared among the files constituting the program.
o The static variable nb_calls is visible only within that file. The static keyword applied to a global identifier limits its scope to the translation unit.
o The global array error_msg is visible within all modules.
Both nb_calls and error_msg exist and keep their value until the program terminates. Like any global identifier, they are created once and are destroyed when the program ends. As we will find out soon, they have static storage duration. The variable nb_calls is incremented each time a function within the module is called. The array error_msg is used to store error messages. It is declared in the header file calc4.h so that it can be used in other modules.
$ cat calc4.h
#ifndef __CALC_H__
#define __CALC_H__

/* Objects */
extern char error_msg[];

/* Functions */
extern float sum(float x, float y);
extern float avg(float , float);
extern float square(float);
extern long fact(long n);
extern int get_nb_calls(void);

#endif /* __CALC_H__ */
In main5.c, we call the functions and display the string held in the array error_msg.
$ cat main5.c
#include <stdio.h>
#include <stdlib.h>
#include "calc4.h"

int main(void) {
  int n = -1;
  float x = 2;
  long k;

  printf("Nb calls: %d\n", get_nb_calls());
  if ( (k = fact(n) ) == -1 ) {
    printf("Error message:%s\n", error_msg);
  } else {
    printf("%d!=%ld\n", n, fact(n));
  }
  printf("After calling fact(). Nb calls: %d\n", get_nb_calls());

  sum(2, 3);
  printf("After calling sum(). Nb calls: %d\n", get_nb_calls());
  return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic calc4.c
$ gcc -c -std=c99 -pedantic main5.c
$ gcc -o prog5 calc4.o main5.o
$ ./prog5
Nb calls: 0
Error message:ERROR in function fact(). Unexpected argument
After calling fact(). Nb calls: 1
After calling sum(). Nb calls: 2
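Functions can be made private to a file in the same way as variables. The following single-file sketch mirrors the names of calc4.c but adds a hypothetical static helper, half(), which other modules could not call:

```c
/* Sketch: half() is a hypothetical helper; the other names mirror calc4.c. */

static int nb_calls = 0;            /* static: private to this source file */

static float half(float x) {        /* static function: cannot be called from other files */
    return x / 2;
}

float avg(float x, float y) {       /* shared function: callable from other modules */
    nb_calls++;
    return half(x + y);
}

int get_nb_calls(void) {            /* shared accessor to the private counter */
    return nb_calls;
}
```

Only avg() and get_nb_calls() would appear in the header file; half() and nb_calls remain implementation details of the module.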
VIII.6.6.2.2 Block scope: local identifiers
Objects declared within a block (function body or compound statement) have block scope (local objects). They can be declared with or without the storage-class specifier auto. They are visible only within the block in which they are declared. In file main5.c, the variables n, x and k have block scope. Parameters of a function in a declaration with definition also have block scope. In file calc4.c, the parameters x, y and n of the functions have block scope [63].
VIII.6.6.2.3 Visibility and hidden objects
Within a given scope, an identifier is visible, but it can be hidden by another identifier (representing another object) with the same name but another scope. This happens when two scopes overlap: for example, one identifier with file scope and the other with block scope, or two identifiers declared within blocks (block scope), one block embedded in the other. Two object identifiers in the same name space may have the same name if they have different scopes. Consider the object o1 with the identifier ident and another object o2 also having the identifier ident. If you declare both as global or both within the same block, you get an error at compile time (same name space, same scope): this is not allowed. If you declare one as global (file scope) and the other within a block (block scope), the identifier within the block (inner scope) hides the global identifier (outer scope). If you declare one identifier within a block (outer scope) and the other within a block nested inside the previous one (inner scope), the second identifier hides the first. In the following file main6.c, the local pointer error_msg declared in the main() function hides the global array error_msg:
$ cat main6.c
#include <stdio.h>
#include <stdlib.h>
#include "calc4.h"

int main(void) {
  int n = -1;
  float x = 2;
  long k;
  static char *error_msg = "No error"; /* hides global array error_msg declared in calc4.h */

  printf("Nb calls: %d\n", get_nb_calls());
  if ( (k = fact(n) ) == -1 ) {
    printf("Error message:%s\n", error_msg);
  } else {
    printf("%d!=%ld\n", n, fact(n));
  }
  printf("After calling fact(). Nb calls: %d\n", get_nb_calls());

  sum(2, 3);
  printf("After calling sum(). Nb calls: %d\n", get_nb_calls());
  return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic main6.c
$ gcc -o prog6 calc4.o main6.o
$ ./prog6
Nb calls: 0
Error message:No error
After calling fact(). Nb calls: 1
After calling sum(). Nb calls: 2
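The case of two nested blocks works the same way. In this minimal sketch (shadow_demo() and level are hypothetical names), the inner declaration hides the outer one only until the closing brace of the inner block:

```c
/* Sketch: shadow_demo() and level are hypothetical names. */

int shadow_demo(void) {
    int level = 1;          /* outer block scope */
    int seen_inside;

    {
        int level = 2;      /* inner block scope: hides the outer level */
        seen_inside = level;
    }                       /* the inner level disappears here */

    /* the outer level is visible again and still holds 1 */
    return seen_inside * 10 + level;    /* 2 * 10 + 1 */
}
```

The function returns 21: the value read inside the inner block was 2, while the outer variable was never modified.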
In the following example, the local identifier k declared in the for loop hides the global identifier k: $ cat hide1.c #include #include int k = 10; int main(void) { int i; printf(“Within for loop:\n”); for (i=0; is = ‘\0’; } else { strncpy(ptr_str->s,s, MAX_LEN); } return ptr_str; } void print_string(struct string *ptr_str) { if (ptr_str != NULL) printf(“s=%s\n”, ptr_str->s); }
The header file could be written like this: $ cat myString1.h #ifndef __MY_STRING1_H__ #define __MY_STRING1_H__ struct string * set_string(const char s[]); void print_string(struct string *ptr_str); #endif
The main file calls the functions defined in the source file myString1.c: $ cat myString_main1.c
#include #include #include #include “myString1.h” int main(void) { struct string *ptr_str = set_string(“Hello”); print_string(ptr_str); return EXIT_SUCCESS; }
Let us compile it and run it:
$ gcc -c -std=c99 -pedantic myString1.c
$ gcc -c -std=c99 -pedantic myString_main1.c
$ gcc -o myString1 myString1.o myString_main1.o
$ ./myString1
s=Hello
Look at the header file. As explained in section VIII.7.2, when included in a source file, the declaration of the function set_string() also declares an incomplete structure string. That is, our header file is equivalent to: #ifndef __MY_STRING1_H__ #define __MY_STRING1_H__ struct string; struct string * set_string(const char s[]); void print_string(struct string *ptr_str); #endif
In such conditions, when the header file is included: o In the source file myString1.c, the incomplete structure type is completed by its definition. All the declarations involving the structure string refer to the same structure type. It can be used to declare a variable since it is complete. o In the source file myString_main1.c, the structure string is an incomplete type. All the declarations involving the structure string refer to the same incomplete structure type. The structures string in the two files are different but are compatible. Now, suppose we swap the declarations of the functions in the header file: $ cat myString1.h #ifndef __MY_STRING1_H__
#define __MY_STRING1_H__ void print_string(struct string *ptr_str); struct string * set_string(const char s[]); #endif
The compiler generates an error: $ gcc -c -std=c99 -pedantic myString1.c In file included from myString1.c:4:0: myString1.h:4:26: warning: ‘struct string’ declared inside parameter list myString1.h:4:26: warning: its scope is only this definition or declaration, which is probably not what you want myString1.c:29:6: error: conflicting types for ‘print_string’ myString1.h:4:6: note: previous declaration of ‘print_string’ was here
What happened? Here again, when the header file is included in a source file, the declaration of the function print_string() declares an incomplete structure string but, this time, the structure string has function prototype scope since it appears in the declaration of a parameter (see the first two warnings). As its visibility ends at the end of the prototype, it can never be completed and is therefore treated as a new structure type, different from any other structure. The declaration of the second function set_string() declares an incomplete structure string that has file scope. This incomplete structure type is completed by the definition of the structure in the file myString1.c. This means that the declarations of print_string() in the header file and in the source file myString1.c do not refer to the same structure and hence are not compatible, which explains the error message. To avoid issues related to the automatic declaration of structures, it is better to declare the structure string as an incomplete type before declaring the functions. Finally, the header file should have been written as follows:
$ cat myString1.h
#ifndef __MY_STRING1_H__
#define __MY_STRING1_H__

struct string;
void print_string(struct string *ptr_str);
struct string * set_string(const char s[]);

#endif
Whatever the order of the function declarations, the compiler will successfully compile the
program.
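The effect of the forward declaration can be tried in a single file. In the sketch below, string_length() and the member s[32] are hypothetical; what matters is that struct string is declared at file scope before the first prototype that mentions it:

```c
#include <string.h>

/* Sketch: string_length() and the member s[32] are hypothetical. */

struct string;                                        /* incomplete type, file scope */

size_t string_length(const struct string *ptr_str);   /* refers to the file-scope type */

struct string {                                       /* completes the type declared above */
    char s[32];
};

/* Compatible with the prototype: both use the same struct string */
size_t string_length(const struct string *ptr_str) {
    return strlen(ptr_str->s);
}
```

Removing the line struct string; would give the structure function prototype scope in the first declaration, and gcc would then report conflicting types for string_length(), as in the error shown earlier.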
VIII.9.2 Compatible enumerated types
There is no incomplete type for enumerated types, which implies there can be only a single declaration of an enumeration in a given scope. Two enumerated types declared in two source files are compatible if they have the same tag and the same enumeration constants with the same values. In the example below, the enumerations myBool declared in two source files are compatible:
$ cat compat_enum1.c
#include <stdlib.h>

enum myBool { TRUE = 1, FALSE = 0 };
void show_bool(enum myBool b);

int main() {
  enum myBool b = TRUE;
  show_bool(b);
  return EXIT_SUCCESS;
}
$ cat compat_enum2.c
#include <stdio.h>

enum myBool { TRUE = 1, FALSE = 0 };

void show_bool(enum myBool b) {
  printf("b=%d\n", b);
}
$ gcc -c -std=c99 -pedantic compat_enum1.c
$ gcc -c -std=c99 -pedantic compat_enum2.c
$ gcc -o compat_enum compat_enum1.o compat_enum2.o
$ ./compat_enum
b=1
VIII.10 An example A small C program can be composed of a single source file but large programs are split into several source files. Each source file contains related functions, user-defined types…
Global identifiers that are not to be shared are declared with static. If you can, avoid using shared global variables because they make debugging trickier: it is easier to track variables when they are modified in a single file. For each source file, a header file is created. It holds prototypes of shared functions, shared enumerations, variables… Source files that reference them include the right header files. Consider the example given at the beginning of the chapter, written as a single source file:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

float avg(float x, float y) {
  return ( (x + y)/2 );
}

float square(float x) {
  return ( x * x );
}

int main(void) {
  float z = 1.2;
  float w = 3.4;

  printf("avg(%g,%g)=%g", z, w, avg(z,w));
  return EXIT_SUCCESS;
}
This simple example could be broken into two source files and one header file:
$ cat calc.c
#include "calc.h"

float avg(float x, float y) {
  return ( (x + y)/2 );
}

float square(float x) {
  return ( x * x );
}
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__

extern float avg(float , float);
extern float square(float);

#endif /* __CALC_H__ */
$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
  float z = 1.2;
  float w = 3.4;

  printf("avg(%g,%g)=%g", z, w, avg(z,w));
  return EXIT_SUCCESS;
}
To build the executable, the most efficient way is to compile each source file separately and link the resulting object files to generate an executable:
$ gcc -c calc.c
$ gcc -c main.c
$ gcc -o prog_calc calc.o main.o
If you modify a source file, you only recompile that file and link the object files again to produce the new executable, without recompiling the untouched source files. In the following example, we modify only the module calc (the source file calc.c and its header calc.h):
$ cat calc.c
#include "calc.h"

float avg(float x, float y) {
  return ( (x + y)/2 );
}

float square(float x) {
  return ( x * x );
}

/* named absolute() because abs() is already declared in <stdlib.h> */
float absolute(float x) {
  if (x < 0)
    return -x;
  else
    return x;
}
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__

extern float avg(float , float);
extern float square(float);
extern float absolute(float x);

#endif /* __CALC_H__ */
$ gcc -c calc.c
$ gcc -o prog_calc calc.o main.o
VIII.11 Encapsulation
As we explained in the previous section, a program can be broken down into several files. Header files contain shared information that will be used by other modules. As far as user-defined types and objects are concerned, programmers have two possibilities: either they provide full visibility by showing the internal representation in header files, or they hide the implementation. In the first case, any module can manipulate the objects directly as it wishes. In the second case, known as encapsulation, modules can only call the provided functions that manipulate the objects. Maintaining a large program can turn out to be very awkward if you have a whimsical programming style. We said earlier that shared variables that can be modified anywhere throughout the program should be avoided as much as possible because they make debugging harder. This holds true for structures and unions. Imagine you have the following structure:
struct student_list {
  char first_name[255];
  char last_name[255];
  int age;
  struct student_list *next;
};
Suppose you create objects of that type and all translation units have full access to the members. What happens if you change the definition of the structure by adding members or modifying their type? You have to review your whole program. For a small program, it is an easy task, but for large programs, it is a nightmare. To avoid such a catastrophic situation, encapsulation can help you: it allows building maintainable programs by hiding the implementation of high-level objects. The idea is to group related data structures along with the functions manipulating them into a single source file, and to provide a header file with the prototypes of the functions and the declarations of the protected data types but without showing their implementation (incomplete types). It enforces safer control of the way objects are used by other modules. Thus, other modules cannot do anything unexpected with the objects. In C, encapsulation is performed through incomplete data types. The incomplete data type is protected, hence its name: opaque data type. It is understood that other modules will not be able to instantiate an object of an incomplete type [74]. For this reason, pointers are used [75]: pointers to incomplete types are allowed. For example, if you wish to hide the details of the structure string, you could create the type string in the header file as follows:
typedef struct string *string;
In the header file, you will also provide functions that manipulate the opaque structure string. Other modules will only pass pointers to those functions without knowing what they really point to. Of course, a source file holding the definitions of the functions and the structures is required. In other words, the header file is an interface telling what will be done while the source file contains the definitions of the structures and functions implementing how it will be done. The header file could contain something like this: typedef struct string *string; string create_string(char *s); int delete_string(string p_str); int modify_string(string p_str, char *s); int copy_string(string p_str1, string p_str2);
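To make the split between interface and implementation concrete, here is a single-file sketch. The members len and text, the accessor string_length() and the return convention of delete_string() are assumptions made for illustration; only the typedef and the prototypes of create_string() and delete_string() come from the header shown above:

```c
#include <stdlib.h>
#include <string.h>

/* Interface part (what the header would expose): the type stays incomplete. */
typedef struct string *string;
string create_string(char *s);
int delete_string(string p_str);
size_t string_length(string p_str);   /* hypothetical accessor, not in the header above */

/* Implementation part (what the source file would hide). The members are
   assumptions made for this sketch. */
struct string {
    size_t len;
    char *text;
};

/* Allocates an opaque object holding a copy of s; returns NULL on failure. */
string create_string(char *s) {
    string p_str;

    if (s == NULL || (p_str = malloc(sizeof *p_str)) == NULL)
        return NULL;
    p_str->len = strlen(s);
    if ((p_str->text = malloc(p_str->len + 1)) == NULL) {
        free(p_str);
        return NULL;
    }
    strcpy(p_str->text, s);
    return p_str;
}

size_t string_length(string p_str) {
    return (p_str != NULL) ? p_str->len : 0;
}

/* Returns 0 on success, -1 if there was nothing to delete (assumed convention). */
int delete_string(string p_str) {
    if (p_str == NULL)
        return -1;
    free(p_str->text);
    free(p_str);
    return 0;
}
```

A caller that sees only the interface part can create, query and delete strings, but an expression such as p_str->len would not compile there because the structure is incomplete in that translation unit.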
Other source files will only have to include this header file and call the functions. They never have access to the internal representation of the structure string. If you change the definition of the structure, nothing changes for other modules. In this section, our goal is to provide a simple example showing the encapsulation technique. Assume you are working with another programmer, each one developing modules. For example, you could develop the module student.h/student.c, provide the header
file student.h and the object file student.o. $ cat student.h #ifndef __STUDENT_H__ #define __STUDENT_H__ typedef struct student_node *student_list; student_list new_student_list(void); int add_student(student_list p_sl, char *first_name, char *last_name, int age); void show_student_list(student_list p_sl); #endif /* __STUDENT_H__ */
Your workmate could use your module without having any idea about the way the objects of type student_list are actually built. He just has to call the functions you have provided. He cannot access the members of your objects. The structure student_node is not visible outside the source file student.c. The structure student_node, declared in the header file student.h, has an incomplete type that will be completed within the source file student.c.
$ cat student_main.c
#include <stdio.h>
#include <stdlib.h>
#include "student.h"

int main(void) {
  student_list p_sl1 = new_student_list(); /* create first linked list */
  student_list p_sl2 = new_student_list(); /* create second linked list */

  /* add students into first linked list */
  add_student(p_sl1, "Christine", "Sun", 22);
  add_student(p_sl1, "Thomas", "Brown", 21);

  /* add student into second linked list */
  add_student(p_sl2, "Michael", "Smith", 20);

  /* Display contents of linked lists */
  printf("List 1\n");
  show_student_list(p_sl1);
  printf("\nList 2\n");
  show_student_list(p_sl2);
  return EXIT_SUCCESS;
}
If you compile the program, you get this: $ gcc -c -std=c99 -pedantic student_main.c $ gcc -o student student.o student_main.o $ ./student List 1 First Name: Christine Last Name: Sun Age: 22 First Name: Thomas Last Name: Brown Age: 21 List 2 First Name: Michael Last Name: Smith Age: 20
Figure VIII‑4 Structure student_node
Now, let us have a look at the source file student.c:
$ cat student.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "student.h"

/* Other source files do not have access to the following structures.
   They are hidden. */
typedef struct student *student;

struct student {
  char *first_name;
  char *last_name;
  int age;
};

/* Linked list */
struct student_node {
  student p_student;
  int nb_student;
  struct student_node *next; /* next node */
  struct student_node *last; /* tail of the linked list */
};

/*
FUNCTION new_student()
PURPOSE: Allocate memory holding an object of type student, fill it with parameters
PARAMETERS:
- first_name: First name of the student
- last_name: Last name of the student
- age: age of the student
RETURN: object of type student
DESCRIPTION:
- allocate memory for an object of type student
- fill members of the newly created object with passed parameters
*/
static student new_student (char *first_name, char *last_name, int age) {
  student p_student;

  /* check the arguments before allocating, so that nothing leaks */
  if ( first_name == NULL || last_name == NULL )
    return NULL;

  p_student = malloc ( sizeof *p_student );
  if ( p_student == NULL )
    return NULL;

  if ( ( p_student->first_name = malloc( strlen(first_name) + 1 ) ) == NULL ) {
    free(p_student);
    return NULL;
  }

  if ( ( p_student->last_name = malloc( strlen(last_name) + 1 ) ) == NULL ) {
    free(p_student->first_name);
    free(p_student);
    return NULL;
  }

  strcpy(p_student->first_name, first_name);
  strcpy(p_student->last_name, last_name);
  p_student->age = age;

  return p_student;
}

/*
FUNCTION display_student:
PURPOSE: display data in object of type student p_st
PARAMETERS:
- p_st: display information stored in object of type student
RETURN: void
*/
static void display_student(student p_st) {
  if ( p_st != NULL ) {
    if( p_st->first_name != NULL )
      printf( "First Name: %s\n", p_st->first_name );

    if( p_st->last_name != NULL )
      printf( "Last Name: %s\n", p_st->last_name );

    printf( "Age: %d\n", p_st->age );
  }
}

/*
FUNCTION new_node()
PURPOSE: Allocate a node
PARAMETERS: None
RETURN: returns a node that is an object of type student_list.
DESCRIPTION:
- Allocate memory holding an object of type student_list
- set each pointer member to a null pointer and the counter to 0
- supposed to be integrated into a linked list by another function
*/
static student_list new_node(void) {
  student_list p_node = malloc( sizeof( *p_node) );

  if ( p_node == NULL )
    return NULL;

  p_node->p_student = NULL;
  p_node->nb_student = 0; /* malloc() does not initialize the memory */
  p_node->next = NULL;
  p_node->last = NULL;

  return p_node;
}

/*
FUNCTION new_student_list()
PURPOSE: creates a linked list that is denoted by its head
PARAMETERS: void
RETURN: object of type student_list. It is the very first node (head) of the linked list
DESCRIPTION: allocates memory holding an object of type student_list: the head of the linked list
The very first node of the linked list represents the linked list
*/
student_list new_student_list (void) {
  student_list p_sl_head = new_node();

  if ( p_sl_head == NULL ) {
    printf("Cannot allocate memory for student_list\n");
    return NULL;
  }

  p_sl_head->last = p_sl_head; /* the head is also the tail of the linked list */
  return p_sl_head;
}

/*
FUNCTION add_student()
PURPOSE: Add information about a student into linked list
PARAMETERS:
- p_sl: head of the linked list
- first_name
- last_name
- age
RETURN:
- 0: failure
- 1: successful
DESCRIPTION:
- allocates memory holding an object of type student
- insert information (first_name, last_name and age) into the object of type student
- create a new node if the head of the linked list already holds a student
- add the object student into the node
- add the node into the linked list
*/
int add_student(student_list p_sl, char *first_name, char *last_name, int age) {
  student p_student;
  student_list p_node;

  if ( p_sl == NULL ) {
    printf("Cannot add student. Null pointer provided: line %d\n", __LINE__);
    return 0;
  }

  if ( first_name == NULL ) {
    printf("Cannot add student. First name not provided\n");
    return 0;
  }

  if ( last_name == NULL ) {
    printf("Cannot add student. Last name not provided\n");
    return 0;
  }

  p_student = new_student(first_name, last_name, age);
  if ( p_student == NULL ) {
    printf("Cannot allocate memory for new student\n");
    return 0;
  }

  if ( ! p_sl->nb_student ) {
    /* No student => The head of list holds no student */
    /* Add student into the head of the linked list */
    p_sl->p_student = p_student;
  } else {
    /* Add new node */
    p_node = new_node();
    if ( p_node == NULL ) {
      printf("Cannot allocate memory for new node in student_list\n");
      /* release the student that could not be stored */
      free(p_student->first_name);
      free(p_student->last_name);
      free(p_student);
      return 0;
    }

    p_node->p_student = p_student;
    p_sl->last->next = p_node; /* Add the node to the linked list */
    p_sl->last = p_node; /* the newly created node becomes the tail */
  }

  p_sl->nb_student++;
  return 1;
}

/*
FUNCTION show_student_list()
PARAMETERS:
- p_sl: head of the linked list
PURPOSE: show information about registered students in linked list
RETURN: void
*/
void show_student_list(student_list p_sl) {
  student_list p;

  for (p = p_sl; p != NULL; p = p->next) {
    display_student(p->p_student);
    printf("\n");
  }
}
$ gcc -c -std=c99 -pedantic student.c
Now, if you decide to add members to your structures, there will be no consequences for other source files since they do not have access to the internal representation of your objects. The same goes if you decide to use arrays instead of pointers for the members first_name and last_name. This simple example shows that it is quite easy to protect your objects and keep control over the way they are used. This prevents misuse of the objects and eases debugging since objects are modified in a single file.
Of course, our program is not complete. Several important functions are missing: remove_student(), remove_student_list(), search_student(), modify_student(), copy_student(), copy_student_list()… We leave it to you to complete the program.
VIII.12 Exercise Exercise 1. Complete the following table:
Exercise 2. Consider the following declarations: static int x; extern int x;
int y; extern int y;
What is the linkage of the variables x and y?
Exercise 3. Is it equivalent to declare a global variable with or without the storage-class specifier extern?
Exercise 4. What are the benefits of splitting a program into several modules?
Exercise 5. Why use header files? Could we work without them?
Exercise 6. What are the benefits of separate compilation?
Exercise 7. Why should allocated memory (with malloc() for example) be released?
Exercise 8. What happens if you do not keep a pointer to memory allocated by malloc()?
Exercise 9. What are the differences between a variable and an object allocated by malloc()?
Exercise 10. Describe the reasons causing the following example to fail to compile:
$ cat string.h
typedef struct string string
string create_string(char *s);
$ cat main.c
int main(void) {
  string str = create_string(hello);
}
Exercise 11. State whether the following declarations are simple declarations, definitions or tentative definitions, and indicate the linkage of the identifiers.
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

int k;
extern int k;
static float f = 10.1;
extern float f;
extern double x = 10;

int main(void) {
  int k;
  static int u;
  extern float f;
  return EXIT_SUCCESS;
}
Exercise 12. Why is the program ex12_1.c permitted while ex12_2.c is not?
$ cat ex12_1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct string *p;
  struct string {
    char *s;
    int len;
  };
  return EXIT_SUCCESS;
}
$ cat ex12_2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  struct string str;
  struct string {
    char *s;
    int len;
  };
  return EXIT_SUCCESS;
}
Exercise 13. Are the following statements (appearing outside functions) equivalent? extern int list_int[]; int list_int[];
How could we complete such an array?
Exercise 14. Why is the following program not correct? Correct it.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  void *p = malloc( 10 * sizeof(int) );
  p[0] = 10;
  return EXIT_SUCCESS;
}
CHAPTER IX INTERNATIONALIZATION
IX.1 Locales
Each language, country and culture has its own conventions. Within the same country, there may be different languages and cultures. Several cultures having a common language may have different conventions. For example, the formats for dates, monetary values and numeric values vary from country to country. To ease programming with different cultures, languages and conventions, the concept of locale was adopted. A locale is a set of conventions, identified by a name, allowing applications to work with the different languages and cultures of countries (internationalization of applications). A C program that wishes to take such conventions into account specifies the locale. By default, the C language uses the "C" locale. Each locale describes a set of conventions related to a country, a language or a culture, and a character encoding: it indicates how to interpret characters composed of several bytes (multibyte characters), how to sort characters, how to format dates, numeric values, currency quantities…
IX.2 Categories
Functions, macros and types related to locales are declared in the header file locale.h. The conventions of a locale are grouped into categories. At least five categories, listed in Table IX‑1, each representing a set of rules of the selected locale, are defined by the implementation. You can set all of them to the same locale at once by using the macro LC_ALL, or alter only one of them depending on your needs. Each category defines a specific convention of a locale, and lays down a set of rules affecting some functions.
Table IX‑1 Locale categories
Additional categories may be added by implementations. For example, on UNIX and UNIX-like operating systems (more generally on operating systems compliant with POSIX), the category LC_MESSAGES is used to format notification messages.
IX.3 setlocale
#include <locale.h>
char *setlocale(int category, const char *locale);
The setlocale() function sets a locale for the category specified by the first argument. The first argument is one of the macros listed in Table IX‑1 or an extra category defined by the implementation. The second argument can be "C", "", or a value defined by the implementation. The locale names depend on the implementation. The name of a locale on Microsoft Windows® operating systems takes one of the following forms:
language_shortname
language_shortname-country_shortname
language
language_country
language_country.codepage
.codepage
Some examples of locales on Windows systems:
o en: language: English
o en-US: language: English, country: USA
o en-NZ: language: English, country: New Zealand
o zh-CN: language: simplified Chinese, country: China
o br-FR: language: Breton, country: France
o fr-FR: language: French, country: France
o fr-CH: language: French, country: Switzerland
o french_France: language: French, country: France
o English_United_States: language: English, country: USA
o English_United_States.1252: language: English, country: USA, encoding (code page): 1252
On UNIX and UNIX-based operating systems (Linux, BSD systems), the general form of a locale is:
language[_country[.encoding[@modifier]]]
Here are some examples on Oracle Solaris®:
o en_US.ISO8859-15: language: English, country: USA, encoding: ISO 8859-15
o fr_FR.UTF-8: language: French, country: France, encoding: UTF-8
Some examples on OpenSUSE (Linux system):
o en_US.iso885915: language: English, country: USA, encoding: ISO 8859-15
o fr_FR.utf8: language: French, country: France, encoding: UTF-8
o fr_LU.utf8: language: French, country: Luxembourg, encoding: UTF-8
If the function cannot set the requested locale, a null pointer is returned and the current locale remains unchanged. If the second argument is "", the locale set in the environment of the user running the program is selected. If the second argument is a null pointer, the function returns the current locale associated with the category without changing it.
The default locale is "C". When a program is executed, the default locale "C" is automatically set for all the categories as if the function call setlocale(LC_ALL, "C") had been used. The function setlocale() can be explicitly invoked to set a new locale for all categories or for a single one. The following example shows the default locale associated with each category:
$ cat setlocale1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
  char *s;

  s = setlocale(LC_ALL, NULL);
  printf("LC_ALL: %s\n", s);

  s = setlocale(LC_COLLATE, NULL);
  printf("LC_COLLATE: %s\n", s);

  s = setlocale(LC_CTYPE, NULL);
  printf("LC_CTYPE: %s\n", s);

  s = setlocale(LC_MONETARY, NULL);
  printf("LC_MONETARY: %s\n", s);

  s = setlocale(LC_NUMERIC, NULL);
  printf("LC_NUMERIC: %s\n", s);

  return EXIT_SUCCESS;
}
$ gcc -o setlocale1 -std=c99 -pedantic setlocale1.c
$ ./setlocale1
LC_ALL: C
LC_COLLATE: C
LC_CTYPE: C
LC_MONETARY: C
LC_NUMERIC: C
In the following example, in a UNIX environment, we set the environment variable LC_ALL to the locale fr_FR.UTF-8 and let the program select the locale of the environment:
$ export LC_ALL=fr_FR.UTF-8
$ cat setlocale2.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
  char *s;

  setlocale(LC_ALL, "");

  s = setlocale(LC_ALL, NULL);
  printf("LC_ALL: %s\n", s);

  s = setlocale(LC_COLLATE, NULL);
  printf("LC_COLLATE: %s\n", s);

  s = setlocale(LC_CTYPE, NULL);
  printf("LC_CTYPE: %s\n", s);

  s = setlocale(LC_MONETARY, NULL);
  printf("LC_MONETARY: %s\n", s);

  s = setlocale(LC_NUMERIC, NULL);
  printf("LC_NUMERIC: %s\n", s);

  return EXIT_SUCCESS;
}
$ gcc -o setlocale2 -std=c99 -pedantic setlocale2.c
$ ./setlocale2
LC_ALL: fr_FR.UTF-8
LC_COLLATE: fr_FR.UTF-8
LC_CTYPE: fr_FR.UTF-8
LC_MONETARY: fr_FR.UTF-8
LC_NUMERIC: fr_FR.UTF-8
The following example shows how the LC_NUMERIC category affects the printf() function:
$ export LC_NUMERIC=fr_FR.UTF-8
$ cat setlocale3.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
  char *s;

  printf("C locale: %f\n", 3.14159);

  setlocale(LC_NUMERIC, "");
  printf("locale of environment: %f\n", 3.14159);

  return EXIT_SUCCESS;
}
$ gcc -o setlocale3 -std=c99 -pedantic setlocale3.c
$ ./setlocale3
C locale: 3.141590
locale of environment: 3,141590
The available locales depend on the operating system. On UNIX and UNIX-based systems (Linux, BSD systems), within a shell, type in the following command to display the available locales on the system: $ locale -a
To show the user environment variables corresponding to the locale categories, type in:
$ env | grep LC_
If no environment variable sets the locale, the default system-wide locale is used. On Windows operating systems, launch PowerShell and execute the following command to get the list of locales defined within the system:
PS> [globalization.cultureinfo]::GetCultures("allCultures")
To show the current locale for the user, type in: PS> get-culture
IX.4 localeconv()
#include <locale.h>
struct lconv *localeconv(void);
The localeconv() function returns a pointer to an object of type struct lconv that contains the formatting information according to the current locale.
The structure lconv, defined in the header file locale.h, must contain at least the members listed in Table IX‑2. Members can be split into three groups: nonmonetary values, monetary values using the local format and monetary values using the international format.
Table IX‑2 Members of the structure lconv
The members grouping and mon_grouping are strings holding a list of integer values indicating the size of each group of digits. The first item of the string indicates the size of the first group (the group immediately to the left of the decimal point), the second item indicates the size of the second group, and so on. An element of the string takes one of the following values:
o 0: The remaining groups have the size indicated by the previous item.
o CHAR_MAX: There is no further grouping.
o Any other value indicates the size of the current group of digits.
For example, suppose the string contains the list of integers 3 and 0 (i.e. "\3\0"). The first group is composed of 3 digits and the following groups are also composed of 3 digits.
The members p_sign_posn, n_sign_posn, int_p_sign_posn, and int_n_sign_posn are integers taking one of the following values:
o 0: Parentheses surround the monetary value and currency symbol.
o 1: The sign precedes the monetary value and currency symbol.
o 2: The sign succeeds the monetary value and currency symbol.
o 3: The sign immediately precedes the currency symbol.
o 4: The sign immediately succeeds the currency symbol.
The members p_sep_by_space, n_sep_by_space, int_p_sep_by_space, and int_n_sep_by_space have type char. They can take one of the following values:
o 0: There is no space between the monetary value and currency symbol.
o 1: If the currency symbol and the sign are adjacent, a space separates them from the monetary value. Otherwise, there is a space between the currency symbol and the monetary value.
o 2: If the currency symbol and the sign are adjacent, a space separates them. Otherwise, a space is inserted between the sign and the monetary value.
The following example shows some values of the members of the structure lconv according to the locale set in the user environment:
$ cat localeconv1.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <locale.h>

int main(void) {
  char *s;
  char *current_locale;
  struct lconv *locale_info;

  current_locale = setlocale(LC_ALL, "");
  printf("Current locale=%s\n", current_locale);

  locale_info = localeconv();

  printf("Decimal point:\"%s\"\n", locale_info->decimal_point);
  printf("Thousands separator:\"%s\"\n", locale_info->thousands_sep);

  /* grouping is a pointer: sizeof would give the size of the pointer,
     not the length of the list. We stop at the item 0 or CHAR_MAX */
  char *grouping = locale_info->grouping;
  printf("\nGrouping separator for numeric values:\n");
  for (int i=0; ; i++ ) {
    printf("Group %d: %d\n", i+1, grouping[i]);
    if ( grouping[i] == 0 || grouping[i] == CHAR_MAX )
      break;
  }

  char *mon_grouping = locale_info->mon_grouping;
  printf("\nGrouping separator for monetary values:\n");
  for (int i=0; ; i++ ) {
    printf("Group %d: %d\n", i+1, mon_grouping[i]);
    if ( mon_grouping[i] == 0 || mon_grouping[i] == CHAR_MAX )
      break;
  }

  printf("\nMonetary decimal point:\"%s\"\n", locale_info->mon_decimal_point);
  printf("Monetary local thousands separator:\"%s\"\n", locale_info->mon_thousands_sep);
  printf("Monetary positive sign:\"%s\"\n", locale_info->positive_sign);
  printf("Monetary negative sign:\"%s\"\n", locale_info->negative_sign);
  printf("Local currency symbol:\"%s\"\n", locale_info->currency_symbol);
  printf("Local nb Significant digits for fractional part for monetary value:\"%d\"\n", locale_info->frac_digits);
  printf("International currency symbol:\"%s\"\n", locale_info->int_curr_symbol);

  return EXIT_SUCCESS;
}
If we compile the program with gcc on a UNIX operating system (Oracle Solaris) or a Linux operating system, we get this:
$ export LC_ALL=fr_FR.UTF-8
$ gcc -o localeconv1 -std=c99 -pedantic localeconv1.c
$ ./localeconv1
Current locale=fr_FR.UTF-8
Decimal point:","
Thousands separator:" "

Grouping separator for numeric values:
Group 1: 3
Group 2: 0

Grouping separator for monetary values:
Group 1: 3
Group 2: 0

Monetary decimal point:","
Monetary local thousands separator:" "
Monetary positive sign:""
Monetary negative sign:"-"
Local currency symbol:"€"
Local nb Significant digits for fractional part for monetary value:"2"
International currency symbol:"EUR "
If we test it with the "C" locale, we get this:
$ export LC_ALL=C
$ gcc -o localeconv1 -std=c99 -pedantic localeconv1.c
$ ./localeconv1
Current locale=C
Decimal point:"."
Thousands separator:""

Grouping separator for numeric values:
Group 1: 0

Grouping separator for monetary values:
Group 1: 0

Monetary decimal point:""
Monetary local thousands separator:""
Monetary positive sign:""
Monetary negative sign:""
Local currency symbol:""
Local nb Significant digits for fractional part for monetary value:"127"
International currency symbol:""
IX.5 Character encodings
In Chapter II Section II.6.1.3, we briefly talked about character encodings and introduced some key concepts. In this chapter, we complete what we said. We have learned that we can change the current locale in order to access the conventions used by a given culture and to allow functions to interpret properly the multibyte characters of the extended character set of the language associated with a locale. Hence, programmers can work with characters (extended characters) other than those defined by the basic character set (available with the "C" locale). So far, we have worked only with characters of the basic character set, which fit in a single byte (char).
ASCII is sufficient to denote English scripts as seven bits suffice to represent the characters of ASCII. To deal with other languages, other character sets extending ASCII were developed, such as the ISO/IEC 8859 family used by European languages, whose characters can be represented by eight bits. However, some languages, in particular Asian languages such as Chinese, have a number of characters so large that a single byte was not sufficient: for those languages, specific character encodings, representing a character by several bytes, were conceived. Thus, a number of character sets (and then character encodings) proliferated to accommodate the different scripts around the world. For each group of languages, character sets (and character encodings) were designed over time.
In order to unify the great number of character sets and character encodings, to ease the development of applications working with different scripts, and to take into account the majority of the scripts used by computers around the world, a standard universal coded character set (UCS), also known as Unicode, was developed. It is a superset of all the coded character sets that had been conceived so far. It is now the standard used by most computers and applications.
The Unicode standard (usually referred to as Unicode), whose first version was published in 1991, not only provides a universal character set (UCS), but also code points, encodings, algorithms and properties allowing working with any script. The Unicode standard includes the international standard ISO/IEC 10646 that defines for each character of UCS a name, a code point, and representations for the code points. That is, Unicode has the same character set, code points and encodings as the standard ISO/IEC 10646. The Unicode consortium and the International Organization for Standardization (ISO) work together to evolve the standard ISO/IEC 10646.
In Unicode, every character has a unique code point denoted by U+code, where code is a hexadecimal number. For example, the character $ has the code point U+0024. The Unicode standard defines several ways to encode the code points of UCS (i.e. it proposes several character encodings). The encoding forms commonly used with the Unicode standard are UTF-8[76], UTF-16 and UTF-32. In UTF-8, a code point is represented by a sequence of one to four octets (8 bits): it is a variable-length encoding.
In UTF-16, a code point is represented by two or four octets. In UTF-32, a code point is represented by four octets (32 bits). The first advantage of UTF-8 is its compatibility with ASCII: the ASCII characters have the same code values in UTF-8 (i.e. it represents code points of ASCII characters by one octet). That is, a program working with ASCII also works with UTF-8 with no change: the characters whose code value (code point) ranges from 0 to 127 (decimal system) are the same in ASCII and UTF-8. The second major advantage is that, unlike UTF-16 and UTF-32, it is not sensitive to the byte ordering.
Let us have a look at UTF-8. It is simple to implement, hence its success. Initially, a code point in UTF-8 could use up to 31 bits, but since the version released in 2003, a code point is limited to 21 bits. In UTF-8, a code point is a sequence of one to four octets. UTF-8 splits the values of code points into four groups as shown in Table IX‑3. The first group, corresponding to the ASCII encoding, encodes code values in one octet, of which 7 bits are used for the code points. In the second group, code points fit in two octets: 11 bits are used for code points. And so on. It is worth noting that the code points ranging from 0080 to 00FF are the same as in the ISO/IEC 8859-1 encoding.
Table IX‑3 UTF-8 encoding
Now, consider the character A whose code point is 65 (decimal value):
o It is in the range [0000-007F], it is in group 1. Seven bits are used to represent it.
o Its binary representation is 100 0001
o Its UTF-8 representation is: 0100 0001
The character $ whose code point is 36 (decimal value, 24 in hexadecimal):
o It is in the range [0000-007F], it is in group 1.
o Its binary representation is 10 0100
o Since seven bits are used to represent it: 010 0100
o Its UTF-8 representation is: 0010 0100
Now, let us consider a character from a European language fitting in two bytes. For example, the letter à whose code point is 224 (E0 in hexadecimal):
o It is in the range [0080-07FF], it is in group 2. 11 bits are used to represent it. The first five binary digits of the code point are placed in the first byte, the next six binary digits of the code point are placed in the second byte.
o Its binary representation is 1110 0000
o Since eleven bits are used to represent it, we precede its binary representation by three additional 0: 000 1110 0000. We could write it as 00011 100000 to ease the encoding (first byte: 5 digits, second byte: 6 digits).
o In its UTF-8 representation, the first byte starts with 110, and is followed by the first five binary digits of the code point: 1100 0011. The second byte, starting with 10, is followed by the next six binary digits of the code point: 1010 0000. The UTF-8 encoding is then 11000011 10100000: C3 A0 in hexadecimal.
Let us finish with a character fitting in three bytes. For example, the symbol € (Euro) whose code point is 20AC (hexadecimal):
o It is in the range [0800-FFFF], it is in the third group. 16 bits are used to represent it. The first four binary digits of the code point are placed in the first byte, the next six binary digits are placed in the second byte and the last six binary digits are placed in the third byte.
o Its binary representation (14 binary digits) is 10 0000 1010 1100
o Since sixteen bits are used to represent it, we precede its binary representation by two additional 0: 0010 0000 1010 1100. We could rearrange it as 0010 000010 101100 to ease the encoding (first byte: 4 digits, second byte: 6 digits and third byte: 6 digits).
o In its UTF-8 representation, the first byte, starting with 1110, is followed by the first four binary digits of the code point: 1110 0010. The second byte, starting with 10, is followed by the next six binary digits of the code point: 1000 0010. The third byte, starting with 10, is followed by the last six binary digits of the code point: 1010 1100. The UTF-8 encoding is then 11100010 10000010 10101100: E2 82 AC in hexadecimal.
Figure IX‑1 UTF-8 encoding for €
In C, a character of the basic character set is represented by one byte (char). Any other character, an extended character, may be represented by either a wide character or a multibyte character. Before talking about wide characters, let us introduce a subject that has nothing to do with C programming: terminal settings. This will be of great help, as you will soon understand.
IX.6 Terminal settings
The environment running your program must be able to interpret the code values of the extended characters of the locale used within your program. Otherwise, you will not be able to see the output of your program correctly. The examples are executed on UNIX and Linux operating systems[77]. To get the expected output, the character encoding of the terminal must match the one used by the current locale of your program. For example, if you work with the Gnome Desktop Environment (see Figure IX‑2 on Oracle Solaris for x86), follow these steps:
o Click on Terminal
o Then click on Set Characters Encoding
o Select the character encoding as appropriate
Figure IX‑2 Setting character encoding for Gnome
If you work with KDE, follow the steps below (see Figure IX‑3 and Figure IX‑4 on OpenSUSE operating system):
o Click on Settings
o Click on Edit Profile
o Click on the Advanced tab
o Select the character encoding from the menu Select
Figure IX‑3 Setting character encoding for KDE: steps 1 and 2
Figure IX‑4 Setting character encoding for KDE: steps 3 and 4
IX.6.1 Wide characters
A wide character is a binary representation, generally fitting in more than one byte, that can represent any character of any supported locale (that may use an extended character set). In C, it has the integer type wchar_t (defined in the header file stddef.h).
In the C library, there are a number of functions, such as fgetc(), that read input and return a character, or EOF when there is no further character to read. EOF is an integer value that does not represent a character. It has a negative value different from the integer value of any character. So that those functions can return the value EOF, they have the return type int. In the same way, functions returning a wide character do not have the return type wchar_t but wint_t, which can represent both a wide character and WEOF. In summary, a wide character is represented by the type wchar_t, and the type wint_t can represent any wide character as well as a special value represented by the macro WEOF.
A wide string is a sequence of wide characters ending with a null wide character (whose bits are all set to 0; its integer value is then 0). The length of a wide string is the number of wide characters preceding the null wide character.
In C, wchar_t and wint_t are integer types whose definition depends on the implementation. For example, on our computer, on Oracle Solaris 11.3, with the compiler gcc, they are aliases of type long:
$ cat wchar_t.c
#include <wchar.h>
int main(void) { return 0; }
$ gcc -E wchar_t.c | /usr/xpg4/bin/grep -E "wchar_t|wint_t" | grep typedef
typedef long int wchar_t;
typedef long wint_t;
On the same computer, on Ubuntu 14.04, with the compiler gcc, wchar_t is an alias of type int and wint_t is an alias of type unsigned int:
$ gcc -E wchar_t.c | grep -E "wchar_t|wint_t" | grep typedef
typedef int wchar_t;
typedef unsigned int wint_t;
On the same computer, on a Windows 7 operating system, with Microsoft Visual Studio 2015, wchar_t and wint_t are aliases of the type unsigned short: c:\Clanguage>cl /E wchar.c | find “wchar” | find “typedef” … typedef unsigned short wchar_t; c:\Clanguage>cl /E wchar.c | find “wint_t” | find “typedef” … typedef unsigned short wint_t;
We have learned that wchar_t represents a wide character. What about wide character constants? How could we print wide characters? In C, a wide character constant is preceded by the letter L. Moreover, to tell the printf() function you are passing a wide character as argument, you must use the length modifier l (ell) preceding the specifier c: %lc. In the following example, we load a locale, named en_US.UTF-8, using the UTF-8 encoding to print the wide character €:
$ cat wchar_character_lit.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  wchar_t c = L'€'; // wide character. Same as c = L'\x20AC'
  char *mylocale = "en_US.UTF-8";

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Locale %s not available\n", mylocale);
    exit(EXIT_FAILURE);
  }

  printf("In locale %s: %lc has code value %X (%d)\n", mylocale, (wint_t)c, (unsigned int)c, (int)c);
  return EXIT_SUCCESS;
}
$ gcc -o wchar_character_lit -std=c99 -pedantic wchar_character_lit.c
$ ./wchar_character_lit
In locale en_US.UTF-8: € has code value 20AC (8364)
Likewise, a wide string constant is preceded by the letter L and the specifier %ls is used in printf() to print it as shown below:
$ cat wchar_string_constant1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  wchar_t s[] = L"命令找不到";
  char *mylocale = "zh_TW.UTF-8"; // Chinese locale

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Locale %s not available\n", mylocale);
    exit(EXIT_FAILURE);
  }

  /* wcslen() returns a value of type size_t: use %zu */
  printf("In locale %s: %ls has length %zu\n", mylocale, s, wcslen(s) );
  return EXIT_SUCCESS;
}
$ gcc -o wchar_string_constant1 -std=c99 -pedantic wchar_string_constant1.c
$ ./wchar_string_constant1
In locale zh_TW.UTF-8: 命令找不到 has length 5
You have noticed that we did not use the strlen() function to get the length of a wide string but wcslen(). You may wonder how you could reproduce such an example with your keyboard if you do not have a Chinese computer… The answer will be given soon. The following example is a step toward it. It displays the code value of each wide character:
$ cat wchar_string_constant2.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  wchar_t s[] = L"命令找不到";
  size_t len = wcslen(s);
  char *mylocale = "zh_TW.UTF-8"; // Chinese locale

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Locale %s not available\n", mylocale);
    exit(EXIT_FAILURE);
  }

  for (int i=0; i < len; i++)
    printf("Character %d has code %X\n", i, (unsigned int)s[i] );

  return EXIT_SUCCESS;
}
$ gcc -o wchar_string_constant2 -std=c99 -pedantic wchar_string_constant2.c
$ ./wchar_string_constant2
Character 0 has code 547D Character 1 has code 4EE4 Character 2 has code 627E Character 3 has code 4E0D Character 4 has code 5230
Here is a way to display the Chinese characters from their code values:
$ cat wchar_string_constant3.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    // s1 and s2 are identical
    wchar_t s1[] = L"\x547D\x4EE4\x627E\x4E0D\x5230";
    wchar_t s2[] = {L'\x547D', L'\x4EE4', L'\x627E', L'\x4E0D', L'\x5230', L'\0'};
    size_t len = wcslen(s1);
    char *mylocale = "zh_TW.UTF-8"; // Chinese locale

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }
    printf("s1=%ls\n", s1);
    for (size_t i = 0; i < len; i++)
        printf("Character %lc has code %X\n", s1[i], (unsigned int)s1[i] );
    printf("\ns2=%ls\n", s2);
    for (size_t i = 0; i < len; i++)
        printf("Character %lc has code %X\n", s2[i], (unsigned int)s2[i] );
    return EXIT_SUCCESS;
}
$ gcc -o wchar_string_constant3 -std=c99 -pedantic wchar_string_constant3.c
$ ./wchar_string_constant3
s1=命令找不到
Character 命 has code 547D
Character 令 has code 4EE4
Character 找 has code 627E
Character 不 has code 4E0D
Character 到 has code 5230

s2=命令找不到
Character 命 has code 547D
Character 令 has code 4EE4
Character 找 has code 627E
Character 不 has code 4E0D
Character 到 has code 5230
This example shows two things:
o Within a wide string, you can use the code values of the wide characters to represent them, as you would do with ordinary characters.
o A wide string is an array of wide characters in the same way as a string is an array of characters.
Basic characters can be used as wide characters and can be part of wide strings:
$ cat wchar_string_constant4.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t s[] = L"Hello world"; // wide characters
    wchar_t c_wide = L'A'; // basic character used as wide character
    char c_char = 'A';

    setlocale(LC_ALL, ""); // use locale of the user environment
    printf("%ls\n", s );
    printf("Code value of c_char: %d\n", c_char );
    printf("Code value of c_wide: %d\n", (int)c_wide );
    return EXIT_SUCCESS;
}
$ export LC_ALL=en_US.UTF-8
$ gcc -o wchar_string_constant4 -std=c99 -pedantic wchar_string_constant4.c
$ ./wchar_string_constant4
Hello world
Code value of c_char: 65
Code value of c_wide: 65
The following program, compiled by Microsoft Visual Studio®, is executed on a Microsoft Windows® operating system, in PowerShell: PS> more wchar_string_windows.c #include #include #include #include #include int main(void) { wchar_t s[] = L”2500 €”; char *mylocale = “.1252”; // use character encoding 1252 if ( ! setlocale(LC_ALL, mylocale) ) { printf(“Locale %s not available\n”, mylocale); exit(EXIT_FAILURE); } printf(“In locale %s: %ls has length %d\n”, mylocale, s, wcslen(s) ); return EXIT_SUCCESS; } PS>cl wchar_string_windows.c PS>chcp 1252 Page de codes active : 1252 PS>wchar_string_windows.exe In locale .1252: 2500 € has length 6
We used the command chcp 1252 to change the code page (character encoding) to 1252 in order to print properly the character Euro €.
IX.6.2 Multibyte characters
A multibyte character is a series of one or more bytes representing a character of the extended character set of the source or execution environment [78] [79]. In C, several functions convert multibyte characters to wide characters and conversely. As explained earlier, multibyte characters make it possible to encode characters of extended character sets that do not fit in a single byte. For example, Chinese characters cannot be represented by one byte.
Over time, several kinds of multibyte character encodings have been developed. They can be state-dependent or state-independent.
In a state-dependent encoding (e.g. JIS encodings [80]), the interpretation of a sequence of bytes depends on the current conversion state, also called the shift state, which indicates how to group the bytes to form a single extended character of the extended character set of the current locale. Thus, the same sequence of bytes may be interpreted differently depending on the current shift state: one, two or more bytes may constitute a single extended character. Only some byte sequences, known as shift sequences, change the state and therefore the interpretation of the bytes that follow. A shift sequence is a sequence of bytes (control characters) that changes the meaning of the succeeding bytes: it shifts the state. A multibyte string in a state-dependent encoding always starts in the initial shift state, which tells how to interpret the first bytes until a shift sequence switches to an alternate shift state. In all cases, a byte with all bits set to 0 is interpreted as a null character.
In a state-independent encoding, the interpretation of a sequence of bytes does not depend on the preceding bytes. Unicode encodings are state-independent: they do not use escape sequences or shift sequences to change the meaning of byte sequences.
A multibyte character string is an ordinary character string. Thus, unlike wide strings, which require specific handling, multibyte character strings can be processed unchanged by programs working with ordinary strings. For this reason, programs use multibyte characters to perform I/O (such as reading and writing files), since they can be handled with no change.
Conversely, within a program, manipulating wide characters is much easier because each one is treated as a unit that always has the same size. For example, the length of a wide string is the number of wide characters it contains, while the length of a multibyte string is the number of bytes it holds. Thus, a multibyte string containing a single multibyte character might consist of three bytes (char). This implies that a program dealing with international languages uses both multibyte characters (I/O handling) and wide characters (string handling). For this reason, C libraries provide functions converting wide strings to multibyte strings and conversely.
In standard C, if a multibyte character contains a variable number of bytes [81], it is subject to two limits:
o MB_CUR_MAX: the macro, defined in the header file stdlib.h, expands to an integer value, of type size_t, specifying the maximum number of bytes in a multibyte character of the extended character set used by the current locale (of the category LC_CTYPE).
o MB_LEN_MAX: the macro, defined in the header file limits.h, expands to an integer value specifying the maximum number of bytes in a multibyte character of any supported locale.
So, in a C program, an extended character may be represented by a wide character or a multibyte character. The C libraries provide functions that perform the conversion between them. Let us consider the character € (Euro). As a wide character, it can be represented by the type wchar_t. As a multibyte character, it is represented in UTF-8 by the three bytes (expressed in hexadecimal) E2, 82 and AC (see Figure IX‑1). In the following example, we display the extended character € using both representations:
$ cat multichar1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t c_wide = L'€'; // wide character in any character encoding
    char *c_multichar = "\xE2\x82\xAC"; // multibyte character: UTF-8
    char *mylocale = "en_US.UTF-8"; // US locale using UTF-8 character encoding

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }
    printf("c_wide=%lc\n", c_wide );
    printf("c_multichar=%s\n", c_multichar );
    return EXIT_SUCCESS;
}
$ gcc -o multichar1 -std=c99 -pedantic multichar1.c
$ ./multichar1
c_wide=€
c_multichar=€
Now, let us consider strings containing multibyte characters: $ ./multichar2.c #include #include #include #include int main(void) { wchar_t *s_wide = L”2500 €”; // wide characters in any character encoding char *s_multichar = “2500 \xE2\x82\xAC” ; // multibyte character: UTF-8
char *mylocale = “en_US.UTF-8”; // US locale if ( ! setlocale(LC_ALL, mylocale) ) { printf(“Locale %s not available\n”, mylocale); exit(EXIT_FAILURE); } printf(“s_wide=%ls\n”, s_wide ); printf(“s_multichar=%s\n”, s_multichar ); return EXIT_SUCCESS; } $ gcc -o multichar2 -std=c99 -pedantic multichar2.c $ ./multichar2 s_wide=2500 € s_multichar=2500 €
The strings s_wide and s_multichar produce the same output. The first one is made of elements of the special type wchar_t while the second one is an ordinary string. Now, let us compute their lengths:
$ cat multichar3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t *s_wide = L"2500 €"; // wide characters in any character encoding
    char *s_multichar = "2500 \xE2\x82\xAC"; // multibyte character: UTF-8
    char *mylocale = "en_US.UTF-8"; // US locale

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }
    printf("length of s_wide=%zu\n", wcslen(s_wide) );
    printf("length of s_multichar=%zu\n", strlen(s_multichar) );
    return EXIT_SUCCESS;
}
$ gcc -o multichar3 -std=c99 -pedantic multichar3.c
$ ./multichar3
length of s_wide=6
length of s_multichar=8
The string s_wide has the expected length but the string s_multichar appears longer. Because it is an ordinary string, strlen() counts all its bytes, and the character € alone takes three of them. To get the expected result, we have to convert the string s_multichar containing multibyte characters to a wide string and then count the number of wide characters it holds. To do this, we can invoke the function mbstowcs(). It has the following prototype:
Until C95:
#include <stdlib.h>
size_t mbstowcs(wchar_t *ws, const char *mbs, size_t n);
As of C99:
#include <stdlib.h>
size_t mbstowcs(wchar_t * restrict ws, const char * restrict mbs, size_t n);
The function converts the string containing multibyte characters pointed to by mbs to a wide string and places it in the memory block pointed to by ws. At most n wide characters are copied to ws. It returns the number of wide characters copied to ws, not counting the terminating null wide character, unless an invalid multibyte character (a multibyte character not defined by the character encoding used) is encountered, in which case it returns the value (size_t)-1. If ws is a null pointer, the function only returns the number of wide characters resulting from the conversion (actual size of the string), as shown below:
$ cat multichar4.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t *s_wide = L"2500 €"; // wide characters in any character encoding
    char *s_multichar = "2500 \xE2\x82\xAC"; // multibyte character: UTF-8
    char *mylocale = "en_US.UTF-8"; // US locale
    size_t len_wide;
    size_t len_multichar;

    if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encoding
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }
    len_wide = wcslen(s_wide);
    len_multichar = mbstowcs(NULL, s_multichar, 0);
    printf("Nb of characters in s_wide=%zu\n", len_wide);
    printf("Nb of characters in s_multichar=%zu\n", len_multichar );
    return EXIT_SUCCESS;
}
$ gcc -o multichar4 -std=c99 -pedantic multichar4.c
$ ./multichar4
Nb of characters in s_wide=6
Nb of characters in s_multichar=6
IX.6.3 Universal Character Names (UCN)
As of C99, you can designate a character of the universal character set (UCS) by its universal character name, using one of the two following forms:
\Udddddddd
\udddd
Here d is a hexadecimal digit, and dddddddd is an eight-digit hexadecimal code point as defined by ISO/IEC 10646. The form \udddd is equivalent to \U0000dddd. The hexadecimal digits can be written with lowercase or uppercase letters. Not all characters can be represented in such a manner:
o Code points less than 00A0 (which includes the ASCII character set, and therefore the basic character set) cannot be represented in this way, with the exception of $ (U+0024), @ (U+0040) and ` (U+0060).
o Code points in the range [D800-DFFF] cannot be represented by a UCN.
C99 permits the use of universal character names [82] in identifiers, comments, character literals, and string literals.
In the following example, we display the characters $ (U+0024) and € (U+20AC) using universal character names (Unicode code points):
$ cat ucn1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t euro = L'\u20AC';
    char dollar = '\u0024';
    char *mylocale = "en_US.UTF-8"; // US locale

    if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encoding
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }
    printf("Euro=%lc (code value U+%04X)\n", euro, (unsigned int)euro);
    printf("Dollar=%c (code value U+%04X)\n", dollar, (unsigned int)dollar);
    return EXIT_SUCCESS;
}
$ gcc -o ucn1 -std=c99 -pedantic ucn1.c
$ ./ucn1
Euro=€ (code value U+20AC)
Dollar=$ (code value U+0024)
A UCN can also be used in a multibyte string constant, as in the following example:
$ cat ucn2.1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
    char *mbs = "1000 \u20AC";
    char *mylocale = "en_US.UTF-8"; // US locale

    if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encoding
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }
    printf("%s\n", mbs);
    return EXIT_SUCCESS;
}
$ gcc -o ucn2.1 -std=c99 -pedantic ucn2.1.c
$ ./ucn2.1
1000 €
This is equivalent to: $ cat ucn2.2.c #include #include #include int main(void) { char *mbs = “1000 €”; char *mylocale = “en_US.UTF-8”; // US locale if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings printf(“Locale %s not available\n”, mylocale); exit(EXIT_FAILURE); } printf(“%s\n”, mbs); return EXIT_SUCCESS; } $ gcc -o ucn2.2 -std=c99 -pedantic ucn2.2.c $ ./ucn2.2 1000 €
Using a UCN of a character is not the same as using hexadecimal (or octal) value of an extended character. Compare with the following program: $ cat ucn3.c #include #include #include int main(void) { char *mbs = “1000 \x20AC”; char *mylocale = “en_US.UTF-8”; // US locale if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
printf(“Locale %s not available\n”, mylocale); exit(EXIT_FAILURE); } printf(“%s\n”, mbs); return EXIT_SUCCESS; } $ gcc -o ucn3 -std=c99 -pedantic ucn3.c ucn3.c: In function ‘main’: ucn3.c:6:15: warning: hex escape sequence out of range [enabled by default] char *mbs = “1000 \x20AC”;
The compiler generated a warning indicating that the hexadecimal value is out of range in a multibyte string. A hexadecimal or octal escape sequence can represent a character in an ordinary character constant or string only if its value can be represented by an unsigned char. In our example, the value 0x20AC (Unicode code point for €) is too large to be represented by the type unsigned char. However, as shown below, the same example would have worked if we had used the type wchar_t (not recommended: use a UCN instead):
$ cat ucn4.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t *mbs = L"1000 \x20AC";
    char *mylocale = "en_US.UTF-8"; // US locale

    if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encoding
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }
    printf("%ls\n", mbs);
    return EXIT_SUCCESS;
}
$ gcc -o ucn4 -std=c99 -pedantic ucn4.c
$ ./ucn4
1000 €
IX.7 strcoll() and strxfrm()
The functions strcoll() and strxfrm() do not work with wide characters but only with ordinary strings and multibyte strings. They are affected by the current locale and are used when programmers work with locales other than English or C. The strcoll() function has the following prototype:
#include <string.h>
int strcoll(const char *s1, const char *s2);
It is declared in the header file string.h. The strcoll() function compares two strings and returns 0 if they are equal, an integer value greater than 0 if s1 is greater than s2, and an integer value less than 0 otherwise. Unlike the function strcmp(), it is affected by the locale category LC_COLLATE, and its behavior depends on the value of LC_COLLATE. For the C locale, strcoll() has the same behavior as strcmp().
The functions strcmp() and strncmp() produce the expected comparisons with English and C locales, but this may not be true with all locales. The reason is that they use the code values of characters (which depend on the character encoding of the current locale) to compare strings. That is, the comparisons carried out by strcmp() and strncmp() are based on the character set order, which is not necessarily the same as the lexicographic order of the current locale. In Unicode, for example, the letter ß appears before the letter ä, while in German alphabetical order it is the opposite. This means that, with the functions strcmp() and strncmp(), a program cannot properly sort strings written in German. For this reason, the function strcoll() is preferred in such cases. The following example shows that, with a German locale, the comparison performed by strcoll() is correct, unlike that of strcmp():
$ cat strcoll.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>

int main(void) {
    char *s1 = "ß";
    char *s2 = "ä";
    char *mylocale = "de_DE.utf8"; // German locale

    if ( setlocale(LC_ALL, mylocale) == NULL ) {
        printf("%s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }
    if (strcoll(s1, s2) > 0) {
        printf("strcoll(): %s > %s\n", s1, s2);
    } else if (strcoll(s1, s2) < 0) {
        printf("strcoll(): %s < %s\n", s1, s2);
    }
    if (strcmp(s1, s2) > 0) {
        printf("strcmp(): %s > %s\n", s1, s2);
    } else if (strcmp(s1, s2) < 0) {
        printf("strcmp(): %s < %s\n", s1, s2);
    }
    return EXIT_SUCCESS;
}
$ gcc -o strcoll -std=c99 -pedantic strcoll.c
$ ./strcoll
strcoll(): ß > ä
strcmp(): ß < ä
Do not immediately conclude that the function strcmp() is now deprecated and that you should use only strcoll(). The function strcoll() is very useful but it has a drawback: because it performs significant processing, it consumes much more processor time than strcmp(). To obtain with strcmp() the same ordering as with strcoll(), an intermediate function is used: strxfrm(). It has the following prototype:
Until C95:
#include <string.h>
size_t strxfrm(char * s1, const char * s2, size_t n);
As of C99 [83]:
#include <string.h>
size_t strxfrm(char * restrict s1, const char * restrict s2, size_t n);
The function transforms the string pointed to by s2 and places at most n characters of the resulting transformed string, including the terminating null character, into the memory area pointed to by s1. The transformation is such that comparing two transformed strings with strcmp() gives the same result as comparing the two original strings with strcoll(). The function returns the length of the transformed string, not counting the terminating null character. If the returned value is greater than or equal to n, the contents of the array pointed to by s1 are indeterminate. Keep in mind that the transformed string has implementation-defined contents meant to be used only with strcmp(): do not attempt to print it or pass it to another function. If s1 is a null pointer and n is 0, the function performs no copy and just returns the length of the transformed string. Consequently, the memory area pointed to by s1 must hold at least 1 + strxfrm(NULL, s2, 0) characters. Here is an example:
$ cat strxfrm.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>

int main(void) {
    char *s1 = "ß";
    char *s2 = "ä";
    char *mylocale = "de_DE.utf8"; // German locale

    if ( setlocale(LC_ALL, mylocale) == NULL ) {
        printf("%s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }
    char s1_conv[ 1 + strxfrm(NULL, s1, 0) ];
    char s2_conv[ 1 + strxfrm(NULL, s2, 0) ];

    strxfrm(s1_conv, s1, sizeof s1_conv);
    strxfrm(s2_conv, s2, sizeof s2_conv);
    if (strcmp(s1, s2) > 0) {
        printf("strcmp(): %s > %s\n", s1, s2);
    } else if (strcmp(s1, s2) < 0) {
        printf("strcmp(): %s < %s\n", s1, s2);
    }
    // compare transformed strings
    if ( strcmp(s1_conv, s2_conv) > 0 ) {
        printf("strcmp() after transformation: %s > %s\n", s1, s2);
    } else if ( strcmp(s1_conv, s2_conv) < 0 ) {
        printf("strcmp() after transformation: %s < %s\n", s1, s2);
    }
    return EXIT_SUCCESS;
}
$ gcc -o strxfrm -std=c99 -pedantic strxfrm.c
$ ./strxfrm
strcmp(): ß < ä
strcmp() after transformation: ß > ä
The function strxfrm() is used instead of strcoll() when you need to compare the same strings several times: it is faster to transform them once with strxfrm() and then compare the transformed strings with strcmp() and strncmp() [84].
IX.8 Conversion functions
The functions described in the following sections are affected by the locale category LC_CTYPE.
IX.8.1 Conversion state
The functions mbtowc(), wctomb() and mblen(), declared in the header file stdlib.h and specified in the C90 standard, should not be used if you work with threads because they keep the conversion state of the last multibyte character processed in an internal object (having static storage duration). This prevents a program from processing several multibyte strings at the same time. The conversion state should be initialized before these functions are used on a new string. Note that if the category LC_CTYPE changes, the conversion state becomes indeterminate. Accordingly, you have to initialize the conversion state again after changing LC_CTYPE.
C90 Amendment 1 (C95) introduced a new type, called mbstate_t: an object of that type saves the conversion state of a multibyte string or a multibyte character. The functions mbrtowc(), wcrtomb() and mbrlen(), called restartable functions, replace the old functions. They take an additional argument of type mbstate_t * that keeps the current conversion state.
IX.8.2 mbtowc()
Until C95:
#include <stdlib.h>
int mbtowc(wchar_t *pwc, const char *pmbc, size_t n);
As of C99:
#include <stdlib.h>
int mbtowc(wchar_t * restrict pwc, const char * restrict pmbc, size_t n);
The function converts the multibyte character pointed to by pmbc to a wide character that is copied into the object of type wchar_t pointed to by pwc (if pwc is not a null pointer). It reads at most n bytes from the multibyte character pointed to by pmbc. The function stops reading bytes from pmbc when it finds a valid multibyte character or when it has read n bytes.
If pmbc is a null pointer and the encoding of the current locale is state-dependent, it sets the conversion state to the initial shift state and returns a nonzero value. If pmbc is a null pointer and the encoding is state-independent, it returns 0. If pmbc is not a null pointer and points to the null character, the function returns 0. Otherwise, the function returns the number of bytes forming the converted multibyte character, or -1 if the bytes read from pmbc do not form a valid multibyte character. The return value is never greater than n or MB_CUR_MAX.
The function call mbtowc(NULL, NULL, 0) sets the conversion state to the initial conversion state. If the character encoding used is stateless, it does nothing. The call mbtowc(NULL, pmbc, n) returns the length of the multibyte character without storing the resulting wide character. The following example determines whether the character encoding used is state-dependent or stateless:
$ cat mbtowc1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int r = mbtowc(NULL, NULL, 0);

    printf("state of the current encoding: %s\n", r == 0 ? "state-independent" : "state-dependent");
    return EXIT_SUCCESS;
}
$ gcc -o mbtowc1 -std=c99 -pedantic mbtowc1.c
$ ./mbtowc1
state of the current encoding: state-independent
Using UTF-8, the following example shows three calls to mbtowc(). The first one converts the three-byte character representing € (i.e. \xE2\x82\xAC) to a wide character, the second one converts the single-byte character representing T to a wide character, and the last one is a conversion failure (not enough bytes are read to get a valid multibyte character):
$ cat mbtowc2.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    char mbc[] = { '\xE2', '\x82', '\xac' }; // UTF-8 multibyte character
    char c = 'T';
    int r1, r2, r3;
    char *mylocale = "en_US.UTF-8";
    wchar_t w1 = 0, w2 = 0, w3 = 0;

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }
    mbtowc(NULL, NULL, 0); // set the initial conversion state
    r1 = mbtowc(&w1, mbc, sizeof mbc); // never read past the end of mbc
    r2 = mbtowc(&w2, &c, 1); // c is a single byte
    r3 = mbtowc(&w3, mbc, 2); // does not read enough bytes to get a valid multibyte character
    printf("r1=%d, w1=%lc\n", r1, w1);
    printf("r2=%d, w2=%lc\n", r2, w2);
    printf("r3=%d, w3=%lc\n", r3, w3);
    return EXIT_SUCCESS;
}
$ gcc -o mbtowc2 -std=c99 -pedantic mbtowc2.c
$ ./mbtowc2
r1=3, w1=€
r2=1, w2=T
r3=-1, w3=
IX.8.3 wctomb()
#include <stdlib.h>
int wctomb(char *pmbc, wchar_t wc);
It converts the wide character wc to a multibyte character and stores it into the memory area pointed to by the pointer pmbc (if it is not a null pointer). If wc is the null wide character, a null character is placed into the object pointed to by pmbc (if pmbc is not a null pointer); moreover, a shift sequence restoring the initial conversion state is placed before the null character, and the initial conversion state is saved by the function.
If pmbc is a null pointer and the encoding of the current locale is state-dependent, it sets the conversion state to the initial shift state and returns a nonzero value. If pmbc is a null pointer and the encoding is state-independent, it returns 0. If pmbc is not a null pointer and the wide character wc cannot be converted to a multibyte character, it returns -1. Otherwise, it returns the number of bytes in the multibyte character. The return value is never greater than MB_CUR_MAX.
The call wctomb(NULL, 0) initializes the conversion state. If the character encoding used is stateless, it does nothing. In the following example, we convert the wide character € to a multibyte character:
$ cat wctomb.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t euro = L'€';
    // MB_LEN_MAX is large enough for any locale; MB_CUR_MAX would still
    // describe the C locale at this point, before setlocale() is called
    char mb_euro[MB_LEN_MAX + 1];
    char *mylocale = "en_US.UTF-8";
    int len; // wctomb() returns an int, possibly -1

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }
    wctomb(NULL, 0); // set the initial conversion state
    len = wctomb(mb_euro, euro);
    if (len > 0)
        mb_euro[len] = '\0';
    else
        mb_euro[0] = '\0';
    printf("mb_euro contains %d bytes\n", len);
    printf("mb_euro=%s euro=%lc (code %X)\n", mb_euro, euro, (unsigned int)euro);
    return EXIT_SUCCESS;
}
$ gcc -o wctomb -std=c99 -pedantic wctomb.c
$ ./wctomb
mb_euro contains 3 bytes
mb_euro=€ euro=€ (code 20AC)
IX.8.4 mblen()
#include <stdlib.h>
int mblen(const char *pmbc, size_t n);
If pmbc is not a null pointer, it examines at most n bytes of the multibyte character pointed to by pmbc, and returns the number of bytes in that multibyte character. If pmbc is a null pointer and the encoding of the current locale is state-dependent, it sets the conversion state to the initial shift state and returns a nonzero value. If pmbc is a null pointer and the encoding is state-independent, it returns 0. Otherwise, it returns 0 if the multibyte character is a null character, -1 if the multibyte character is not valid, or the number of bytes comprising the multibyte character.
$ cat mblen.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
    char mbc[] = { '\xE2', '\x82', '\xac' }; // UTF-8 multibyte character
    char *mylocale = "en_US.UTF-8";
    int len;

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }
    mblen(NULL, 0); // set the initial conversion state
    len = mblen(mbc, sizeof mbc); // never read past the end of mbc
    printf("multibyte character length=%d \n", len);
    return EXIT_SUCCESS;
}
$ gcc -o mblen -std=c99 -pedantic mblen.c
$ ./mblen
multibyte character length=3
The function is equivalent to mbtowc(NULL, pmbc, n) except that the conversion state saved in the function mbtowc() does not change.
IX.8.5 mbstowcs()
Until C95:
#include <stdlib.h>
size_t mbstowcs(wchar_t *pwcs, const char *pmbs, size_t n);
As of C99:
#include <stdlib.h>
size_t mbstowcs(wchar_t *restrict pwcs, const char *restrict pmbs, size_t n);
The function converts the multibyte string pointed to by pmbs, starting in the initial conversion state, into a wide string that it copies into the memory area pointed to by pwcs. At most n wide characters are copied into the memory block pointed to by pwcs. Characters following the terminating null character in the string pointed to by pmbs are ignored. If, while reading the string pointed to by pmbs, it finds an invalid multibyte character, it returns (size_t)-1. Otherwise, it returns the number of wide characters copied to the memory area pointed to by pwcs, excluding the terminating null wide character (if any). The call mbstowcs(NULL, pmbs, 0) returns the length of the resulting wide string. Example:
$ cat mbstowcs.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    char *pmbs = "2500 \xE2\x82\xac"; // UTF-8 multibyte character
    /* If your host environment uses UTF-8, you could have written this:
       char *pmbs = "2500 €";
    */
    char *mylocale = "en_US.UTF-8";
    size_t len;

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }
    len = mbstowcs(NULL, pmbs, 0);
    if (len == (size_t)-1) {
        printf("Invalid multibyte string\n");
        exit(EXIT_FAILURE);
    }
    wchar_t pwcs[len+1];

    mbstowcs(pwcs, pmbs, len+1);
    printf("Multibyte characters examined in \"%s\": %zu \n", pmbs, strlen(pmbs));
    printf("Resulting wide string: \"%ls\" (len=%zu)\n", pwcs, len);
    return EXIT_SUCCESS;
}
$ gcc -o mbstowcs -std=c99 -pedantic mbstowcs.c
$ ./mbstowcs
Multibyte characters examined in "2500 €": 8
Resulting wide string: "2500 €" (len=6)
IX.8.6 wcstombs()
Until C95:
#include <stdlib.h>
size_t wcstombs(char *pmbs, const wchar_t *pwcs, size_t n);
As of C99:
#include <stdlib.h>
size_t wcstombs(char *restrict pmbs, const wchar_t *restrict pwcs, size_t n);
The function converts the wide string pointed to by pwcs to a multibyte string that it stores into the memory area pointed to by pmbs. The conversion stops when a null wide character is encountered or when the next multibyte character would exceed the limit of n bytes. If the length of the multibyte string is n, it is not null-terminated. If the function cannot convert a wide character to a multibyte character, it returns (size_t)-1. Otherwise, it returns the number of bytes in the multibyte string, excluding the terminating null character (if any). The call wcstombs(NULL, pwcs, 0) returns the length of the resulting multibyte string. Example:
$ cat wcstombs.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t *pwcs = L"2500 \u20AC";
    char *mylocale = "en_US.UTF-8";
    size_t len;

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }
    len = wcstombs(NULL, pwcs, 0);
    if (len == (size_t)-1) {
        printf("Invalid wide string\n");
        exit(EXIT_FAILURE);
    }
    char pmbs[len+1];

    wcstombs(pmbs, pwcs, len+1);
    printf("wide string: \"%ls\" (len=%zu)\n", pwcs, wcslen(pwcs));
    printf("Resulting multibyte string: \"%s\" (len=%zu)\n", pmbs, len);
    return EXIT_SUCCESS;
}
$ gcc -o wcstombs -std=c99 -pedantic wcstombs.c
$ ./wcstombs
wide string: "2500 €" (len=6)
Resulting multibyte string: "2500 €" (len=8)
IX.8.7 btowc()
As of C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>
wint_t btowc(int c);
The function returns the wide character corresponding to the single-byte character c, interpreted as an unsigned char. If c has the value EOF or is not a valid single-byte character in the initial conversion state, the function returns WEOF.
IX.8.8 wctob()
As of C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>
int wctob(wint_t c);
It returns EOF if c does not have a multibyte representation composed of a single byte in the initial conversion state. Otherwise, it returns the byte corresponding to the wide character c, as an unsigned char converted to int.
IX.8.9 mbsinit()
As of C90 Amendment 1 (C95):
#include <wchar.h>
int mbsinit(const mbstate_t *p_cv_state);
It returns a nonzero value if p_cv_state is a null pointer or points to an object describing an initial conversion state. Otherwise, it returns 0. An object of type mbstate_t contains a conversion state that depends on the locale category LC_CTYPE.
IX.8.10 Restartable conversion functions
The old conversion functions inherited from C90, mbtowc(), wctomb(), mbstowcs(), wcstombs() and mblen(), had a major drawback: they kept the current conversion state of the multibyte character or multibyte string being processed in an internal static object. Consequently, a program could not process several multibyte strings at the same time, and threads could not call those functions in parallel. C90 Amendment 1 overcomes the issue by adding functions that take a new parameter: a pointer to an object of type mbstate_t storing the conversion state of the multibyte string or character being processed. Thus, programmers have entire control of the objects storing the conversion states, which allows threads to perform wide/multibyte conversions in parallel without conflicts between calls. The new functions are said to be restartable.
The functions described in the next sections take a parameter ps of type mbstate_t * pointing to the current conversion state of the multibyte string being processed. If ps is a null pointer, each function uses its own internal object instead, which is set to the initial conversion state at program startup. Before the first call, put the object of type mbstate_t in the initial conversion state (initial shift state) by setting it to 0 with memset(). If the object mbs_state holds the conversion state, it can be initialized like this:
memset(&mbs_state, 0, sizeof mbs_state);
IX.8.10.1 mbrtowc()
As of C90 Amendment 1 (C95):
#include <wchar.h>
size_t mbrtowc(wchar_t *pwc, const char *pmbc, size_t n, mbstate_t *ps);

As of C99:
#include <wchar.h>
size_t mbrtowc(wchar_t *restrict pwc, const char *restrict pmbc, size_t n, mbstate_t *restrict ps);

If pmbc is not a null pointer, the function converts the multibyte character pointed to by pmbc to a wide character that is stored into the object of type wchar_t pointed to by pwc (if pwc is not a null pointer). It examines at most n bytes starting at pmbc. If the resulting wide character is the null wide character, the conversion state stored in the object pointed to by ps is reset to the initial shift state. If ps is a null pointer, an internal object is used to store the conversion state.
If pmbc is a null pointer, pwc and n are ignored, and the call is equivalent to:
mbrtowc(NULL, "", 1, ps);

That call resets the object pointed to by ps to the initial shift state. Another way to put the conversion state held in the object pointed to by ps in the initial shift state is to set it to 0 with the call:
memset(ps, 0, sizeof *ps);
The function mbrtowc() returns one of the following values:
o 0: if, after examining at most n bytes, the resulting wide character is the null wide character.
o A value p such that 1 ≤ p ≤ n: if, after examining at most n bytes, a valid multibyte character has been completed. The value p is the number of bytes of that multibyte character.
o (size_t)-2: if the n bytes examined form an incomplete, but so far valid, multibyte character (n is too small). Nothing is stored into the object pointed to by pwc.
o (size_t)-1: if the bytes examined do not form a valid multibyte character (encoding error). Nothing is stored into the object pointed to by pwc, the global variable errno is set to EILSEQ and the conversion state is unspecified.
The following example converts (using UTF-8) the three-byte character representing the euro sign € (i.e. \xE2\x82\xAC) to a wide character, converts the single-byte character representing the letter T to a wide character and shows an incomplete conversion in the last call (not enough bytes are read to get a valid multibyte character):
$ cat mbrtowc.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

void init_mb_state(mbstate_t *ps) {
   memset(ps, 0, sizeof *ps);
}

int main(void) {
   char mbc[] = { '\xE2', '\x82', '\xAC' }; // UTF-8 multibyte character
   char c = 'T';
   int r1, r2, r3; // return values converted to int for display
   mbstate_t mb_state;
   const char *mylocale = "en_US.UTF-8";
   wchar_t w1=0, w2=0, w3=0;

   if (! setlocale(LC_ALL, mylocale) ) {
      printf("locale %s not supported\n", mylocale);
      exit(EXIT_FAILURE);
   }

   init_mb_state(&mb_state);
   r1 = (int)mbrtowc(&w1, mbc, MB_CUR_MAX, &mb_state);

   init_mb_state(&mb_state);
   r2 = (int)mbrtowc(&w2, &c, MB_CUR_MAX, &mb_state);

   init_mb_state(&mb_state);
   // two bytes are not enough to get a valid multibyte character: w3 is left unchanged
   r3 = (int)mbrtowc(&w3, mbc, 2, &mb_state);

   printf("r1=%d, w1=%lc\n", r1, w1);
   printf("r2=%d, w2=%lc\n", r2, w2);
   printf("r3=%d, w3=%lc\n", r3, w3);
   return EXIT_SUCCESS;
}
$ gcc -o mbrtowc -std=c99 -pedantic mbrtowc.c
$ ./mbrtowc
r1=3, w1=€
r2=1, w2=T
r3=-2, w3=

MB_CUR_MAX, defined in stdlib.h, represents the maximum number of bytes comprising a multibyte character in the current locale.
IX.8.10.2 wcrtomb()
As of C90 Amendment 1 (C95):
#include <wchar.h>
size_t wcrtomb(char *pmbc, wchar_t wc, mbstate_t *ps);

As of C99:
#include <wchar.h>
size_t wcrtomb(char *restrict pmbc, wchar_t wc, mbstate_t *restrict ps);

If pmbc is not a null pointer, the function wcrtomb() converts the wide character wc to a multibyte character that it stores into the memory area pointed to by pmbc. If wc is the null wide character, a null character is stored, preceded by any shift sequence needed to restore the initial shift state, and the conversion state saved into the object pointed to by ps becomes the initial conversion state. If ps is a null pointer, an internal object is used to store the conversion state.
If pmbc is a null pointer, the call to the function wcrtomb() is equivalent to:
wcrtomb(buf, L'\0', ps);

where buf is an internal buffer of the function. The initial conversion state is saved into the object pointed to by ps. If wc is not a valid wide character, the conversion state is unspecified and the function returns (size_t)-1 after setting the global variable errno to EILSEQ. Otherwise, it returns the number of bytes stored, that is, the number of bytes of the multibyte character including any shift sequence. The return value never exceeds MB_CUR_MAX.
A multibyte character always contains at most MB_CUR_MAX bytes.
In the following example, we convert the wide character € to a multibyte character:
$ cat wcrtomb.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
   wchar_t w_euro = L'€'; // same as wchar_t w_euro = L'\u20AC'
   const char *mylocale = "en_US.UTF-8";
   size_t len;
   mbstate_t ps;

   if (! setlocale(LC_ALL, mylocale) ) {
      printf("locale %s not supported\n", mylocale);
      exit(EXIT_FAILURE);
   }

   // MB_CUR_MAX depends on the current locale: size the array after setlocale()
   char mb_euro[MB_CUR_MAX+1];

   memset(&ps, 0, sizeof ps); // initial conversion state
   len = wcrtomb(mb_euro, w_euro, &ps);
   if (len == (size_t)-1) // conversion failure: make an empty string
      len = 0;
   mb_euro[len] = '\0';

   printf("mb_euro contains %zu bytes\n", len);
   printf("mb_euro=%s w_euro=%lc (code %X)\n", mb_euro, w_euro, (unsigned)w_euro);
   return EXIT_SUCCESS;
}
$ gcc -o wcrtomb -std=c99 -pedantic wcrtomb.c
$ ./wcrtomb
mb_euro contains 3 bytes
mb_euro=€ w_euro=€ (code 20AC)
IX.8.10.3 mbrlen()
As of C90 Amendment 1 (C95):
#include <wchar.h>
size_t mbrlen(const char *pmbc, size_t n, mbstate_t *ps);

As of C99:
#include <wchar.h>
size_t mbrlen(const char *restrict pmbc, size_t n, mbstate_t *restrict ps);

It is equivalent to:
mbrtowc(NULL, pmbc, n, ps != NULL ? ps : &internal_ps);

where internal_ps is an object storing the conversion state managed internally by mbrlen(). If pmbc is not a null pointer, the function examines at most n bytes starting at pmbc to determine the number of bytes of the next multibyte character. Unlike mbrtowc(), it stores no wide character. If the multibyte character found is the null character, the conversion state is reset to the initial shift state. If ps is a null pointer, the internal object is used to store the conversion state.
If pmbc is a null pointer, n is ignored, and the call is equivalent to:
mbrlen("", 1, ps);
or
mbrtowc(NULL, "", 1, ps);

which sets the object pointed to by ps to the initial shift state.
The function mbrlen() returns one of the following values:
o 0: if, after examining at most n bytes, the multibyte character found is the null character.
o A value p such that 1 ≤ p ≤ n: if, after examining at most n bytes, a valid multibyte character has been completed. The value p is the number of bytes of that multibyte character.
o (size_t)-2: if the n bytes examined form an incomplete, but so far valid, multibyte character (n is too small).
o (size_t)-1: if the bytes examined do not form a valid multibyte character (encoding error). The global variable errno is set to EILSEQ and the conversion state is unspecified.

IX.8.10.4 mbsrtowcs()
As of C90 Amendment 1 (C95):
#include <wchar.h>
size_t mbsrtowcs(wchar_t *wcs, const char **pmbs, size_t n, mbstate_t *ps);
As of C99:
#include <wchar.h>
size_t mbsrtowcs(wchar_t *restrict wcs, const char **restrict pmbs, size_t n, mbstate_t *restrict ps);

The function converts the multibyte string pointed to by *pmbs, starting in the conversion state stored in the object pointed to by ps, to a wide string that is copied into the array pointed to by wcs (if wcs is not a null pointer). The function stops reading bytes from the multibyte string if one of the following events occurs:
o It finds the null character terminating the multibyte string, which is also converted to a null wide character.
o It has stored n wide characters into the array wcs (if not a null pointer), including the null wide character if any. If wcs is a null pointer, the argument n is ignored.
o An invalid multibyte character is encountered.
If wcs is not a null pointer, the function modifies the value of the pointer pointed to by pmbs (i.e. *pmbs is altered) in one of the ways described below:
o The pointer *pmbs is set to a null pointer if the terminating null character has been read, converted and copied to the array wcs. The conversion state is then the initial shift state.
o If multibyte characters remain after n wide characters have been copied to the array wcs, *pmbs points to the first multibyte character that has not been converted.
If an encoding error occurs (invalid multibyte character found), the function returns (size_t)-1, sets the global variable errno to EILSEQ, and the conversion state is left unspecified. Otherwise, it returns the number of wide characters resulting from the conversion, excluding the terminating null wide character if any.
If wcs is a null pointer, it returns the number of wide characters resulting from the conversion, excluding the null wide character, ignoring the argument n.
If the conversion state is held in the object mbs_state, it may be initialized with the initial shift state by the call:
memset(&mbs_state, 0, sizeof mbs_state);
The following example converts the multibyte string "2500 \u20AC" to a wide string (we will use the UTF-8 encoding):
$ cat mbsrtowcs.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

void init_mb_state(mbstate_t *ps) {
   memset(ps, 0, sizeof *ps);
}

int main(void) {
   const char *mbs = "2500 \u20AC";
   const char **ptrc_mbs;
   size_t nb_wlen;
   mbstate_t mb_state;
   const char *mylocale = "en_US.utf8"; // UTF-8 encoding

   if (! setlocale(LC_ALL, mylocale) ) {
      printf("locale %s not supported\n", mylocale);
      exit(EXIT_FAILURE);
   }

   // get the number of resulting wide characters (excluding null wide character)
   ptrc_mbs = &mbs;
   init_mb_state(&mb_state); // set initial shift state
   nb_wlen = mbsrtowcs(NULL, ptrc_mbs, 0, &mb_state);
   if (nb_wlen == (size_t)-1) {
      fprintf(stderr, "Invalid mb string\n");
      return EXIT_FAILURE;
   }

   nb_wlen++; // one extra wide character for null wide character
   wchar_t wcs[nb_wlen];

   init_mb_state(&mb_state);
   ptrc_mbs = &mbs;
   mbsrtowcs(wcs, ptrc_mbs, nb_wlen, &mb_state);

   // \\0 prints the two characters \0; %p requires a pointer to void
   printf("nb wide chars (including L'\\0'): %zu, wcs=%ls, ptrc_mbs=%p\n",
          nb_wlen, wcs, (void *)*ptrc_mbs);
   return EXIT_SUCCESS;
}
$ gcc -o mbsrtowcs -std=c99 -pedantic mbsrtowcs.c
$ ./mbsrtowcs
nb wide chars (including L'\0'): 7, wcs=2500 €, ptrc_mbs=0

The way %p displays a null pointer is implementation-defined: some systems print (nil) instead of 0.
IX.8.10.5 wcsrtombs()
As of C90 Amendment 1 (C95):
#include <wchar.h>
size_t wcsrtombs(char *mbs, const wchar_t **pwcs, size_t n, mbstate_t *ps);

As of C99:
#include <wchar.h>
size_t wcsrtombs(char *restrict mbs, const wchar_t **restrict pwcs, size_t n, mbstate_t *restrict ps);

The function converts the wide string pointed to by *pwcs to a multibyte string, beginning in the conversion state stored in the object pointed to by ps, and copies it into the array pointed to by mbs (if mbs is not a null pointer). The function stops reading wide characters from the wide string if one of the following events occurs:
o It finds the null wide character terminating the wide string, which is also converted to a null character.
o Storing the next multibyte character would exceed the limit of n bytes in the array mbs (if not a null pointer). If mbs is a null pointer, the argument n is ignored.
o A wide character cannot be converted to a multibyte character.
If mbs is not a null pointer, the function modifies the value of the pointer pointed to by pwcs (i.e. *pwcs is altered) in one of the ways described below:
o The pointer *pwcs is set to a null pointer if the terminating null wide character has been read, converted and copied to the array mbs. The conversion state is then the initial shift state.
o If wide characters remain after the array mbs has been filled, *pwcs points to the first wide character that has not been converted.
If an encoding error occurs (a wide character could not be converted to a multibyte character), the function returns (size_t)-1, sets the global variable errno to EILSEQ, and the conversion state is left unspecified. Otherwise, it returns the number of bytes resulting from the conversion, excluding the terminating null character if any.
If mbs is a null pointer, it returns the number of bytes resulting from the conversion, excluding the null character, ignoring the argument n.
If the conversion state is held in the object mbs_state, it may be assigned the initial shift state by the call:
memset(&mbs_state, 0, sizeof mbs_state);
The following example converts the wide string L"2500 \u20AC" to a multibyte string (UTF-8 encoding):
$ cat wcsrtombs.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

void init_mb_state(mbstate_t *ps) {
   memset(ps, 0, sizeof *ps);
}

int main(void) {
   const wchar_t *wcs = L"2500 \u20AC";
   const wchar_t **ptrc_wcs;
   size_t nb_mblen;
   mbstate_t mb_state;
   const char *mylocale = "en_US.utf8";

   if (! setlocale(LC_ALL, mylocale) ) {
      printf("locale %s not supported\n", mylocale);
      exit(EXIT_FAILURE);
   }

   ptrc_wcs = &wcs;
   init_mb_state(&mb_state); // set initial shift state

   // get the number of bytes in the mb string (excluding null character)
   nb_mblen = wcsrtombs(NULL, ptrc_wcs, 0, &mb_state);
   if (nb_mblen == (size_t)-1) {
      fprintf(stderr, "Invalid wide string\n");
      return EXIT_FAILURE;
   }

   nb_mblen++; // one extra byte for null character
   char mbs[nb_mblen];

   init_mb_state(&mb_state);
   ptrc_wcs = &wcs;
   wcsrtombs(mbs, ptrc_wcs, nb_mblen, &mb_state);

   // \\0 prints the two characters \0; %p requires a pointer to void
   printf("nb multibyte chars (including '\\0'): %zu, mbs=%s, ptrc_wcs=%p\n",
          nb_mblen, mbs, (void *)*ptrc_wcs);
   return EXIT_SUCCESS;
}
$ gcc -o wcsrtombs -std=c99 -pedantic wcsrtombs.c
$ ./wcsrtombs
nb multibyte chars (including '\0'): 9, mbs=2500 €, ptrc_wcs=0
IX.9 Functions manipulating wide characters
Most functions of the form str…(), declared in the header file string.h and processing strings, have an equivalent of the form wcs…(), declared in the header file wchar.h, dealing with wide strings. Likewise, the mem…() functions have wmem…() counterparts. They have similar behaviors. The functions described in the following sections are not affected by the categories of the current locale unless otherwise stated.
As far as C99 is concerned, it just changed the prototype of some of the functions introduced in C90 Amendment 1 (also known as C95) by adding the keyword restrict, without altering their behaviors. C11 did not remove those functions: in its optional Annex K, it added counterparts having the same name followed by the suffix _s (such as wcscpy_s()) that check boundaries.
IX.9.1 Copy and concatenation functions
IX.9.2 wcscpy()
As of C90 Amendment 1 (C95):
#include <wchar.h>
wchar_t *wcscpy(wchar_t *tgt, const wchar_t *src);

As of C99:
#include <wchar.h>
wchar_t *wcscpy(wchar_t *restrict tgt, const wchar_t *restrict src);

The wcscpy() function is the version of strcpy() that deals with wide strings. It copies the wide characters (including the null wide character) of the wide string pointed to by src to the memory block pointed to by tgt. The copy stops after the null wide character has been copied. It returns the pointer tgt.
Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of them. Otherwise, the behavior of the function is undefined.
IX.9.3 wcsncpy()
As of C90 Amendment 1 (C95):
#include <wchar.h>
wchar_t *wcsncpy(wchar_t *tgt, const wchar_t *src, size_t n);

As of C99:
#include <wchar.h>
wchar_t *wcsncpy(wchar_t *restrict tgt, const wchar_t *restrict src, size_t n);

The wcsncpy() function is the version of strncpy() that deals with wide strings. It copies at most n wide characters from the wide string pointed to by src into the memory block pointed to by tgt. Wide characters following the first null wide character encountered are not copied. If the length of the source wide string is less than n, the whole source wide string is copied up to the null wide character (included) and additional null wide characters are appended to the target until the total number of wide characters written reaches the value n. If the length of the source wide string is greater than or equal to n, the memory area pointed to by tgt is not terminated by a null wide character. In such a case, take care to append it to the target string in your code. The function returns the pointer tgt.
Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of them. Otherwise, the behavior of the function is undefined.
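Since wcsncpy() does not terminate the target when the source is too long, a common precaution is to terminate it explicitly. The following is a minimal sketch of that idea; the helper copy_wide_bounded() and its parameters are hypothetical names introduced for the illustration:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copies src into the array dst of dst_len elements,
   truncating if needed, and always terminates dst when dst_len > 0.
   Returns the number of wide characters stored, excluding the terminator. */
size_t copy_wide_bounded(wchar_t *dst, size_t dst_len, const wchar_t *src)
{
    if (dst_len == 0)
        return 0;
    wcsncpy(dst, src, dst_len - 1);  /* may leave dst without terminator */
    dst[dst_len - 1] = L'\0';        /* so terminate it explicitly */
    return wcslen(dst);
}
```

With an array of 4 elements, copying L"abcdef" stores L"abc" and returns 3, while copying L"a" stores the letter followed by null wide characters only.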
IX.9.4 wmemcpy()
As of C90 Amendment 1 (C95):
#include <wchar.h>
wchar_t *wmemcpy(wchar_t *tgt, const wchar_t *src, size_t n);

As of C99:
#include <wchar.h>
wchar_t *wmemcpy(wchar_t *restrict tgt, const wchar_t *restrict src, size_t n);

The wmemcpy() function is the version of memcpy() that deals with wide characters. It copies n wide characters of the memory area pointed to by src into the memory block pointed to by tgt. It returns the pointer tgt. Do not confuse wmemcpy() with wcsncpy(): the former is not affected by the null wide character.
Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of them. Otherwise, the behavior of the function is undefined. Do not pass overlapping pointers (see Chapter VII Section VII.18.2). Otherwise, the behavior of the function is undefined.
IX.9.5 wmemmove()
As of C90 Amendment 1 (C95):
#include <wchar.h>
wchar_t *wmemmove(wchar_t *tgt, const wchar_t *src, size_t n);

The wmemmove() function is the version of memmove() that deals with wide characters. It copies n wide characters of the memory area pointed to by src into the memory block pointed to by tgt. It returns the pointer tgt. It performs the same job as wmemcpy() except that you can pass overlapping memory areas (which is why the restrict keyword is not used): the copy is done as if the wide characters were first copied to an intermediate memory block (see Chapter VII Section VII.18.2 about overlapping pointers).
Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of them. Otherwise, the behavior of the function is undefined.
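A typical case in which source and target overlap is the removal of leading characters inside the same array. The sketch below assumes a hypothetical helper named drop_prefix(); it uses wmemmove() because wmemcpy() would have undefined behavior here:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: removes the first count wide characters of the
   wide string s in place. Source and target overlap, hence wmemmove(). */
void drop_prefix(wchar_t *s, size_t count)
{
    size_t len = wcslen(s);
    if (count > len)
        count = len;
    /* move the remaining characters, including the null wide character */
    wmemmove(s, s + count, len - count + 1);
}
```

Applied to an array initialized with L"2500 EUR", the call drop_prefix(s, 5) leaves L"EUR" in the array.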
IX.9.6 wmemset()
As of C90 Amendment 1 (C95):
#include <wchar.h>
wchar_t *wmemset(wchar_t *s, wchar_t c, size_t n);
The wmemset() function is the version of memset() that deals with wide characters. It copies the wide character c into each of the n first wide characters of the memory area pointed to by s. It returns s.
Ensure the target object (pointed to by s) receiving the wide characters is large enough to hold all of them. Otherwise, the behavior of the function is undefined.
IX.9.7 wcscat()
As of C90 Amendment 1 (C95):
#include <wchar.h>
wchar_t *wcscat(wchar_t *tgt, const wchar_t *src);

As of C99:
#include <wchar.h>
wchar_t *wcscat(wchar_t *restrict tgt, const wchar_t *restrict src);

The wcscat() function is the version of strcat() that deals with wide strings. The function concatenates two wide strings: it appends a copy of the wide string pointed to by src (including the null wide character) to the end of the wide string pointed to by tgt. The null wide character of the wide string pointed to by tgt is overwritten by the first wide character of the string pointed to by src. It returns the pointer tgt.
Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of them. Otherwise, the behavior of the function is undefined.
IX.9.8 wcsncat()
As of C90 Amendment 1 (C95):
#include <wchar.h>
wchar_t *wcsncat(wchar_t *tgt, const wchar_t *src, size_t n);

As of C99:
#include <wchar.h>
wchar_t *wcsncat(wchar_t *restrict tgt, const wchar_t *restrict src, size_t n);

The wcsncat() function is the version of strncat() that deals with wide strings. It performs the same task as wcscat() except that it appends at most n wide characters from the source wide string src. A null wide character is always appended to the resulting string pointed to by tgt, so that up to n+1 wide characters may be written. It returns the pointer tgt.
Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of them. Otherwise, the behavior of the function is undefined.
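Because wcsncat() may write n+1 wide characters, the value passed as n must leave room for the null wide character. As a sketch under that assumption, a hypothetical helper append_wide_bounded() could compute the remaining room before appending:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: appends src to the wide string held in the array
   dst of dst_len elements without overflowing it. Returns 1 if src was
   appended entirely, 0 if it was truncated or if dst has no room. */
int append_wide_bounded(wchar_t *dst, size_t dst_len, const wchar_t *src)
{
    size_t used = wcslen(dst);          /* dst must hold a valid wide string */
    if (used >= dst_len)
        return 0;
    size_t room = dst_len - used - 1;   /* keep one element for L'\0' */
    wcsncat(dst, src, room);            /* writes at most room + 1 elements */
    return wcslen(src) <= room;
}
```

With an array of 8 elements holding L"2500", appending L" EU" succeeds and fills the array. A further attempt to append L"R" returns 0 and leaves the string unchanged.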
IX.9.9 Comparison functions
IX.9.10 wcscmp()
As of C90 Amendment 1 (C95):
#include <wchar.h>
int wcscmp(const wchar_t *s1, const wchar_t *s2);

The wcscmp() function is the version of strcmp() that deals with wide strings. It compares two wide strings and returns 0 if they are equal, an integer value greater than 0 if s1 is greater than s2 and an integer value less than 0 otherwise.

IX.9.11 wcsncmp()
As of C90 Amendment 1 (C95):
#include <wchar.h>
int wcsncmp(const wchar_t *s1, const wchar_t *s2, size_t n);

The wcsncmp() function is the version of strncmp() that deals with wide strings. It compares at most n wide characters of two wide strings and returns 0 if they are equal, an integer value greater than 0 if s1 is greater than s2 and an integer value less than 0 otherwise.

IX.9.12 wmemcmp()
As of C90 Amendment 1 (C95):
#include <wchar.h>
int wmemcmp(const wchar_t *s1, const wchar_t *s2, size_t n);

The wmemcmp() function is the version of memcmp() that deals with wide characters. It compares the first n wide characters of the objects pointed to by s1 and s2 and returns 0 if they are equal, an integer value greater than 0 if s1 is greater than s2 and an integer value less than 0 otherwise. Unlike wcscmp() and wcsncmp(), it is not affected by the null wide character.
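The difference between the string comparison functions and wmemcmp() shows up when the objects contain an embedded null wide character. The sketch below uses two hypothetical arrays, rec1 and rec2, and a hypothetical helper sign_of() that reduces a result to -1, 0 or 1 (the standard only specifies the sign of the value returned):

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: reduces the result of a comparison function
   to -1, 0 or 1. */
int sign_of(int r)
{
    return (r > 0) - (r < 0);
}

/* Two arrays that are identical up to an embedded null wide character. */
const wchar_t rec1[] = { L'a', L'b', L'\0', L'x' };
const wchar_t rec2[] = { L'a', L'b', L'\0', L'y' };
```

Here wcscmp(rec1, rec2) returns 0 because the comparison stops at the null wide character, while wmemcmp(rec1, rec2, 4) returns a negative value because it also compares the fourth elements.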
IX.9.13 wcscoll()
As of C90 Amendment 1 (C95):
#include <wchar.h>
int wcscoll(const wchar_t *s1, const wchar_t *s2);

The wcscoll() function is the version of strcoll() that deals with wide strings. It compares two wide strings and returns 0 if they are equal, an integer value greater than 0 if s1 is greater than s2 and an integer value less than 0 otherwise. It differs from wcscmp() in that it is affected by the LC_COLLATE category of the current locale.
The comparison functions wcscmp(), wcsncmp(), strcmp() and strncmp() use the code points of characters (which depend on the character encoding) to compare strings. Although the letters of the English alphabet are encoded in alphabetical order, this is not true for all languages. For example, in Unicode, the German letter ß appears before the letter ä while in the German alphabetical order, it is the opposite. Unlike wcscmp(), the function wcscoll() uses the alphabetical order of the locale to compare strings. The following example shows the difference:
$ cat wcscoll1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
   const wchar_t *s1 = L"ß";
   const wchar_t *s2 = L"ä";
   const char *mylocale = "de_DE.utf8"; // German locale

   if ( setlocale(LC_ALL, mylocale) == NULL ) {
      printf("%s not supported\n", mylocale);
      exit(EXIT_FAILURE);
   }

   printf("code of %ls=0x%04X code of %ls=0x%04X\n",
          s1, (unsigned)*s1, s2, (unsigned)*s2);

   if (wcscoll(s1, s2) > 0) {
      printf("wcscoll(): %ls > %ls\n", s1, s2);
   } else if (wcscoll(s1, s2) < 0) {
      printf("wcscoll(): %ls < %ls\n", s1, s2);
   }

   if (wcscmp(s1, s2) > 0) {
      printf("wcscmp(): %ls > %ls\n", s1, s2);
   } else if (wcscmp(s1, s2) < 0) {
      printf("wcscmp(): %ls < %ls\n", s1, s2);
   }
   return EXIT_SUCCESS;
}
$ gcc -o wcscoll1 -std=c99 -pedantic wcscoll1.c
$ ./wcscoll1
code of ß=0x00DF code of ä=0x00E4
wcscoll(): ß > ä
wcscmp(): ß < ä
The result of wcscmp() does not follow the German alphabetical order, unlike that of wcscoll(). The function wcscoll() is affected by the LC_COLLATE category of the current locale, which specifies the lexicographical order (order as used in a dictionary) of the characters used by a language.
Moreover, the function wcscoll() takes into account the digraphs and trigraphs (letters written with two or three characters) used by some languages, which is not the case for the function wcscmp(). For example, in English, the letter c appears before the letter h: therefore, the string "chab" is considered less than "hab". In the Czech language, the letter ch, which is a digraph (composed of two characters), appears after the letter h: therefore, the string "chab" is greater than "hab". In the following example, the function wcscoll() compares the strings "hab" and "chab" correctly, taking into account the distinctive features of the current locale:
$ cat wcscoll2.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
   const wchar_t *s1 = L"chab";
   const wchar_t *s2 = L"hab";
   const char *aLocale[] = {"C", "en_US.UTF-8", "cs_CZ.UTF-8"}; // C, US and Czech locales

   for (int i=0; i < 3; i++ ) {
      const char *mylocale = aLocale[i];
      if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
         printf("Locale %s not available\n", mylocale);
         continue;
      }
      printf("Using locale %s: ", mylocale);
      int coll_val = wcscoll(s1, s2);
      if (coll_val == 0 ) {
         printf("%ls == %ls", s1, s2);
      } else if ( coll_val < 0 ) {
         printf("%ls < %ls", s1, s2);
      } else if ( coll_val > 0 ) {
         printf("%ls > %ls", s1, s2);
      }
      printf("\n");
   }
   return EXIT_SUCCESS;
}
$ gcc -o wcscoll2 -std=c99 -pedantic wcscoll2.c
$ ./wcscoll2
Using locale C: chab < hab
Using locale en_US.UTF-8: chab < hab
Using locale cs_CZ.UTF-8: chab > hab
Contrast with the output of the function wcscmp(), which ignores the alphabetical order of the current locale and therefore does not compare the strings "hab" and "chab" correctly for the Czech language:
$ cat wcscmp.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
   const wchar_t *s1 = L"chab";
   const wchar_t *s2 = L"hab";
   const char *aLocale[] = {"C", "en_US.UTF-8", "cs_CZ.UTF-8"}; // C, US and Czech locales

   for (int i=0; i < 3; i++ ) {
      const char *mylocale = aLocale[i];
      if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
         printf("Locale %s not available\n", mylocale);
         continue;
      }
      printf("Using locale %s: ", mylocale);
      int cmp_val = wcscmp(s1, s2);
      if (cmp_val == 0 ) {
         printf("%ls == %ls", s1, s2);
      } else if ( cmp_val < 0 ) {
         printf("%ls < %ls", s1, s2);
      } else if ( cmp_val > 0 ) {
         printf("%ls > %ls", s1, s2);
      }
      printf("\n");
   }
   return EXIT_SUCCESS;
}
$ gcc -o wcscmp -std=c99 -pedantic wcscmp.c
$ ./wcscmp
Using locale C: chab < hab
Using locale en_US.UTF-8: chab < hab
Using locale cs_CZ.UTF-8: chab < hab
IX.9.14 wcsxfrm()
As of C90 Amendment 1 (C95):
#include <wchar.h>
size_t wcsxfrm(wchar_t *s1, const wchar_t *s2, size_t n);

As of C99:
#include <wchar.h>
size_t wcsxfrm(wchar_t *restrict s1, const wchar_t *restrict s2, size_t n);

The function transforms the wide string pointed to by s2 and places the resulting wide string in the memory area pointed to by s1. The transformation is such that comparing two transformed wide strings with the function wcscmp() provides the same result as comparing the two original wide strings with wcscoll(). The number of wide characters copied to s1, including the terminating null wide character, does not exceed the value n. The function returns the length of the transformed wide string (excluding the terminating null wide character). If the value returned is greater than or equal to n, the contents of the array pointed to by s1 are indeterminate.
The transformed string pointed to by s1 has implementation-defined contents that should be used only with the function wcscmp(). Do not pass it to a function other than wcscmp().
If s1 is a null pointer and n is 0, the function performs no copy: it just does the transformation and returns the length of the resulting transformed wide string. Consequently, the number of elements of the array pointed to by s1 must be at least 1 + wcsxfrm(NULL, s2, 0).
Here is an example:
$ cat wcsxfrm1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
   const wchar_t *s1 = L"ß";
   const wchar_t *s2 = L"ä";
   wchar_t s1_conv[64];
   wchar_t s2_conv[64];
   const char *mylocale = "de_DE.utf8"; // German locale

   if ( setlocale(LC_ALL, mylocale) == NULL ) {
      printf("%s not supported\n", mylocale);
      exit(EXIT_FAILURE);
   }

   wcsxfrm(s1_conv, s1, sizeof s1_conv/sizeof s1_conv[0]);
   wcsxfrm(s2_conv, s2, sizeof s2_conv/sizeof s2_conv[0]);

   printf("code of %ls=0x%04X code of %ls=0x%04X\n",
          s1, (unsigned)*s1, s2, (unsigned)*s2);

   if (wcscmp(s1, s2) > 0) {
      printf("wcscmp(): %ls > %ls\n", s1, s2);
   } else if (wcscmp(s1, s2) < 0) {
      printf("wcscmp(): %ls < %ls\n", s1, s2);
   }

   // compare transformed strings
   if ( wcscmp(s1_conv, s2_conv) > 0 ) {
      printf("wcscmp() after transformation: %ls > %ls\n", s1, s2);
   } else if ( wcscmp(s1_conv, s2_conv) < 0 ) {
      printf("wcscmp() after transformation: %ls < %ls\n", s1, s2);
   }
   return EXIT_SUCCESS;
}
$ gcc -o wcsxfrm1 -std=c99 -pedantic wcsxfrm1.c
$ ./wcsxfrm1
code of ß=0x00DF code of ä=0x00E4
wcscmp(): ß < ä
wcscmp() after transformation: ß > ä
The program above has a drawback: we arbitrarily fixed the size of the arrays receiving the strings transformed by wcsxfrm(). We can improve it by using the call wcsxfrm(NULL, s, 0) that returns the length of the transformed wide string:
$ cat wcsxfrm2.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
   const wchar_t *s1 = L"ß";
   const wchar_t *s2 = L"ä";
   const char *mylocale = "de_DE.utf8"; // German locale

   if ( setlocale(LC_ALL, mylocale) == NULL ) {
      printf("%s not supported\n", mylocale);
      exit(EXIT_FAILURE);
   }

   // one extra element for the terminating null wide character
   wchar_t s1_conv[ 1 + wcsxfrm(NULL, s1, 0) ];
   wchar_t s2_conv[ 1 + wcsxfrm(NULL, s2, 0) ];

   wcsxfrm(s1_conv, s1, sizeof s1_conv/sizeof s1_conv[0]);
   wcsxfrm(s2_conv, s2, sizeof s2_conv/sizeof s2_conv[0]);

   printf("code of %ls=0x%04X code of %ls=0x%04X\n",
          s1, (unsigned)*s1, s2, (unsigned)*s2);

   if (wcscmp(s1, s2) > 0) {
      printf("wcscmp(): %ls > %ls\n", s1, s2);
   } else if (wcscmp(s1, s2) < 0) {
      printf("wcscmp(): %ls < %ls\n", s1, s2);
   }

   // compare transformed strings
   if ( wcscmp(s1_conv, s2_conv) > 0 ) {
      printf("wcscmp() after transformation: %ls > %ls\n", s1, s2);
   } else if ( wcscmp(s1_conv, s2_conv) < 0 ) {
      printf("wcscmp() after transformation: %ls < %ls\n", s1, s2);
   }
   return EXIT_SUCCESS;
}
$ gcc -o wcsxfrm2 -std=c99 -pedantic wcsxfrm2.c
$ ./wcsxfrm2
code of ß=0x00DF code of ä=0x00E4
wcscmp(): ß < ä
wcscmp() after transformation: ß > ä
So, why use wcsxfrm() and wcscmp() instead of wcscoll()? The rationale is that the function wcscoll() is slower than wcscmp(). If you need to compare the same strings several times (when sorting them, for example), it is better to transform them once with wcsxfrm() and then compare the transformed strings with wcscmp().
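Sorting is a case in which the same strings are compared many times. The following sketch transforms each wide string once with wcsxfrm() and lets qsort() compare the transformed keys with wcscmp(). The structure keyed and the functions cmp_keyed() and sort_by_collation() are hypothetical names introduced for the illustration, not library functions:

```c
#include <stdlib.h>  /* malloc(), free(), qsort() */
#include <wchar.h>   /* wcsxfrm(), wcscmp() */

/* Hypothetical record pairing a wide string with its transformed key. */
struct keyed {
    const wchar_t *text;  /* original wide string */
    wchar_t *key;         /* result of wcsxfrm(), owned by the record */
};

static int cmp_keyed(const void *a, const void *b)
{
    const struct keyed *ka = a, *kb = b;
    return wcscmp(ka->key, kb->key);    /* cheap comparison of the keys */
}

/* Sorts the n wide strings of v according to the LC_COLLATE category of
   the current locale. Returns 0 on success, -1 on allocation failure. */
int sort_by_collation(const wchar_t **v, size_t n)
{
    if (n == 0)
        return 0;
    struct keyed *recs = malloc(n * sizeof *recs);
    if (recs == NULL)
        return -1;

    size_t made = 0;
    int status = -1;
    for (; made < n; made++) {
        size_t len = wcsxfrm(NULL, v[made], 0);       /* length of the key */
        recs[made].text = v[made];
        recs[made].key = malloc((len + 1) * sizeof(wchar_t));
        if (recs[made].key == NULL)
            break;
        wcsxfrm(recs[made].key, v[made], len + 1);    /* transform once */
    }
    if (made == n) {
        qsort(recs, n, sizeof *recs, cmp_keyed);
        for (size_t i = 0; i < n; i++)
            v[i] = recs[i].text;
        status = 0;
    }
    for (size_t i = 0; i < made; i++)
        free(recs[i].key);
    free(recs);
    return status;
}
```

The order obtained depends on the locale selected with setlocale() before the call. In the "C" locale, the array { L"pear", L"apple", L"fig" } is reordered as apple, fig, pear.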
IX.9.15 Other useful functions
IX.9.16 wcslen()
As of C90 Amendment 1 (C95):
#include <wchar.h>
size_t wcslen(const wchar_t *s);

The function returns the length of the wide string pointed to by s. That is, it returns the number of wide characters in the wide string pointed to by s, excluding the terminating null wide character.
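Keep in mind that wcslen() counts wide characters, not bytes. As a small sketch, a hypothetical helper wide_storage_size() computes the number of bytes needed to store a copy of a wide string:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes needed to store a copy of the wide
   string s, including its terminating null wide character. */
size_t wide_storage_size(const wchar_t *s)
{
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

For example, wcslen(L"2500 \u20AC") is 6, and the storage needed for that wide string is 7 * sizeof(wchar_t) bytes, a value that depends on the implementation.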
CHAPTER X INPUT/OUTPUT
X.1 Introduction
Most programs are supposed to perform specific tasks based on dynamic data varying over time and on resources of the computer. A piece of data is usually provided by users through their keyboard (terminal) or by files. The program has to resort to functions performing I/O (input/output) requests to communicate with the operating system in order to send data to, or get data from, a device [85]. In this chapter, we will not learn how a program can communicate with another program within the same operating system or with remote systems: it is out of the scope of the book. We will learn to communicate with I/O devices through files.
X.2 Files
A file can be a container storing data or just an interface used to interact with an I/O device that does not necessarily contain data. For example, the file /dev/tty denotes a terminal on UNIX and UNIX-based systems (Linux and BSD systems), while the file /etc/hosts (on UNIX and UNIX-based systems) or C:\Windows\System32\drivers\etc\hosts (on Windows operating systems) is a file with a backing store holding sequences of characters that can be read or modified by users. A file has several attributes, depending on the operating system, such as its type, its size, and its access permissions.
In C, before working with a file, you have to open it with fopen() to tell the system that you want to work with it. Keep in mind that if you cannot open an existing file, it usually means that the permissions set on that file do not allow you to use it with the specified open mode. An open mode specifies the way you wish to work with the file, such as reading data.
The C language allows managing files through functions provided by the C standard library or through system calls provided by the system libraries of the operating system. A nonportable C program may invoke system calls to manage files. A portable C program invokes only functions of the C standard library for managing files. On UNIX and UNIX-based systems (such as Linux and BSD systems), and more generally on POSIX operating systems, the system calls open(), read(), write(), close(), dup()… manage files. We will not talk about POSIX calls but only about functions of the C standard library.
The I/O functions presented in this chapter are declared in the header file stdio.h, so make sure you include it in your source files before calling them. The C standard defines two macros, EOF (in stdio.h) and WEOF (in wchar.h), that I/O functions return to indicate that the end of a file has been reached or that an error has occurred. The macro EOF has a negative value of type int (usually -1). The macro WEOF may have any value of type wint_t provided it represents no extended character. EOF is used by functions working with characters (bytes) while WEOF is used by functions working with wide characters.
X.2.1 Opening a file
Before a program can access a file for reading, writing or both (i.e. updating), it has to open it. A portable C program invokes the C function fopen() to open a file. The fopen() function, declared in the header file stdio.h, has the following prototype:
Until C95:
#include <stdio.h>
FILE *fopen(const char *filename, const char *mode);

As of C99:
#include <stdio.h>
FILE *fopen(const char * restrict filename, const char * restrict mode);
Where filename is the pathname of the file and mode is a string describing the way to open the file. The function returns a pointer to type FILE. In the following example, we open the file info.txt for reading:
$ cat info.txt
Line one
Line two
$ cat io_open1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;

    pf = fopen("info.txt", "r");
    return EXIT_SUCCESS;
}
The object type FILE associated with a file when it is opened is defined in stdio.h: it holds information on the data read from or written to the I/O device (such as a data file stored on a hard drive, or a terminal) you have opened. However, users do not actually need to know how the data structure FILE is implemented. Data read or written through an object of type FILE is a series of characters called a stream. By extension, the object of type FILE that allows manipulating the data is also called a stream. An object of type FILE, a stream, typically has several fields including a buffer that stores the data, a field storing the position within the file (known as an offset), a field telling whether the end-of-file has been reached (end-of-file indicator) and a field indicating whether an error has occurred while reading or writing (error indicator).
A data stream can take two forms: binary and text. The parameter mode specifies the type of stream. A text stream is a series of characters broken down into lines. A line is a sequence of characters terminated by a newline character. Take note that the C standard leaves it to the implementation to decide whether the very last line of a stream requires a terminating newline character. It is therefore safer to terminate the last line of a text file with a newline character. Characters of a text stream, on input or output, may be removed, added or altered depending on the conventions the operating system applies to represent textual data. As an example, depending on the operating system, even with ASCII encoding, the newline character denoted by \n is represented by one or two bytes.
On Windows operating systems, the newline character '\n' is mapped to two characters: a carriage return ('\r', denoted by CR, whose ASCII and Unicode code point is 0x0D) followed by a line feed ('\n', denoted by LF or NL, whose ASCII and Unicode code point is 0x0A). On UNIX and UNIX-like systems, it is represented by the single character line feed ('\n', code point 0x0A, also called a newline character). That is why, when a text file coming from a Microsoft Windows system is read on a UNIX or UNIX-like system, an extra character shown as ^M (the character CR) appears at the end of each line. This means that, depending on the operating system, the data you read from a text stream does not necessarily compare equal to the data you have written to the text stream! Data read from a text stream compares equal to the data written to the text stream if:
o The data is composed only of printing characters and the control characters '\t' and '\n'.
o No newline character is immediately preceded by space characters.
o The last character is a newline character.
In practice, you do not have to worry about the mapping of characters such as '\n' as long as you do not exchange text files between different operating systems. Otherwise, a conversion is required…
A binary stream is also a sequence of characters but it is not split into lines. This type of stream can be used to read or store data structures. Unlike a text stream, data read from a binary stream compares equal to the data written to the stream. No character is altered, deleted or added when writing to or reading from a binary stream. Such a stream lets you store your objects in binary files and read them back later. However, keep in mind that the contents of a binary file depend on the implementation: a binary file created on one computer may not be read properly on another computer.
The parameter filename is the pathname of the file. On most operating systems, files are grouped into directories. There may be several files with the same name located in different directories, but within a given directory a file name is unique. If you provide only the name of the file (without specifying its directory), the fopen() function searches the working directory (the directory in which the program has been executed) for the file with the given name, as in the example io_open1.c. In the following example, we open the file info.txt located in the directory /opt/projects/C/data:
$ cat io_open2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;

    pf = fopen("/opt/projects/C/data/info.txt", "r");
    return EXIT_SUCCESS;
}
The second parameter, mode, is a string indicating the way the file is to be opened. Table X‑1 shows the list of allowed open modes.
Table X‑1 Available modes for fopen()
If you work on POSIX operating systems (UNIX operating systems), there is no distinction between a file opened as binary and a file opened as text: both are stored in the same way. This holds true for UNIX-like systems (Linux, BSD systems). On those systems, the open mode b is simply ignored. If the file cannot be opened (file missing or access denied), the fopen() function returns a null pointer. The following example attempts to open a file that does not exist:
$ cat io_open3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "/opt/projects/C/data/info_file.txt";

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        printf("Cannot open file %s\n", myfile);
    }
    return EXIT_SUCCESS;
}
$ gcc -o io_open3 -std=c99 -pedantic io_open3.c
$ ./io_open3
Cannot open file /opt/projects/C/data/info_file.txt
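The message above does not say why the file could not be opened. As a sketch, and under the assumption that the system sets errno when fopen() fails (POSIX systems do, but the C standard itself does not require it), the cause can be reported with strerror(). The helper name open_or_report() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: opens path with the given mode. On failure,
   it prints the reason on stderr (when errno was set) and returns
   a null pointer. */
FILE *open_or_report(const char *path, const char *mode)
{
    errno = 0;                       /* do not report a stale value */
    FILE *pf = fopen(path, mode);

    if (pf == NULL) {
        if (errno != 0)
            fprintf(stderr, "Cannot open file %s: %s\n",
                    path, strerror(errno));
        else
            fprintf(stderr, "Cannot open file %s\n", path);
    }
    return pf;
}
```

On a POSIX system, a missing file and a file with insufficient permissions then produce different messages, which the plain test pf == NULL cannot distinguish.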
In the following example, the file info2.txt cannot be opened for writing because the write permission is not granted on the file:
$ cp info.txt info2.txt
$ chmod a-w info2.txt
$ ls -l info2.txt
-r--r--r-- 1 user staff 18 Nov 15 17:34 info2.txt
$ cat io_open4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf_read, *pf_write;
    char *myfile = "info2.txt";

    pf_write = fopen(myfile, "w");
    if ( pf_write == NULL )
        printf("Cannot open file %s for writing\n", myfile);
    else
        printf("file %s opened for writing\n", myfile);

    pf_read = fopen(myfile, "r");
    if ( pf_read == NULL )
        printf("Cannot open file %s for reading\n", myfile);
    else
        printf("file %s opened for reading\n", myfile);

    return EXIT_SUCCESS;
}
$ gcc -o io_open4 -std=c99 -pedantic io_open4.c
$ ./io_open4
Cannot open file info2.txt for writing
file info2.txt opened for reading
Explanation:
o The command cp info.txt info2.txt copies the file info.txt and gives the copy the name info2.txt.
o The command chmod a-w info2.txt removes the write permission.
o The command ls -l info2.txt shows information on the file info2.txt: only the read permission is set in our example.
o The first call to fopen() tried to open the file for writing: it failed.
o The second call to fopen() successfully opened the file for reading.
If you open a file for reading and fopen() returns a null pointer, it means the file is missing or you are not allowed to access it, as shown below:
$ cat io_open5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile[2] = {"info2.txt", "info_missing.txt"};

    for (int i = 0; i < 2; i++) {
        pf = fopen(myfile[i], "r");
        if ( pf == NULL )
            printf("File %s missing\n", myfile[i]);
        else {
            printf("File %s exists\n", myfile[i]);
            fclose(pf);
        }
    }
    return EXIT_SUCCESS;
}
$ gcc4.9.2 -o io_open5 -std=c99 -pedantic io_open5.c
$ ./io_open5
File info2.txt exists
File info_missing.txt missing
Table X‑1 shows several open modes for modifying a file:
o Open for writing (w, wb). The file is truncated if it exists, or created if it is missing. Then, you can write to the file. The stream is used for output only.
o Open for writing and reading (w+, wb+). It has the same behavior as above except that you can also move within the file (with fseek() or rewind()) for reading. The same stream is used for input and output.
o Open for appending (a, ab). The file is opened for writing and keeps its contents if it exists, or is created if it is missing. Then, you can append data to the file. The stream is used for output only.
o Open for appending and reading (a+, ab+). It has the same behavior as above except that you can also move within the file (with fseek() or rewind()) for reading. The same stream is used for input and output.
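The difference between the modes w and a can be checked with a small sketch. The helper names put_text() and count_chars() are hypothetical; the sketch relies on fgetc() and on the standard function fputs().

```c
#include <stdio.h>

/* Hypothetical helper: writes text to path using the given open mode
   ("w" truncates the file, "a" appends to it).
   Returns 0 on success, -1 on failure. */
int put_text(const char *path, const char *mode, const char *text)
{
    FILE *pf = fopen(path, mode);
    if (pf == NULL)
        return -1;

    int failed = (fputs(text, pf) == EOF);
    if (fclose(pf) == EOF)      /* closing flushes the buffered data */
        failed = 1;
    return failed ? -1 : 0;
}

/* Hypothetical helper: returns the number of characters that can be
   read from path, or -1 if the file cannot be opened. */
long count_chars(const char *path)
{
    FILE *pf = fopen(path, "r");
    if (pf == NULL)
        return -1;

    long n = 0;
    while (fgetc(pf) != EOF)
        n++;
    fclose(pf);
    return n;
}
```

Calling put_text("demo.txt", "w", "abc") and then put_text("demo.txt", "a", "de") leaves five characters in the file, whereas a second call with the mode "w" would leave only two, since the previous contents are discarded.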
X.3 Closing a file
#include <stdio.h>
int fclose(FILE *stream);
Once you have finished working with a file, you have to close the associated object of type FILE returned by the fopen() function. The following example opens a file and closes it:
$ cat io_close.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info.txt";

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_close -std=c99 -pedantic io_close.c
$ ./io_close
file info.txt opened for reading
Once the file has been closed, you can no longer access the file through the pointer returned by fopen().
X.4 Reading a file
X.4.1 fgetc()
#include <stdio.h>
int fgetc(FILE *stream);
The function fgetc() extracts a character as unsigned char from the input stream, converts it to int, moves the position indicator (offset) to the next character, and returns the character retrieved, or EOF if the end-of-file has been reached or an error has occurred. EOF is a macro expanding to an integer value indicating that no character has been read, either because of an error or because the end of the file has been reached. In order to differentiate EOF from any character (byte), the return type is int and not a character type. If an error occurs while reading characters from the stream, the error indicator of the stream is set and the function returns EOF. The following example reads the contents of the file info.txt character by character until the end-of-file is reached:
$ cat io_fgetc.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info.txt";
    int c;

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    printf("Read character by character until EOF (=%d) is returned\n", EOF);
    while ( ( c = fgetc(pf) ) != EOF ) {
        printf("read char=%c\n", c );
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ cat info.txt
Line one
Line two
$ gcc -o io_fgetc -std=c99 -pedantic io_fgetc.c
$ ./io_fgetc
file info.txt opened for reading
Read character by character until EOF (=-1) is returned
read char=L
read char=i
read char=n
read char=e
read char= 
read char=o
read char=n
read char=e
read char=

read char=L
read char=i
read char=n
read char=e
read char= 
read char=t
read char=w
read char=o
read char=

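Since fgetc() returns EOF both at end-of-file and on error, a careful program checks which of the two happened once the loop ends. The following sketch uses the standard function ferror() for that purpose; the helper name count_lines() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: counts the newline characters read from an
   open stream. Returns the count, or -1 if a read error occurred. */
long count_lines(FILE *pf)
{
    long lines = 0;
    int c;              /* int, not char: it must be able to hold EOF */

    while ((c = fgetc(pf)) != EOF) {
        if (c == '\n')
            lines++;
    }

    /* EOF was returned: was it an error or the end of the file? */
    if (ferror(pf))
        return -1;
    return lines;
}
```

Declaring c as char instead of int would be a mistake: depending on the signedness of char, either a valid byte could be confused with EOF or the comparison with EOF could never succeed.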
X.4.2 getc()
The function getc() is equivalent to fgetc() except that it may be implemented as a macro:
#include <stdio.h>
int getc(FILE *stream);

The function fgetc() is however preferred to getc() for the reasons explained when we talked about macros [86] (see Chapter VII Section VII.27.2). Most of the time both behave the same way, but they may differ when the argument has side effects, because a macro may evaluate its argument more than once.
X.4.3 ungetc()
#include <stdio.h>
int ungetc(int c, FILE *stream);
The function ungetc() pushes the character c, converted to unsigned char, back onto the input stream. The file associated with the stream is not modified by the function call. Pushed-back characters can then be read from the stream in the reverse order they were pushed back. The function returns the character that has been put back onto the stream, or EOF on error. If the character c equals EOF, the function call fails and leaves the input stream untouched. The following example reads one character from the input stream, puts it back onto the input stream and reads it again:
$ cat io_fungetc.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info.txt";
    int c;

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    c = fgetc(pf);     /* read one character */
    printf("read char=%c\n", c );

    /* give back the character */
    ungetc(c, pf);

    c = fgetc(pf);
    printf("read char=%c\n", c );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fungetc -std=c99 -pedantic io_fungetc.c
$ ./io_fungetc
file info.txt opened for reading
read char=L
read char=L
The function ungetc() allows giving back a character read from the stream as if it had not been read. However, the character you put back onto the stream with ungetc() does not have to be the same as the last character read from the stream. Only a single character is guaranteed to be pushed back onto the input stream. If the function is called several times for the same stream and no pushed-back character has been read from the stream or discarded between the calls, the call may fail. A successful call to the function clears the end-of-file indicator of the stream. For a text stream, after a successful call, the value of the file position indicator is unspecified until all pushed-back characters are read or discarded. For a binary stream, the file position indicator is decremented by each successful call; if its value was 0 before the call, its value after the call is indeterminate. Take note that the pushed-back characters are discarded if the function fsetpos(), rewind() or fseek() is called before they are read.
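A common use of ungetc() is to look at the next character without consuming it. The following sketch illustrates the idea with two hypothetical helpers, peek_char() and skip_blanks(); it never pushes back more than one character at a time, which is the only case the standard guarantees.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character of the stream
   without consuming it, or EOF at end-of-file or on error. */
int peek_char(FILE *pf)
{
    int c = fgetc(pf);

    if (c != EOF)
        ungetc(c, pf);      /* one pushed-back character is guaranteed */
    return c;
}

/* Hypothetical helper: skips spaces and horizontal tabs, leaving the
   first other character in the stream.
   Returns the number of characters skipped. */
int skip_blanks(FILE *pf)
{
    int n = 0;
    int c;

    while ((c = peek_char(pf)) == ' ' || c == '\t') {
        fgetc(pf);          /* consume the blank we just looked at */
        n++;
    }
    return n;
}
```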
X.4.4 fgets()
Until C95:
#include <stdio.h>
char *fgets(char *s, int n, FILE *stream);

As of C99:
#include <stdio.h>
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets() function reads at most n-1 characters from the input stream and places them into the memory area pointed to by s. The function adds the null character at the end of the string copied into s. It stops reading if one of the following events occurs:
o The end-of-file is reached.
o A newline is encountered (it is copied to the object pointed to by s).
o n-1 characters have been read.
o A read error occurs.
The fgets() function returns s or a null pointer. If no error occurs, it returns s. If the end-of-file is encountered and no character has been read, a null pointer is returned: s is left untouched. If an error occurs while reading, a null pointer is returned: the object pointed to by s has indeterminate contents. The following example reads each line, or at most 254 characters at a time, and displays the strings read:
$ cat io_fgets.c
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LEN 255

int main(void) {
    FILE *pf;
    char *myfile = "info.txt";
    char s[ ARRAY_LEN ];
    int s_len = sizeof s;
    char *ret_s;

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    while ( (ret_s = fgets(s, s_len, pf)) != NULL )
        printf("String read=[%s]\n", s );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fgets -std=c99 -pedantic io_fgets.c
$ ./io_fgets
file info.txt opened for reading
String read=[Line one
]
String read=[Line two
]
We can notice that the newline character read is part of the strings retrieved from the input stream.
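When the newline is not wanted, it can be removed after the call. The sketch below shows one possible way, using the standard function strcspn() to find the first newline; the helper name read_line() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets() and removes the
   trailing newline character, if any.
   Returns s, or a null pointer at end-of-file or on error. */
char *read_line(char *s, int n, FILE *pf)
{
    if (fgets(s, n, pf) == NULL)
        return NULL;

    /* strcspn() returns the index of the first '\n', or the length
       of the string when there is none: the assignment is safe in
       both cases. */
    s[strcspn(s, "\n")] = '\0';
    return s;
}
```

This also works for a last line that is not terminated by a newline character, and for a line longer than n-1 characters, which fgets() returns in several pieces.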
X.4.5 fread()
Until C95:
#include <stdio.h>
size_t fread(void *s, size_t sz, size_t n, FILE *stream);

As of C99:
#include <stdio.h>
size_t fread(void * restrict s, size_t sz, size_t n, FILE * restrict stream);
The fread() function reads n elements of size sz (bytes) from the input stream and copies them into the memory area pointed to by s. It returns the number of elements read. If this number is different from n, either the end-of-file was reached or an error occurred. If n or sz is zero, no element is read, the function returns zero, s and stream are left unchanged. Unlike fgets(), the fread() function does not append the null character. If you want to work with strings, do not forget to append the null character.
In the following example, we read groups of four characters from the file info.txt until nothing remains to be read (end-of-file). The null character is placed right after the last character actually read, since the final group may hold fewer than four characters:
$ cat io_fread.c
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LEN 5

int main(void) {
    FILE *pf;
    char *myfile = "info.txt";
    char s[ ARRAY_LEN ];
    size_t s_len = sizeof s;
    size_t nb_elt;

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    while ( (nb_elt = fread(s, 1, s_len-1, pf)) != 0 ) {
        s[nb_elt] = '\0'; /* placing the string terminator */
        printf("String read=[%s]. Nb Chars Read=%zu\n", s, nb_elt );
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fread -std=c99 -pedantic io_fread.c
$ ./io_fread
file info.txt opened for reading
String read=[Line]. Nb Chars Read=4
String read=[ one]. Nb Chars Read=4
String read=[
Lin]. Nb Chars Read=4
String read=[e tw]. Nb Chars Read=4
String read=[o
]. Nb Chars Read=2
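Because fread() may return fewer elements than requested, the null character has to be stored at the index given by the return value, not at a fixed position. A minimal sketch of that rule, with a hypothetical helper named read_block():

```c
#include <stdio.h>

/* Hypothetical helper: reads at most cap-1 characters into buf and
   stores a null character right after the last character actually
   read. Returns the number of characters read. */
size_t read_block(char *buf, size_t cap, FILE *pf)
{
    if (cap == 0)
        return 0;           /* no room, not even for the terminator */

    size_t nb = fread(buf, 1, cap - 1, pf);
    buf[nb] = '\0';         /* nb <= cap-1: the index is inside buf */
    return nb;
}
```

When the returned value is smaller than cap-1, the standard functions feof() and ferror() tell whether the end-of-file was reached or an error occurred.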
X.4.6 fscanf()
Until C95:
#include <stdio.h>
int fscanf(FILE *stream, const char *fmt, ...);

As of C99:
#include <stdio.h>
int fscanf(FILE * restrict stream, const char * restrict fmt, ...);
The fscanf() function reads a series of characters from the input stream specified by the pointer stream, matches them against the specifiers within the string fmt (called the format), interprets them according to the corresponding specifier and copies the results into the objects pointed to by the pointers given in the argument list following the format fmt. The format fmt is a string composed of characters (that can be multibyte), whitespace characters [87] and specifiers. It is similar to the format used by the function printf(). A specifier is a letter introduced by the sign % that describes the type of the item read from the input stream that you want to copy into an object pointed to by a pointer passed to fscanf(). For example, the specifier %f represents a floating-point number, %c denotes a character… The fscanf() function returns the number of elements copied to the objects pointed to by the provided arguments, or EOF if the end-of-file is reached or an error occurs before any item has been converted. The function returns if one of the following conditions occurs:
o The end-of-file is reached: it returns EOF.
o An error occurs: it returns EOF.
o A matching failure occurs: it returns the number of items matched so far.
o The whole format fmt has been scanned: it returns the number of items that have been successfully matched.
Before going further, here is a very simple example calling the fscanf() function. Suppose we would like to extract the three items in the file info4.txt:
$ cat info4.txt
12 2.1 Hello
The fscanf() function can help us to retrieve them and copy them into objects pointed to by pointers in the argument list passed to the function as shown in the following example:
$ cat io_fscanf1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    FILE *pf = NULL;
    char *myfile = NULL;
    int x = 0;
    float y = 0;
    char s[100];
    int nb_elt = 0;

    if (argc == 1) {
        printf("USAGE: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }

    myfile = argv[1];
    if ( ( pf = fopen(myfile, "r") ) == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    strcpy(s, "NOTHING READ"); /* initialize the array s */

    nb_elt = fscanf(pf, "%d%f%s\n", &x, &y, s);
    printf("Elements read (%d): %d %f %s\n", nb_elt, x, y, s );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fscanf1 -std=c99 -pedantic io_fscanf1.c
$ ./io_fscanf1 info4.txt
file info4.txt opened for reading
Elements read (3): 12 2.100000 Hello
Let us analyze the line involving fscanf():
nb_elt = fscanf(pf, "%d%f%s", &x, &y, s);

A call to fscanf() is composed of four parts:
o The return value, stored in nb_elt. It holds the number of matching elements copied into the provided arguments.
o The input stream: pf is a pointer to FILE denoting the input stream.
o The format "%d%f%s", composed of three specifiers: %d, %f and %s. The specifier %d denotes a number of type int, %f a number of type float and %s a string.
o The argument list &x, &y, s. Each argument is a pointer to an object whose type corresponds to a specifier within the format. The first input item matching %d is converted to int and copied into the memory location pointed to by &x. The second input item matching %f is converted to float and copied into the memory location pointed to by &y. The third input item matching %s is copied into the memory area pointed to by s.
The function fscanf() reads the input stream and matches sequences of characters against the specifiers within the format "%d%f%s". It reads from the input stream the longest sequences of characters that match %d, then %f, and finally %s:
o It reads from the input stream the longest sequence of characters forming the first element (integer 12) that matches %d, then converts it to int and copies it into the object x. The specifier %d matches a decimal integer represented by type int.
o If the second input element matches %f, it is converted to float and copied into the object y.
o If the third element read matches %s, each character is copied into the array s. The copy stops when a whitespace character [88] is encountered. At the end of the copied string, a null character is inserted.
This simple example leads to two questions: how are whitespace characters treated, and what happens if input items do not match specifiers? Whitespace characters in the input are skipped before each item is read, unless the specifier is [, c or n.
In the following example, our program io_fscanf1 reads the input file containing many blanks (combination of spaces ‘ ‘, and horizontal tabs ‘\t’ ) that are ignored (we get the same output as earlier): $ cat info4.1.txt 12 2.1 Hello $ ./io_fscanf1 info4.1.txt file info4.1.txt opened for reading Elements read (3): 12 2.100000 Hello
If an item does not match a specifier, the function fscanf() stops reading and returns the number of items successfully matched so far (copied into arguments). Let us run our program again with the input file info4.2.txt:
$ cat info4.2.txt
12 noval Hello
$ ./io_fscanf1 info4.2.txt
file info4.2.txt opened for reading
Elements read (1): 12 0.000000 NOTHING READ
The object x was assigned the input item 12, but the objects y and s were not assigned a value by fscanf() and kept their current values. The function returns after failing to match the item noval against the specifier %f. Remember that fscanf() extracts the longest sequence of characters corresponding to a specifier. If we run our previous program with the input data file info4.3.txt, we get this:
$ cat info4.3.txt
122.1 Hello
$ ./io_fscanf1 info4.3.txt
file info4.3.txt opened for reading
Elements read (3): 122 0.100000 Hello
Our program io_fscanf1.c contained a subtle error. We called fscanf() like this:
fscanf(pf, "%d%f%s", &x, &y, s);

We declared the array s with a length of 100. What happens if, in an input file, fscanf() finds a matching element composed of 200 characters? The function would write beyond the end of the array s (a buffer overflow), which leads to undefined behavior. The function fscanf() lets you specify a maximum number of characters to read for a specifier. The call should have been written as follows:
fscanf(pf, "%d%f%99s", &x, &y, s);
Here is another example:
$ cat io_fscanf2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NB_EXPECTED_ELT 3 /* number of matching items */

int main(void) {
    FILE *pf = NULL;
    char *myfile = "info5.txt";
    char name[100];
    char unit[5];
    float capa = 0;
    int nb_elt = 0;

    if ( ( pf = fopen(myfile, "r") ) == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    while ( ( nb_elt = fscanf(pf, "disk %99s has capacity of %f %4s\n",
                              name, &capa, unit)) > 0 ) {
        if ( nb_elt != NB_EXPECTED_ELT )
            printf("Input stream badly formed\n");
        else
            printf("disk %s: %f %s\n", name, capa, unit );
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ cat info5.txt
disk hdisk1 has capacity of 50 GB
disk hdisk2 has capacity of 1.5 TB
$ gcc -o io_fscanf2 -std=c99 -pedantic io_fscanf2.c
$ ./io_fscanf2
file info5.txt opened for reading
disk hdisk1: 50.000000 GB
disk hdisk2: 1.500000 TB
In our program above, if the number of matching items nb_elt does not compare equal to the value of the macro NB_EXPECTED_ELT (3), it prints an error message. The while loop ends if the end-of-file is reached, an error occurs or there is no matching element. Have a look at the fscanf() call:
fscanf(pf, "disk %99s has capacity of %f %4s\n", name, &capa, unit)

We specified the number 99 before the first specifier %s and 4 before the last one. We told fscanf() to read at most 99 characters for the first item matching %s and at most 4 characters for the last item matching %s. Why did we do that? Because we declared the array name with a length of 100 (99 characters for storing the matching item plus one for the null character) and the array unit with a length of 5 (4 characters for storing the matching item plus one for the null character). The value indicating the maximum number of characters to read for a given specifier is called a width.
The parameter fmt is a string composed of literal characters, blanks [89], and specifiers. A specifier is introduced by the percent character %. The percent character may be followed, in order of appearance, by an asterisk, an integer called a width and a set of one or two characters called a length, as shown below:
%[*][width][length]specifier
Where:
o * indicates the element read will not be stored in an argument (optional).
o width is an integer that indicates the maximum number of characters to read for the specifier (optional).
o length is one or two letters indicating the size of the object that will store the element (optional). It alters the default size corresponding to the specifier.
o specifier is a letter indicating the type of the input element that will be matched against.
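As a short illustration of these fields, the sketch below combines the asterisk, a width and length modifiers in one format. The helper name read_id_value() and the expected input layout (a word, an integer and a number) are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: expects input such as "id 123456789 3.25".
   %*9s  matches a word of at most 9 characters without storing it,
   %ld   stores an integer into an object of type long int,
   %lf   stores a number into an object of type double.
   Returns the value returned by fscanf(). */
int read_id_value(FILE *pf, long *id, double *value)
{
    return fscanf(pf, "%*9s %ld %lf", id, value);
}
```

An item matched by a specifier carrying an asterisk is not counted in the return value: on well-formed input, the function above returns 2, not 3.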
Table X‑2 Specifiers of fscanf()
If the number of specifiers in the format fmt is greater than the number of arguments that will hold the matching elements, the behavior is undefined. If the number of specifiers in fmt is less than the number of arguments that will hold the matching elements, the extra arguments are ignored. Consider the following text file:
$ cat info6.txt
x=13 y=51 z=0xa t=70 s=Hello
x=1 y=5 z=0xF t=0.5 s=World
x=10
x=11 y=75 z=0xFF t=0.1 s=END
The example io_fscanf3.c extracts the value of each field and displays it. The expected format of the input data is of the following form:
x=integer y=integer z=hexadecimal t=float s=string

If a line does not conform to that format, a matching failure will occur and the program will print an error message. In our program, fscanf() is expected to read five elements from each line. In the input file info6.txt, we have deliberately inserted an error in the third line.
$ cat io_fscanf3.c
#include <stdio.h>
#include <stdlib.h>

#define NB_EXPECTED_ELT 5

int main(void) {
    FILE *pf = NULL;
    char *myfile = "info6.txt";
    int x = 0, y = 0, z = 0;
    float t = 0;
    char s[100];
    int nb_elt = 0;
    int line = 0;

    if ( ( pf = fopen(myfile, "r") ) == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    /* width 99: the array s holds 99 characters plus the null character */
    while ((nb_elt = fscanf(pf, "x=%d y=%d z=%i t=%f s=%99s\n",
                            &x, &y, &z, &t, s)) > 0 ) {
        line++;
        if ( nb_elt != NB_EXPECTED_ELT )
            printf("Line %d bad format. Elements read %d\n", line, nb_elt);
        else
            printf("Elements read (%d): %d %d %d %f %s\n",
                   nb_elt, x, y, z, t, s );
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fscanf3 -std=c99 -pedantic io_fscanf3.c
$ ./io_fscanf3
file info6.txt opened for reading
Elements read (5): 13 51 10 70.000000 Hello
Elements read (5): 1 5 15 0.500000 World
Line 3 bad format. Elements read 1
Elements read (5): 11 75 255 0.100000 END
The function fscanf() extracts data from the input stream and assigns them to the arguments as long as sequences of characters match the given format. The expression fscanf(…) > 0 is true as long as fscanf() assigns at least one input item to the arguments. If nothing is assigned because no input element matches (the line is entirely badly formatted), it returns 0 and the while loop stops. If the end-of-file is reached, EOF is returned and the loop stops as well. A specifier can be altered by preceding it with one or two fields:
o The width field tells how many characters are to be read at most.
o The length modifier alters the expected object size induced by the specifier. It indicates the size of the object to which the argument points. For example, an item matching %d is to be stored in an object of type int. An item matching %ld is to be stored in an object of type long int. An item matching %lld is to be stored in an object of type long long int…
Table X‑3 shows the expected types of the objects that will store matching input items depending on the specifiers and the length modifier. The arguments passed to fscanf() are pointers to those objects.
Table X‑3 Expected types of arguments for fscanf()
Table X‑4 gives additional examples.
Table X‑4 Examples with fscanf()
Of course, fscanf() is meant to be used when the input data has a fixed and known format, which allows items to be retrieved according to that format. Otherwise, the functions fgets(), fgetc(), fread()… are more appropriate.
X.4.7 sscanf()
Until C95:
#include <stdio.h>
int sscanf(const char *s, const char *fmt, ...);

As of C99:
#include <stdio.h>
int sscanf(const char * restrict s, const char * restrict fmt, ...);
The sscanf() function works in the same way as fscanf() except that it reads from the string pointed to by the parameter s instead of a stream. Here is an example:
$ cat io_sscanf.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NB_EXPECTED_ELT 3 /* number of matching items */

int main(void) {
    char *input_data = "disk hdisk1 has capacity of 50 GB";
    char name[100];
    char unit[5];
    float capa = 0;

    /* check that the three expected items were really assigned */
    if ( sscanf(input_data, "disk %99s has capacity of %f %4s",
                name, &capa, unit) != NB_EXPECTED_ELT ) {
        printf("Input string badly formed\n");
        return EXIT_FAILURE;
    }
    printf("disk %s: %f %s\n", name, capa, unit );
    return EXIT_SUCCESS;
}
$ gcc -o io_sscanf -std=c99 -pedantic io_sscanf.c
$ ./io_sscanf
disk hdisk1: 50.000000 GB
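A frequent way of combining the functions seen so far is to read a whole line with fgets() and then to parse it with sscanf(): a badly formed line is then entirely consumed and cannot disturb the parsing of the next one. The sketch below illustrates this approach under the assumption that lines are shorter than 255 characters and have the form "x=integer y=integer"; the helper name read_xy() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from pf and parses it as
   "x=<int> y=<int>". Returns 1 if the line is well formed, 0 if it
   is badly formed, and EOF when there is nothing left to read. */
int read_xy(FILE *pf, int *x, int *y)
{
    char line[256];

    if (fgets(line, sizeof line, pf) == NULL)
        return EOF;

    /* both items must have been assigned */
    return sscanf(line, "x=%d y=%d", x, y) == 2;
}
```

When the function returns 0, the objects pointed to by x and y may have been partly updated, so the caller should ignore them for that line.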
X.4.7.1 vfscanf()
As of C99:
#include <stdarg.h>
#include <stdio.h>
int vfscanf(FILE *restrict stream, const char *restrict fmt, va_list arg);

The function vfscanf() has the same behavior as fscanf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end(), the caller has to call va_end(arg) after invoking vfscanf().

X.4.7.2 vsscanf()
As of C99:
#include <stdarg.h>
#include <stdio.h>
int vsscanf(const char *restrict s, const char *restrict fmt, va_list arg);

The function vsscanf() has the same behavior as sscanf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end(), the caller has to call va_end(arg) after invoking vsscanf().

X.4.7.3 scanf()
Until C95:
#include <stdio.h>
int scanf(const char *fmt, ...);

As of C99:
#include <stdio.h>
int scanf(const char * restrict fmt, ...);

The function scanf() has the same behavior as fscanf(). Instead of reading data from a stream associated with a physical file, it gets data from the standard input (stdin).

X.4.7.4 vscanf()
As of C99:
#include <stdarg.h>
#include <stdio.h>
int vscanf(const char * restrict fmt, va_list arg);

The function vscanf() has the same behavior as scanf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end(), the caller has to call va_end(arg) after invoking vscanf().
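The v-prefixed functions are typically called from a user-defined variadic function. The following sketch shows the va_start()/va_end() bracketing around vsscanf(); the helper parse_fields() and its parameter expected are hypothetical names chosen for this illustration only.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: scans text according to fmt and returns 1 if
 * exactly `expected` items were assigned, 0 otherwise.
 */
int parse_fields(const char *text, int expected, const char *fmt, ...) {
  va_list arg;
  int assigned;

  va_start(arg, fmt);                  /* arg now refers to the arguments after fmt */
  assigned = vsscanf(text, fmt, arg);  /* same conversions as sscanf() */
  va_end(arg);                         /* vsscanf() does not invoke va_end itself */

  return assigned == expected;
}
```

A call such as parse_fields(line, 3, "disk %99s has capacity of %f %4s", name, &capa, unit) then behaves like the sscanf() call of the previous section, with the check of the number of assigned items done in one place.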
X.5 Writing to a file
X.5.1 fputc()
#include <stdio.h>
int fputc(int c, FILE *stream);

The fputc() function writes the character c, after converting it to unsigned char, to the output stream represented by the parameter stream. The output stream is a pointer returned by the fopen() function that has opened a file for writing, reading/writing or appending. The function returns the character written unless a write error occurs, in which case it returns the value of the macro EOF and sets the error indicator of the stream. The following example writes characters to a new file (it is created):

$ cat data_fputc1.txt
cat: cannot open data_fputc1.txt: No such file or directory
$ cat io_fputc1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "data_fputc1.txt";

  if ( ( pf = fopen(myfile, "w") ) == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else {
    printf("file %s opened for writing\n", myfile);
    fputc('H', pf);
    fputc('e', pf);
    fputc('l', pf);
    fputc('l', pf);
    fputc('o', pf);
    fputc('\n', pf);
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fputc1 -std=c99 -pedantic io_fputc1.c
$ ./io_fputc1
file data_fputc1.txt opened for writing
$ cat data_fputc1.txt
Hello
The following example appends the characters of the string "World" to the file data_fputc1.txt:

$ cat io_fputc2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "data_fputc1.txt";
  int char_written = 0;

  if ( ( pf = fopen(myfile, "a") ) == NULL ) {
    printf("Cannot open file %s for appending\n", myfile);
    return EXIT_FAILURE;
  } else {
    printf("file %s opened for appending\n", myfile);
    char_written = fputc('W', pf);
    printf("char written %c\n", char_written);
    char_written = fputc('o', pf);
    printf("char written %c\n", char_written);
    char_written = fputc('r', pf);
    printf("char written %c\n", char_written);
    char_written = fputc('l', pf);
    printf("char written %c\n", char_written);
    char_written = fputc('d', pf);
    printf("char written %c\n", char_written);
    fputc('\n', pf);
    printf("newline written\n");
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fputc2 -std=c99 -pedantic io_fputc2.c
$ ./io_fputc2
file data_fputc1.txt opened for appending
char written W
char written o
char written r
char written l
char written d
newline written
$ cat data_fputc1.txt
Hello
World
The following example is wrong: the file data_fputc1.txt is opened for reading and we attempt to write to it.

$ cat io_fputc3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "data_fputc1.txt";

  if ( ( pf = fopen(myfile, "r") ) == NULL ) {
    printf("Cannot open file %s for reading\n", myfile);
    return EXIT_FAILURE;
  } else {
    int char_written = 0;

    printf("file %s opened for reading\n", myfile);
    char_written = fputc('W', pf);
    if (char_written == EOF )
      printf("No char written. Return value: %d\n", char_written);
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fputc3 -std=c99 -pedantic io_fputc3.c
$ ./io_fputc3
file data_fputc1.txt opened for reading
No char written. Return value: -1
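Since fputc() reports a failure through its return value, a loop writing several characters can stop at the first error. Here is a minimal sketch of that idea; the function put_string() is a hypothetical helper written for this illustration, not a standard function.

```c
#include <stdio.h>

/*
 * Writes the string s to stream one character at a time.
 * Returns the number of characters written, or EOF at the first write error.
 */
int put_string(FILE *stream, const char *s) {
  int count = 0;

  for (; *s != '\0'; s++) {
    if ( fputc((unsigned char)*s, stream) == EOF )
      return EOF;   /* the error indicator of stream is now set */
    count++;
  }

  return count;
}
```

After a failure, the caller can confirm with ferror(stream) that the error indicator is set.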
X.5.2 putc()
The function putc() is equivalent to fputc() except that it may be implemented as a macro:
#include <stdio.h>
int putc(int c, FILE *stream);

The function fputc() is preferred to the macro putc() for the reasons explained in Chapter VII Section VII.27.2. Although they have the same behavior most of the time, they may differ when the argument stream is an expression with side effects, since a macro may evaluate it more than once.
X.5.3 fputs()
Until C95:
#include <stdio.h>
int fputs(const char *s, FILE *stream);

As of C99:
#include <stdio.h>
int fputs(const char * restrict s, FILE * restrict stream);

The function fputs() copies the string pointed to by s to stream (the terminating null character is not written). The output stream is a pointer returned by the fopen() function that has opened a file for writing, reading/writing or appending. It returns EOF if an error occurs. Otherwise, it returns a nonnegative integer value. The following example writes the string "Hello" followed by a newline to the new file data_fputs1.txt:

$ cat data_fputs1.txt
cat: cannot open data_fputs1.txt: No such file or directory
$ cat io_fputs1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "data_fputs1.txt";
  int nb_char = 0;

  if ( ( pf = fopen(myfile, "w") ) == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else {
    printf("file %s opened for writing\n", myfile);
    nb_char = fputs("Hello\n", pf);
    printf("Nb char written: %d\n", nb_char);
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fputs1 -std=c99 -pedantic io_fputs1.c
$ ./io_fputs1
file data_fputs1.txt opened for writing
Nb char written: 6
$ cat data_fputs1.txt
Hello

Take note that the value returned by fputs() is only guaranteed to be nonnegative on success. The system used here happens to return the number of characters written, but a portable program must not rely on it.
Now, can you guess what the difference could be between the programs io_fputs_txt.c and io_fputs_bin.c shown below?

$ cat io_fputs_txt.c
#include <stdio.h>

int main(void) {
  char *myfile = "data_fputs.txt";
  FILE *fh = fopen(myfile, "w"); // text stream

  if ( fh == NULL )
    return 1;

  fputs("\n", fh);
  fclose(fh);
  return 0;
}
$ cat io_fputs_bin.c
#include <stdio.h>

int main(void) {
  char *myfile = "data_fputs.bin";
  FILE *fh = fopen(myfile, "wb"); // binary stream

  if ( fh == NULL )
    return 1;

  fputs("\n", fh);
  fclose(fh);
  return 0;
}

The program io_fputs_txt.c opens a text stream, causing the newline character to have a physical representation that depends on the operating system. The program io_fputs_bin.c opens a binary stream, causing the newline character to be written as '\n' whatever the operating system. On a UNIX or UNIX-like operating system, both programs are equivalent, but on Microsoft Windows the first one writes the two characters \r\n while the second writes the single character \n.
X.5.4 fwrite()
Until C95:
#include <stdio.h>
size_t fwrite(const void *s, size_t sz_elt, size_t n, FILE *stream);

As of C99:
#include <stdio.h>
size_t fwrite(const void * restrict s, size_t sz_elt, size_t n, FILE * restrict stream);

The fwrite() function writes to the output stream (stream) the object pointed to by s, composed of n elements of size sz_elt. It returns the number of items written. If that number is less than the expected number of elements to be written n, an error has occurred. The following example writes the string "Hello" followed by a newline to the new file data_fwrite1.txt:

$ cat data_fwrite1.txt
cat: cannot open data_fwrite1.txt: No such file or directory
$ cat io_fwrite1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "data_fwrite1.txt";
  size_t nb_char = 0;
  char *s = "Hello\n";
  size_t string_len = strlen(s);

  if ( ( pf = fopen(myfile, "w") ) == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else {
    printf("file %s opened for writing\n", myfile);
    nb_char = fwrite(s, 1, string_len, pf);
    printf("Nb char written: %zu\n", nb_char); /* %zu matches size_t */
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwrite1 -std=c99 -pedantic io_fwrite1.c
$ ./io_fwrite1
file data_fwrite1.txt opened for writing
Nb char written: 6
$ cat data_fwrite1.txt
Hello
The following example creates a binary file called students.db in which it stores objects of type struct student.

$ cat io_fwrite2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct student student;
struct student {
  char first_name[255];
  char last_name[255];
  int age;
};

int main(void) {
  FILE *pf = NULL;
  char *myfile = "students.db";
  size_t nb_elt_written = 0;
  student st1, st2;

  strcpy(st1.first_name, "David");
  strcpy(st1.last_name, "Young");
  st1.age = 20;

  strcpy(st2.first_name, "Albert");
  strcpy(st2.last_name, "Hilbert");
  st2.age = 21;

  if ( ( pf = fopen(myfile, "wb") ) == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else {
    printf("file %s opened for writing\n", myfile);
    nb_elt_written = fwrite(&st1, sizeof st1, 1, pf);
    printf("Nb elts written: %zu\n", nb_elt_written);
    nb_elt_written = fwrite(&st2, sizeof st2, 1, pf);
    printf("Nb elts written: %zu\n", nb_elt_written);
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwrite2 -std=c99 -pedantic io_fwrite2.c
$ ./io_fwrite2
file students.db opened for writing
Nb elts written: 1
Nb elts written: 1
The following program is similar to the program io_fwrite2.c but instead of writing one item at each call, it writes several structures at a time (two) to the output stream:

$ cat io_fwrite3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct student student;
struct student {
  char first_name[255];
  char last_name[255];
  int age;
};

int main(void) {
  FILE *pf = NULL;
  char *myfile = "students.db";
  size_t nb_elt_written = 0;
  int nb_struct = 2;
  student *p = malloc( nb_struct * sizeof *p );

  if ( p == NULL )
    return EXIT_FAILURE;

  strcpy(p[0].first_name, "David");
  strcpy(p[0].last_name, "Young");
  p[0].age = 20;

  strcpy(p[1].first_name, "Albert");
  strcpy(p[1].last_name, "Hilbert");
  p[1].age = 21;

  if ( ( pf = fopen(myfile, "wb") ) == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    free(p);
    return EXIT_FAILURE;
  } else {
    printf("file %s opened for writing\n", myfile);
    nb_elt_written = fwrite(p, sizeof *p, nb_struct, pf);
    printf("Nb elts written: %zu\n", nb_elt_written);
  }

  fclose(pf);
  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwrite3 -std=c99 -pedantic io_fwrite3.c
$ ./io_fwrite3
file students.db opened for writing
Nb elts written: 2
The following program reads with fread() our binary file students.db created by the previous program and displays the contents of the extracted structures:

$ cat io_fread2.c
#include <stdio.h>
#include <stdlib.h>

typedef struct student student;
struct student {
  char first_name[255];
  char last_name[255];
  int age;
};

int main(void) {
  FILE *pf = NULL;
  char *myfile = "students.db";
  size_t nb_elt_read = 0;
  size_t i;
  student st;

  if ( ( pf = fopen(myfile, "rb") ) == NULL ) {
    printf("Cannot open file %s for reading\n", myfile);
    return EXIT_FAILURE;
  } else {
    printf("file %s opened for reading\n", myfile);
    /* fread() returns 0 when no more structure can be read */
    while ( ( nb_elt_read = fread(&st, sizeof st, 1, pf) ) > 0 ) {
      for (i = 0; i < nb_elt_read; i++) {
        printf("First name: %s\n", st.first_name);
        printf("Last name: %s\n", st.last_name);
        printf("Age: %d\n\n", st.age);
      }
    }
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fread2 -std=c99 -pedantic io_fread2.c
$ ./io_fread2
file students.db opened for reading
First name: David
Last name: Young
Age: 20

First name: Albert
Last name: Hilbert
Age: 21
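The counts returned by fwrite() and fread() can be compared with the number of elements requested to detect a failure. The sketch below writes an array of structures and reads it back through the same stream; the function round_trip() is a hypothetical helper, and the stream is assumed to be a binary stream opened for update (for example with the mode "wb+").

```c
#include <stdio.h>

struct student {
  char first_name[32];
  char last_name[32];
  int age;
};

/*
 * Writes the n structures of in to stream, moves back to the beginning,
 * then reads them into out.
 * Returns the number of structures read back, or 0 if the write failed.
 */
size_t round_trip(FILE *stream, const struct student *in,
                  struct student *out, size_t n) {
  if ( fwrite(in, sizeof *in, n, stream) != n )
    return 0;          /* fewer items written than requested: write error */

  rewind(stream);      /* repositioning is required between a write and a read */

  return fread(out, sizeof *out, n, stream);
}
```

Take note that a file written this way depends on the layout of the structure (size of int, padding), so it should be read by a program compiled in the same way.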
X.5.5 fprintf()
Until C95:
#include <stdio.h>
int fprintf(FILE *stream, const char *fmt, ...);

As of C99:
#include <stdio.h>
int fprintf(FILE * restrict stream, const char * restrict fmt, ...);

The fprintf() function writes a series of characters, including the converted arguments, to the output stream according to the format fmt. The format fmt is a string composed of literal characters and conversion specifiers indicating how the arguments have to be interpreted. The specifiers are similar to those of the function fscanf() but with some differences. The arguments are interpreted against their corresponding specifiers. Of course, the file should be opened for writing, reading/writing or appending. Otherwise, no write will be performed. The function returns when one of the following conditions is met:
o An error occurs: a negative value is returned.
o The format has been completely scanned: the number of characters written is returned.

Always ensure you provide enough arguments: if there are not enough arguments, the behavior of the function is undefined. If there are too many arguments, the arguments that have not been written are ignored [90].

Let us show some examples before going further. The example io_fprintf1.c given below performs the following tasks:
o The statement pf = fopen(myfile, "w") opens the file info_fprintf1.txt for writing. If the file exists, it is truncated: all its contents will be lost. If the call is successful, an object of type FILE is associated with the opened file and a pointer to it is returned. Otherwise, a null pointer is returned.
o The statement nb_char = fprintf(pf, "x=%d and f=%f\n", x, f) writes the string x= followed by the integer value of the variable x, followed by and f=, followed by the floating-point value of f. That is, if x=10 and f=1.23, the string x=10 and f=1.230000 is written to the output stream.

$ cat io_fprintf1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_fprintf1.txt";
  int nb_char;
  int x = 10;
  float f = 1.23;

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  nb_char = fprintf(pf, "x=%d and f=%f\n", x, f);
  printf("Nb characters written=%d\n", nb_char );

  fclose(pf);
  return EXIT_SUCCESS;
}
$ cat info_fprintf1.txt
x=10 and f=1.230000
The first conversion specification is composed only of the conversion specifier %d, which interprets the argument x as type int. The second conversion specification is composed only of the specifier %f, which interprets the argument f as type double (an argument of type float is promoted to double when passed to a variadic function). If the type of an argument does not match its conversion specifier, the behavior is undefined (Table X‑7 shows the expected types of the arguments). For example, if the specifier is %d and its corresponding argument is of type float, the output will be wrong.

You have noticed that, by default, the specifier %f displays six digits after the decimal point. You can change it by specifying the number of digits after the decimal point, such as %.3f, as shown below:

$ cat io_fprintf2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_fprintf2.txt";
  int nb_char;
  int x = 10;
  float f = 1.23;

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  nb_char = fprintf(pf, "x=%d and f=%.3f\n", x, f);
  printf("Nb characters written=%d\n", nb_char );

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fprintf2 -std=c99 -pedantic io_fprintf2.c
$ ./io_fprintf2
file info_fprintf2.txt opened for writing
Nb characters written=17
$ cat info_fprintf2.txt
x=10 and f=1.230
The conversion specification %.3f tells fprintf() to write three digits after the decimal point. The sequence of characters .3 is called a precision. When used with a floating-point number (specifiers a or A, e or E, f or F), it specifies the number of digits after the decimal point. A precision used with the specifier s means the maximum number of characters to write. A precision used with the specifier d, i, o, u, x or X indicates the minimum number of digits to write, as in the following example (leading zeros are added for padding):

$ cat io_fprintf3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_fprintf3.txt";
  int nb_char;
  int x = 10;
  float f = 1.23;
  char *str = "World";

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  nb_char = fprintf(pf, "x=%.5d, f=%.3f and str=%.3s\n", x, f, str);
  printf("Nb characters written=%d\n", nb_char );

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fprintf3 -std=c99 -pedantic io_fprintf3.c
$ ./io_fprintf3
file info_fprintf3.txt opened for writing
Nb characters written=29
$ cat info_fprintf3.txt
x=00010, f=1.230 and str=Wor
You can specify the minimum number of characters to write by using the field width (preceding the precision if any). For example, fprintf(pf, "x=%5d\n", x) outputs the object x with at least 5 characters, padding with leading spaces if required. Here is a complete example:

$ cat io_fprintf4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_fprintf4.txt";
  int nb_char;
  int x = 10;
  float f = 1.23;
  char *str = "World";

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  nb_char = fprintf(pf, "x=%5d, f=%5.2f and str=%10s\n", x, f, str);
  printf("Nb characters written=%d\n", nb_char );

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fprintf4 -std=c99 -pedantic io_fprintf4.c
$ ./io_fprintf4
file info_fprintf4.txt opened for writing
Nb characters written=36
$ cat info_fprintf4.txt
x=   10, f= 1.23 and str=     World
Leading spaces are added if the number of characters of the converted argument is less than the width. For numbers, zeroes can be written instead of spaces by preceding the width with the flag 0. For example, fprintf(pf, "x=%05d\n", x) outputs the variable x with at least 5 digits, padding with leading zeroes if required. Here is a complete example:

$ cat io_fprintf5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_fprintf5.txt";
  int nb_char;
  int x = 10;
  float f = 1.23;
  char *str = "World";

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  nb_char = fprintf(pf, "x=%05d, f=%05.2f and str=%10s\n", x, f, str);
  printf("Nb characters written=%d\n", nb_char );

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fprintf5 -std=c99 -pedantic io_fprintf5.c
$ ./io_fprintf5
file info_fprintf5.txt opened for writing
Nb characters written=36
$ cat info_fprintf5.txt
x=00010, f=01.23 and str=     World
The width can also be passed as an argument by using the character *. For example, fprintf(pf, "x=%0*d\n", 5, x). Here is a complete example:

$ cat io_fprintf6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_fprintf6.txt";
  int nb_char;
  int x = 10;
  float f = 1.23;
  char *str = "World";

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  nb_char = fprintf(pf, "x=%0*d, f=%0*.2f and str=%*s\n", 5, x, 6, f, 10, str);
  printf("Nb characters written=%d\n", nb_char );

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fprintf6 -std=c99 -pedantic io_fprintf6.c
$ ./io_fprintf6
file info_fprintf6.txt opened for writing
Nb characters written=37
$ cat info_fprintf6.txt
x=00010, f=001.23 and str=     World
The precision can also be passed as an argument by using the character *: for example, fprintf(pf, "f=%0*.*f\n", 6, 1, f). Here is a complete example:

$ cat io_fprintf7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_fprintf7.txt";
  int nb_char;
  int x = 10;
  float f = 1.23;
  char *str = "World";

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  nb_char = fprintf(pf, "x=%0*d, f=%0*.*f and str=%*s\n", 5, x, 6, 1, f, 10, str);
  printf("Nb characters written=%d\n", nb_char );

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fprintf7 -std=c99 -pedantic io_fprintf7.c
$ ./io_fprintf7
file info_fprintf7.txt opened for writing
Nb characters written=37
$ cat info_fprintf7.txt
x=00010, f=0001.2 and str=     World
More generally, for fprintf(), a conversion specification takes the following form:
%[flag][width][.precision][length]specifier

o flag is one of the following characters: space, #, -, + and 0. Several flags can be combined in any order (Table X‑5).
o width specifies the minimum number of characters to write. It can be an integer or an asterisk (*). The asterisk means the width is passed as an argument.
o precision is an integer, an asterisk (*) or nothing. Used with the specifiers a, A, e, E, f or F, it indicates the number of digits after the decimal point for the matching argument. Used with the specifiers g or G, it indicates the maximum number of significant digits. Used with the specifier s, it states the maximum number of characters to write. Used with the specifiers d, i, u, o, x or X, it indicates the minimum number of digits to display, adding leading zeroes for padding if required. If no number follows the period, the precision is taken as zero: for a floating-point number, the fractional part is then not displayed. The asterisk means the precision is given by an argument.
o length is composed of one or two letters specifying the size of the argument matching specifier (Table X‑7).
o specifier is a letter indicating how to interpret the matching argument (see Table X‑6).

Flag    Meaning
#       Used with the specifiers converting numbers o, x, X, a, A, e, E, f, F, g, and G.
        o Used with the specifier o, it adds a leading zero (symbolizing an octal number).
        o Used with the specifier x or X, it adds the leading characters 0x or 0X (symbolizing a hexadecimal number).
        o Used with a, A, e, E, f, F, g, or G, it keeps the decimal point even if there is no fractional part.
+       By default, the + sign of a positive number is not shown. If the flag + is specified, the + sign appears before positive numbers.
-       The output is left justified. By default, the output is right justified.
0       Used with the specifiers converting numbers (d, i, o, u, x, X, a, A, e, E, f, F, g, and G) and the field width.
        o If the flag 0 appears without the flag - and the number of characters composing the converted argument is less than width, leading zeroes are added.
        o If the flags - and 0 are both used, the flag 0 is ignored.
        o If neither of the flags - and 0 is used and the number of characters composing the converted argument is less than width, leading spaces are added.
        o If the flag - is used and the number of characters composing the converted number is less than width, trailing spaces are added.
space   If the argument to output is positive, a space character is written in place of the + sign. The flag space is ignored if the flag + appears.
Table X‑5 Flags for fprintf()
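To see the flags side by side, here is a small sketch that formats the same values with several of them. The function format_flags() is a hypothetical helper written for this illustration; it relies on snprintf(), which accepts the same conversion specifications as fprintf().

```c
#include <stdio.h>

/*
 * Formats i and d into buf (of size n) with one flag per bracket:
 * +, space, -, 0, # with x, # with o, and # with f.
 * Returns the value returned by snprintf().
 */
int format_flags(char *buf, size_t n, int i, double d) {
  return snprintf(buf, n,
                  "[%+d] [% d] [%-6d] [%06d] [%#x] [%#o] [%#.0f]",
                  i, i, i, i,
                  (unsigned)i, (unsigned)i,  /* x and o expect unsigned int */
                  d);
}
```

With i equal to 42 and d equal to 3.0, the resulting string is [+42] [ 42] [42    ] [000042] [0x2a] [052] [3.]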
Table X‑6 Specifiers for fprintf()
Table X‑7 Types of the arguments passed to fprintf()
X.5.6 sprintf()
Until C95:
#include <stdio.h>
int sprintf(char *s, const char *fmt, ...);

As of C99:
#include <stdio.h>
int sprintf(char *restrict s, const char *restrict fmt, ...);

The sprintf() function works in the same way as fprintf() except that it writes to the array pointed to by s instead of a stream. The array must be large enough to hold the whole output and the terminating null character. Here is an example:

$ cat io_sprintf.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char message[255];

  /* sizeof yields a value of type size_t: the matching specifier is %zu */
  sprintf( message, "sizeof(int)=%zu\nsizeof(long)=%zu\nsizeof(float)=%zu\n",
           sizeof(int), sizeof(long), sizeof(float) );
  printf("%s", message );
  return EXIT_SUCCESS;
}
$ gcc -o io_sprintf -std=c99 -pedantic io_sprintf.c
$ ./io_sprintf
sizeof(int)=4
sizeof(long)=4
sizeof(float)=4
X.5.6.1 snprintf()
As of C99:
#include <stdio.h>
int snprintf(char *restrict s, size_t n, const char *restrict fmt, ...);

The function snprintf() has the same behavior as fprintf(). Instead of writing to a stream, it writes at most n characters (including the null character) to the memory area pointed to by s. If n is zero, nothing is written and s may be a null pointer. The function appends a null character to the array s unless n is zero. It returns the number of characters that would have been written (excluding the null character from the count) if n had been large enough, or a negative integer if an error has occurred. Therefore, if the integer returned by the function is not negative and is less than n, the whole output has been written to the memory area pointed to by s. It can be used to convert arguments holding wide characters to a multibyte string, as in the following example:

$ cat io_snprintf.c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void) {
  wchar_t *wide_s = L"2000 \u20AC"; // Unicode code point \u20AC is the symbol €
  char multibyte_output[64];
  char *mylocale = "en_US.UTF-8";

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Locale %s not available\n", mylocale);
    exit(EXIT_FAILURE);
  }

  // null character appended to multibyte_output
  snprintf(multibyte_output, 64, "multibyte_output=%ls", wide_s);
  printf("wide_s=%ls. %s \n", wide_s, multibyte_output);

  return EXIT_SUCCESS;
}
$ gcc -o io_snprintf -std=c99 -pedantic io_snprintf.c
$ ./io_snprintf
wide_s=2000 €. multibyte_output=2000 €
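The value returned by snprintf() makes it possible to detect a truncated output, as described above. Here is a minimal sketch; the function format_disk() and its parameters are hypothetical names used only for this illustration.

```c
#include <stdio.h>

/*
 * Writes "disk <name>: <capacity_gb> GB" to buf (of size n).
 * Returns 1 if the whole output fits, 0 if it was truncated,
 * and -1 if snprintf() reported an error.
 */
int format_disk(char *buf, size_t n, const char *name, int capacity_gb) {
  int needed = snprintf(buf, n, "disk %s: %d GB", name, capacity_gb);

  if ( needed < 0 )
    return -1;

  /* needed excludes the null character: the output fits only if needed < n */
  return (size_t)needed < n;
}
```

Calling snprintf(NULL, 0, fmt, …) first is a common way to learn how many characters are needed before allocating the destination array.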
X.5.6.2 vfprintf()
Until C95:
#include <stdarg.h>
#include <stdio.h>
int vfprintf(FILE *stream, const char *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>
int vfprintf(FILE *restrict stream, const char *restrict fmt, va_list arg);

The function vfprintf() has the same behavior as fprintf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end(), the call va_end(arg) has to be made after invoking vfprintf(). The following example writes strings to the file logerror:

$ cat io_vfprintf.c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define LOG_FILE "logerror"

void log_error(const char *fmt, ...) {
  va_list arg;
  static FILE *logfh = NULL;

  if ( ! logfh ) // if logfh is a null pointer, set it to a valid stream
    if ( ! (logfh = fopen(LOG_FILE, "a")) ) { // cannot create logfile
      fprintf( stderr, "cannot create logfile %s\n", LOG_FILE );
      perror("Open logfile");
      logfh = stdout; // use standard output instead
    }

  va_start(arg, fmt);
  vfprintf(logfh, fmt, arg);
  va_end(arg);
}

int main(void) {
  /* an array of char, as expected by the specifier %s */
  char message[] = "INFO: example of vfprintf";

  log_error("INFO:%s\n", message);
  return EXIT_SUCCESS;
}
$ gcc -o io_vfprintf -std=c99 -pedantic io_vfprintf.c
$ ./io_vfprintf
X.5.6.3 vsprintf()
Until C95:
#include <stdarg.h>
#include <stdio.h>
int vsprintf(char *s, const char *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>
int vsprintf(char *restrict s, const char *restrict fmt, va_list arg);

The function vsprintf() has the same behavior as sprintf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end(), the call va_end(arg) has to be placed after invoking vsprintf().

X.5.6.4 printf()
Until C95:
#include <stdio.h>
int printf(const char *fmt, ...);

As of C99:
#include <stdio.h>
int printf(const char * restrict fmt, ...);

The function printf() has the same behavior as fprintf(). Instead of writing to a stream associated with a physical file, it writes to the standard output (stdout).

X.5.6.5 vprintf()
Until C95:
#include <stdarg.h>
#include <stdio.h>
int vprintf(const char *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>
int vprintf(const char *restrict fmt, va_list arg);

The function vprintf() has the same behavior as printf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). As the function does not invoke the macro va_end(), the call va_end(arg) has to be made after invoking vprintf().
X.6 Position indicator
The position indicator is an integer of type long denoting a position within a stream. The functions described in the following sections manipulate it. Take note that a position indicator is just a way to save a position within a file.

X.6.1 ftell()
#include <stdio.h>
long int ftell(FILE *stream);

The ftell() function returns the current value of the position indicator of the stream. If an error occurs, the value -1L is returned. For a binary file, it returns the number of characters from the beginning of the file. For a text file, there may be no relationship between the characters read or written and the file position indicator. The value returned by ftell() can then be used by fseek() to set the position again at that point. The following example, executed on a Linux operating system, shows the position indicator before and after reading characters:

$ cat ftell1.txt
Line 1: hello
Line 2: world
$ cat io_ftell1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "ftell1.txt";
  long pos = 0;
  int c;
  char s[100];

  pf = fopen(myfile, "r");
  if ( pf == NULL )
    return EXIT_FAILURE;

  pos = ftell(pf);
  printf("Init: pos=%ld\n", pos);

  c = getc(pf);
  pos = ftell(pf);
  printf("After reading character %c, pos=%ld\n", c, pos);

  fscanf(pf, "%99s", s); /* width 99: never overflows the array s */
  pos = ftell(pf);
  printf("After reading characters %s. pos=%ld\n", s, pos);

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_ftell1 -std=c99 -pedantic io_ftell1.c
$ ./io_ftell1
Init: pos=0
After reading character L, pos=1
After reading characters ine. pos=4
The following example, executed on a UNIX machine, shows the position indicator before and after writing characters:

$ cat io_ftell2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "ftell2.txt";
  long pos = 0;
  char c;
  char *s = "ine 1";

  pf = fopen(myfile, "w");
  if ( pf == NULL )
    return EXIT_FAILURE;

  pos = ftell(pf);
  printf("Init: pos=%ld\n", pos);

  c = 'L';
  putc(c, pf);   /* the character comes first, then the stream */
  pos = ftell(pf);
  printf("After writing %c: pos=%ld\n", c, pos);

  fputs(s, pf);  /* fputs() writes to the stream; puts() writes to stdout */
  pos = ftell(pf);
  printf("After writing characters %s, pos=%ld\n", s, pos);

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_ftell2 -std=c99 -pedantic io_ftell2.c
$ ./io_ftell2
Init: pos=0
After writing L: pos=1
After writing characters ine 1, pos=6

Again, for a text file, there may be no relationship between the characters read or written and the file position indicator. Although on UNIX, UNIX-based and Microsoft operating systems the file position indicator measures the number of bytes from the beginning of the file, this is not a general rule.
X.6.2 fseek()
#include <stdio.h>
int fseek(FILE *stream, long int offset, int reference);

The function fseek() allows you to move to a given position within the stream pointed to by stream. It sets the position indicator to the value offset relative to the point indicated by reference (see Table X‑8). The interpretation of offset depends on the type of the stream:
o For a binary stream: reference is one of the macros listed in Table X‑8 and offset is an integer of type long representing the new position relative to reference. The position within the stream is moved by offset bytes (characters) from the starting point indicated by one of the macros SEEK_SET, SEEK_CUR, or SEEK_END. However, SEEK_END may not be supported [91]. To be portable, your program should avoid using SEEK_END.
o For a text stream: reference is the macro SEEK_SET (beginning of the file) and offset is an integer of type long. Be cautious: offset may not count the number of bytes from the beginning of the text file. For your program to be portable, offset should be 0 or a value returned by the function ftell(). On UNIX, UNIX-based systems and Microsoft systems, offset counts the number of characters from the beginning of the text file, but this is not true for every operating system. On some systems, there is no relationship between the file position indicator and the character count.

In POSIX operating systems (UNIX operating systems), there is no distinction between a file opened as binary or text. This holds true for UNIX-like systems (Linux, BSD systems).

Take note that the characters put back onto the stream by the function ungetc() are discarded after a call to fseek(). The function returns zero if the call succeeds. Otherwise, it returns a non-zero value.

Reference   Meaning
SEEK_SET    Beginning of the file
SEEK_CUR    Current position
SEEK_END    End of the file
Table X‑8 fseek(): reference position
For a text file, the position indicator does not always count the number of characters from the beginning of the file, although it does on UNIX systems, UNIX-based systems (Linux, BSD systems) and Microsoft systems. For binary files, the position indicator always denotes the number of characters from the beginning of the file.
The following example sets the file position indicator 7 bytes from the beginning of the stream and reads a string from that position. The program works on UNIX and UNIX-based systems (Linux, BSD) and on Windows systems, but it is not portable:
$ cat io_fseek1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "fseek1.txt";
  long offset = 0L;
  int array_len = 10;
  char s[array_len];

  pf = fopen(myfile, "r");
  if ( pf == NULL )
    return EXIT_FAILURE;

  offset = 7L;
  fseek(pf, offset, SEEK_SET);
  fgets(s, array_len, pf );
  printf("string read=%s", s);
  fclose(pf);
  return EXIT_SUCCESS;
}
$ cat fseek1.txt
Line 1:Hello
Line 2:world
$ gcc -o io_fseek1 -std=c99 -pedantic io_fseek1.c
$ ./io_fseek1
string read=Hello
The following example sets the position indicator to the value 7 (seven characters from the beginning) and writes the string "HELLO" from that position. The program works on UNIX and UNIX-based systems (Linux, BSD) and on Microsoft Windows systems, but it is not portable:
$ cat fseek2.txt
Line 1:Hello
Line 2:world
$ cat io_fseek2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "fseek2.txt";
  long offset = 0L;
  char s[] = "HELLO";

  /* open for reading and writing without truncating the file */
  pf = fopen(myfile, "r+");
  if ( pf == NULL )
    return EXIT_FAILURE;

  offset = 7L;
  fseek(pf, offset, SEEK_SET);
  fputs(s, pf );
  fclose(pf);
  return EXIT_SUCCESS;
}
$ cat fseek2.txt
Line 1:HELLO
Line 2:world
Note that the file was opened with mode r+, which allows the file to be modified without truncating it.
X.6.3 rewind()
#include <stdio.h>
void rewind(FILE *stream);
The call rewind(stream) is equivalent to (void)fseek(stream, 0L, SEEK_SET). It moves the position indicator to the beginning of the stream. It also clears the error indicator for the stream.
X.6.4 fgetpos() and fsetpos()
Until C95:
#include <stdio.h>
int fgetpos(FILE *stream, fpos_t *pos);
int fsetpos(FILE *stream, const fpos_t *pos);
As of C99:
#include <stdio.h>
int fgetpos(FILE * restrict stream, fpos_t * restrict pos);
int fsetpos(FILE *stream, const fpos_t *pos);
The fgetpos() function saves the position indicator (and potentially other pieces of data) into the object pointed to by pos. The type fpos_t is opaque (encapsulated): its contents must not be accessed directly. The fsetpos() function sets the position indicator from the object pointed to by pos, whose value must have been obtained by an earlier call to fgetpos() on the same stream. These functions force programmers to handle the file position indicator correctly, making programs portable. Here is an example:
$ cat fgetpos.txt
Line 1:hello
Line 2:world
$ cat io_fgetpos.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "fgetpos.txt";
  int array_len = 255;
  char s[array_len];
  fpos_t pos;
  int c;   /* int, not char: fgetc() may return EOF */

  pf = fopen(myfile, "r");
  if ( pf == NULL )
    return EXIT_FAILURE;

  /* get the position just after the first colon and store it into pos */
  while ( ( c = fgetc(pf) ) != EOF ) {
    if ( c == ':' ) {
      if ( fgetpos(pf, &pos) ) {
        printf("fgetpos failed");
        return EXIT_FAILURE;
      }
      break;
    }
  }

  rewind(pf); /* set the position to the beginning of the stream */

  /* read all characters from the stream */
  while ( fgets(s, array_len, pf) != NULL )
    printf("String read=%s", s );

  /* set position indicator to value stored in pos */
  if ( fsetpos(pf, &pos) ) {
    printf("fsetpos failed");
    return EXIT_FAILURE;
  }
  fgets(s, array_len, pf);
  printf("String read=%s", s );
  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fgetpos -std=c99 -pedantic io_fgetpos.c
$ ./io_fgetpos
String read=Line 1:hello
String read=Line 2:world
String read=hello
X.7 Managing errors
The C library implements several functions, declared in stdio.h, that let you manage errors occurring after a call to a system or C library function. To call the functions described in the following sections, with the exception of strerror(), do not forget the directive #include <stdio.h>.
X.7.1 perror()
#include <stdio.h>
void perror(const char *s);
The perror() function writes to the standard error the message pointed to by s, followed by a colon and a description of the last error that occurred. In the following example, we attempt to write to a file opened for reading:
$ cat perror1.txt
Line 1
Line 2
$ cat io_perror1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "perror1.txt";

  pf = fopen(myfile, "r");
  if ( pf == NULL ) {
    perror("Cannot open file");
    return EXIT_FAILURE;
  }

  if ( fprintf(pf, "Hello" ) < 0 ) {
    perror("Error while writing to file");
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
$ gcc -o io_perror1 -std=c99 -pedantic io_perror1.c
$ ./io_perror1
Error while writing to file: Bad file number
The following example attempts to open a missing file for reading:
$ cat io_perror2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "perror2.txt";
  const int error_msg_len = 255;
  char error_msg[ error_msg_len ];

  pf = fopen(myfile, "r");
  if ( pf == NULL ) {
    sprintf(error_msg, "Cannot open file %s", myfile);
    perror(error_msg);
    return EXIT_FAILURE;
  } else {
    printf("File %s open for reading", myfile);
  }
  return EXIT_SUCCESS;
}
$ gcc -o io_perror2 -std=c99 -pedantic io_perror2.c
$ ./io_perror2
Cannot open file perror2.txt: No such file or directory
X.7.2 errno
After calling a system or C-library function, the global integer variable errno is set if an error has occurred. Its value denotes the cause of the error. The variable errno is declared in the header file errno.h. The following example is equivalent to the previous example io_perror2.c except that we use errno instead of perror().
$ cat io_errno.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "ERRNO.txt";

  pf = fopen(myfile, "r");
  if ( pf == NULL ) {
    printf("Cannot open file %s. Errno: %d\n", myfile, errno);
    return EXIT_FAILURE;
  } else {
    printf("File %s open for reading", myfile);
  }
  return EXIT_SUCCESS;
}
$ gcc -o io_errno -std=c99 -pedantic io_errno.c
$ ./io_errno
Cannot open file ERRNO.txt. Errno: 2
X.7.3 strerror()
#include <string.h>
char *strerror(int err_number);
The function strerror() returns the error message associated with the integer err_number, the same text that the function perror() would print. It is declared in the header file string.h. The following example is equivalent to the example io_perror2.c except that we use strerror() and errno instead of perror().
$ cat io_strerror1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "ERRNO.txt";

  pf = fopen(myfile, "r");
  if ( pf == NULL ) {
    char *cause = strerror(errno);
    printf("Cannot open file %s. Errno: %d. Cause: %s\n", myfile, errno, cause);
    return EXIT_FAILURE;
  } else {
    printf("File %s open for reading", myfile);
  }
  return EXIT_SUCCESS;
}
$ gcc -o io_strerror1 -std=c99 -pedantic io_strerror1.c
$ ./io_strerror1
Cannot open file ERRNO.txt. Errno: 2. Cause: No such file or directory
The argument passed to strerror() does not have to be the errno variable, as shown below:
$ cat io_strerror2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  int i;

  for (i=0; i < 10; i++)
    printf(" Errno: %d. Cause: %s\n", i, strerror(i));
  return EXIT_SUCCESS;
}
$ gcc -o io_strerror2 -std=c99 -pedantic io_strerror2.c
$ ./io_strerror2
 Errno: 0. Cause: Error 0
 Errno: 1. Cause: Not owner
 Errno: 2. Cause: No such file or directory
 Errno: 3. Cause: No such process
 Errno: 4. Cause: Interrupted system call
 Errno: 5. Cause: I/O error
 Errno: 6. Cause: No such device or address
 Errno: 7. Cause: Arg list too long
 Errno: 8. Cause: Exec format error
 Errno: 9. Cause: Bad file number
X.7.4 feof()
#include <stdio.h>
int feof(FILE *stream);
The FILE structure contains a field, the end-of-file indicator, indicating whether the end of the file has been reached. The feof() function, declared in stdio.h, returns 0 if the end-of-file indicator is not set. Otherwise, if the end-of-file indicator is set, it returns a nonzero value. The indicator is set only after an input function has tried to read past the end of the file. Here is an example:
$ cat feof.txt
Line 1:hello
Line 2:world
$ cat io_feof.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "feof.txt";
  int array_len = 255;
  char s[array_len];

  pf = fopen(myfile, "r");
  if ( pf == NULL )
    return EXIT_FAILURE;

  if ( feof(pf) )
    printf("End-of-file reached\n");
  else
    printf("End-of-file not reached\n");

  /* read all characters from the stream */
  while ( fgets(s, array_len, pf) != NULL )
    printf("String read=%s", s );

  if ( feof(pf) )
    printf("End-of-file reached\n");
  else
    printf("End-of-file not reached\n");

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_feof -std=c99 -pedantic io_feof.c
$ ./io_feof
End-of-file not reached
String read=Line 1:hello
String read=Line 2:world
End-of-file reached
X.7.5 ferror()
#include <stdio.h>
int ferror(FILE *stream);
The FILE structure contains a field, the error indicator, indicating whether an error has occurred while accessing a stream. The function ferror(), declared in stdio.h, returns a nonzero value if the error indicator is set. Otherwise, it returns 0. In the following example, the error indicator is set after an attempt to write to a file opened for reading:
$ cat ferror.txt
Line 1:hello
Line 2:world
$ cat io_ferror.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "ferror.txt";

  pf = fopen(myfile, "r");
  if ( pf == NULL ) {
    perror("Cannot open file");
    return EXIT_FAILURE;
  } else {
    printf("File %s open for reading\n", myfile);
  }

  if ( ferror(pf) ) {
    printf("Error indicator set\n");
    return EXIT_FAILURE;
  } else {
    printf("Error indicator not set\n");
  }

  printf("Attempt to write to a file opened for reading\n");
  fprintf(pf, "Hello" ); /* ERROR */

  if ( ferror(pf) ) {
    printf("Error indicator set\n");
    return EXIT_FAILURE;
  } else {
    printf("Error indicator not set\n");
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_ferror -std=c99 -pedantic io_ferror.c
$ ./io_ferror
File ferror.txt open for reading
Error indicator not set
Attempt to write to a file opened for reading
Error indicator set
X.7.6 clearerr()
#include <stdio.h>
void clearerr(FILE *stream);
The clearerr() function clears the end-of-file and error indicators of the given stream. The following example repeats the previous one and then calls the function clearerr().
$ cat clearerr.txt
Line 1:hello
Line 2:world
$ cat io_clearerr.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "clearerr.txt";

  pf = fopen(myfile, "r");
  if ( pf == NULL ) {
    perror("Cannot open file");
    return EXIT_FAILURE;
  } else {
    printf("File %s open for reading\n", myfile);
  }

  if ( ferror(pf) ) {
    printf("Error indicator set\n");
  } else {
    printf("Error indicator not set\n");
  }

  printf("Attempt to write to a file opened for reading\n");
  fprintf(pf, "Hello" ); /* ERROR */

  if ( ferror(pf) ) {
    printf("Error indicator set\n");
  } else {
    printf("Error indicator not set\n");
  }

  clearerr(pf);
  printf("\nAfter calling clearerr()\n");

  if ( ferror(pf) ) {
    printf("Error indicator set\n");
    return EXIT_FAILURE;
  } else {
    printf("Error indicator not set\n");
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_clearerr -std=c99 -pedantic io_clearerr.c
$ ./io_clearerr
File clearerr.txt open for reading
Error indicator not set
Attempt to write to a file opened for reading
Error indicator set

After calling clearerr()
Error indicator not set
X.8 Buffers
X.8.1 Buffered and unbuffered streams
A portable C program does not access a file directly: it accesses it through a stream that is associated with the file when the fopen() function is called. A buffer, whose size is BUFSIZ, is then automatically allocated to the stream. C library functions declared in stdio.h deal with streams (see Figure X‑1). A write request (output) transmits data to the buffer before it is transferred to the file. A read request (input) gets data from the buffer if present; otherwise, the data is retrieved from the file and copied into the buffer before being handed to the caller. For example, the first call to an input function, let us say fgetc(), invokes a system call asking the operating system to get a series of characters (a block) from the physical file and place them into the buffer. The next calls to fgetc() may read the following characters from the buffer without asking the operating system to perform additional I/O, which generally makes I/O requests more efficient. Likewise, the function fprintf() writes characters to the buffer before they are written to the file. The buffer is emptied after its contents are actually written to the file (the buffer is said to be flushed), depending on the buffering mode.
When a file is opened, its associated stream is fully buffered unless it is associated with an interactive device (terminal). That is, if the file is a true file (with physical storage) in which data can be stored, it is fully buffered, which means that the characters in the buffer, forming a block, are transmitted to the file or to the caller when the buffer is full.
To highlight how a buffer works, we will resort to the POSIX function sleep(), which is not a standard C function. In POSIX environments (UNIX systems) and UNIX-based systems (Linux, BSD systems), it is declared in unistd.h. On Microsoft Windows systems, Sleep() (note the capital letter S) is declared in the header file windows.h. The call sleep(n) tells the program to become inactive for n seconds. On Microsoft Windows systems, use Sleep(n).
$ cat io_buffer1.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
  FILE *pf;
  char *myfile = "info_buffer.txt";

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    printf("Cannot open file %s for writing\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for writing\n", myfile);

  fprintf(pf, "hello\n"); /* write to the stream */
  printf("Sleep 10 s. On another window type cat %s. You will see nothing.\n", myfile);
  sleep(10);
  printf("Now type again cat %s.\n", myfile);
  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_buffer1 -std=c99 -pedantic io_buffer1.c
$ ./io_buffer1
file info_buffer.txt opened for writing
Sleep 10 s. On another window type cat info_buffer.txt. You will see nothing.
Now type again cat info_buffer.txt.
Explanation:
o The fopen() call opens the file info_buffer.txt for writing. If it exists, it is truncated. Otherwise, it is created. The stream is fully buffered.
o The call fprintf(pf, "hello\n") writes the string "hello\n" to the stream associated with the file info_buffer.txt.
o We ran the program and, while it was sleeping, we typed cat info_buffer.txt in another terminal to display the contents of the file info_buffer.txt. We could see nothing in it, as if nothing had been written. The reason is that the function fprintf() wrote the data into the buffer, and the characters it contained had not yet been transmitted to the file because the stream was fully buffered.
o Next, the program woke up after 10 s, the stream was closed and the program terminated. We ran the command cat info_buffer.txt again. This time, the string had actually been written to the file. Since the buffer was not full, nothing was transmitted to the file while the program was sleeping, but closing the stream sent the data to the file.
Normally, you do not have to care about buffers, but you may occasionally need to flush or disable one. The following sections describe the three buffering modes for streams, how to change the buffering mode and how to flush the buffer of an output stream.
X.8.2 setvbuf()
Until C95:
#include <stdio.h>
int setvbuf(FILE *stream, char *buf, int mode, size_t sz);
As of C99:
#include <stdio.h>
int setvbuf(FILE * restrict stream, char * restrict buf, int mode, size_t sz);
Instead of using the built-in buffer, programmers can provide their own buffer through the function setvbuf() declared in the header file stdio.h. The function takes four parameters:
o A stream.
o A pointer to a memory block buf that will replace the default buffer.
o mode is a macro defining the way the stream will be buffered. When buffered, characters are not transmitted as soon as they are read from or written to the file but are copied to the buffer to form a block (group of characters). Thus, data is transmitted block-by-block, not character-by-character. The argument mode takes one of the following values:
  ▪ _IOFBF: I/O requests are fully buffered. The transfer of characters from or to the file occurs when the buffer is full. For an output stream, the contents of the buffer are written to the file (the buffer is flushed) when the buffer is full or when an input request from an unbuffered stream (also associated with the file) occurs [92].
  ▪ _IOLBF: I/O requests are line buffered. That is, the transfer of characters from or to the file occurs when the newline character is encountered. For an output stream, the buffer is flushed (written to the file) as soon as the newline character is encountered, the buffer is full, or an input request from an unbuffered stream (associated with the file) occurs.
  ▪ _IONBF: the buffer is not used. Characters are transmitted as soon as they are read from or written to the file.
o The size of the buffer, specified by sz.
If the argument buf is a null pointer, the function may allocate its own buffer, whose size may be defined by sz: it depends on the implementation. For example, on some systems, if buf is a null pointer, the stream is unbuffered. As a consequence, for portability reasons, do not pass a null pointer. Otherwise, consult the documentation of your system.
The buffer pointed to by buf must have storage duration greater or equal to that of the stream. If you allocate a local array as a buffer (object with automatic storage duration), do not forget to close the stream before the end of scope of the array. Otherwise, the behavior is undefined since the array is destroyed as its scope is left. The buffer must remain allocated until the stream is closed.
The function setvbuf() returns zero if successful. On error (unexpected argument), it returns a nonzero value. If the requested mode is not implemented, an error is returned. The function setvbuf() must be called after the file is opened but before any other operation is performed on the stream.
$ cat io_setvbuf.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "setvbuf.txt";
  size_t array_len = 1024;
  char s[array_len];

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    perror("Cannot open file");
    return EXIT_FAILURE;
  }

  if ( setvbuf(pf, s, _IOLBF, array_len)) {
    perror("setvbuf");
    return EXIT_FAILURE;
  } else {
    printf("setvbuf successful\n");
  }

  fprintf(pf, "Hello world\n");
  fclose( pf );
  return EXIT_SUCCESS;
}
$ gcc -o io_setvbuf -std=c99 -pedantic io_setvbuf.c
$ ./io_setvbuf
setvbuf successful
$ cat setvbuf.txt
Hello world
Figure X‑1 Data transfer between stream and file
X.8.3 setbuf()
Until C95:
#include <stdio.h>
void setbuf(FILE *stream, char *buf);
As of C99:
#include <stdio.h>
void setbuf(FILE * restrict stream, char * restrict buf);
The setbuf() function returns no value. Apart from that, the call setbuf(stream, buf) is equivalent to:
#include <stdio.h>
setvbuf(stream, buf, _IOFBF, BUFSIZ);
However, if buf is a null pointer, the I/O requests will be unbuffered. The call is then equivalent to:
#include <stdio.h>
setvbuf(stream, NULL, _IONBF, 0);
The macro BUFSIZ is defined in the header file stdio.h. Its value depends on the operating system. It is at least 256.
X.8.4 fflush()
#include <stdio.h>
int fflush(FILE *stream);
Flushing a buffer means sending the characters it contains to the associated file if they have not been written yet. Flushing is meant to be performed on an output stream, that is, a stream opened for writing or for updating (reading and writing). The buffer is normally flushed if one of the following conditions is met:
o The stream is closed
o The program terminates
o The buffer is full
o An input request reads an unbuffered stream associated with the file
o For a line-buffered stream, the buffer is also flushed when a newline character is encountered.
The fflush() function writes the pending output data of the buffer to the file: characters in the buffer that have not been transmitted yet are sent to the file. If stream is a null pointer, the buffers of all output streams are flushed. The function returns zero if successful. Otherwise, it returns EOF. Here is an example:
$ cat io_fflush.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  char *myfile = "flush.txt";

  pf = fopen(myfile, "w");
  if ( pf == NULL ) {
    perror("Cannot open file");
    return EXIT_FAILURE;
  }

  fprintf(pf, "Hello world\n"); /* data may still be in the buffer */

  if ( fflush(pf) == EOF ) {    /* send the buffered data to the file */
    perror("fflush");
    return EXIT_FAILURE;
  } else {
    printf("fflush successful\n");
  }

  fclose( pf );
  return EXIT_SUCCESS;
}
X.9 freopen()
Until C95:
#include <stdio.h>
FILE *freopen(const char *filename, const char *mode, FILE *stream);
As of C99:
#include <stdio.h>
FILE *freopen(const char * restrict filename, const char * restrict mode, FILE * restrict stream);
The freopen() function opens the file identified by filename as the function fopen() would do and associates it with the existing stream pointed to by stream. The stream is first closed before the new binding is performed. An error occurring while closing it is ignored. The parameter mode is a string as used by fopen() and described in Table X‑1. If filename is a null pointer, the current file remains associated with stream but with the new mode passed as an argument. The new mode may be rejected, causing the call to freopen() to fail. The function returns stream if successful or NULL on failure. The following example associates the stream stdout, normally bound to the standard output, with the file freopen.log:
$ cat freopen.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *myfile = "freopen.log";

  printf("This line is written to the terminal\n");
  if ( freopen(myfile, "w", stdout) == NULL ) {
    perror("Cannot rebind stdout");
    return EXIT_FAILURE;
  }
  printf("This line is written to the log file %s\n", myfile);
  return EXIT_SUCCESS;
}
$ gcc -o freopen -std=c99 -pedantic freopen.c
$ ./freopen
This line is written to the terminal
$ cat freopen.log
This line is written to the log file freopen.log
The first call to printf() writes to the standard output associated with the stream stdout (see section X.10). The second call to printf() still writes to the stream stdout, but this time the stream is associated with the file freopen.log.
X.10 Standard input, standard output, standard error
X.10.1 Definitions
By default, at startup of a program, three streams are automatically created:
o stdin is an input stream associated with the standard input. It is fully buffered unless it is associated with an interactive device (generally a terminal [93]). If it is associated with a terminal, it can be unbuffered or line-buffered depending on the implementation. By default, the standard input is the keyboard.
o stdout is an output stream associated with the standard output. It is fully buffered if not associated with a terminal. If it is associated with a terminal, it may be unbuffered or line-buffered depending on the implementation. By default, the standard output is the monitor.
o stderr is an output stream associated with the standard error. It is used to display error messages. It may be unbuffered or line-buffered. By default, the standard error is the monitor.
X.10.2 Data reading
X.10.2.1 getchar()
#include <stdio.h>
int getchar(void);
The call getchar() is equivalent to getc(stdin). It gets the next character from the stream stdin. The following example prints the characters typed:
$ cat getc.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int c;

  printf("Type characters and press <ENTER>\n");
  while ( ( c = getchar() ) != EOF ) {
    if ( c == '\n' ) {
      printf("char=newline (code %d)\n", c );
      printf("\nType characters and press <ENTER>\n");
    } else
      printf("char=%c (code %d)\n", c, c );
  }
  printf("END OF PROGRAM\n");
  return EXIT_SUCCESS;
}
$ gcc -o getc -std=c99 -pedantic getc.c
$ ./getc
Type characters and press <ENTER>
abcd
char=a (code 97)
char=b (code 98)
char=c (code 99)
char=d (code 100)
char=newline (code 10)

Type characters and press <ENTER>
<CTRL-D>
END OF PROGRAM
$
Through this example, we can see that the stream stdin is line-buffered on our operating system (Linux). The function getchar() can retrieve characters from the buffer only once a newline character has been encountered. On UNIX and UNIX-based systems (Linux, BSD systems), the key <CTRL-D> (press d while holding the key CTRL) means end-of-file for the standard input. In our example, after <CTRL-D> is pressed, the function getchar() returns EOF, which terminates the while loop.
You may wonder why we did not declare the variable c as type char. The rationale is that the function getchar() returns a value of type int that can be EOF (a negative integer, usually -1). If we had declared it as char, we might have been in trouble because the type char can be signed or unsigned depending on the implementation: if char is unsigned, c could never compare equal to EOF, and if it is signed, a valid character could be mistaken for EOF.
X.10.2.2 gets()
#include <stdio.h>
char *gets(char *s);
The gets() function retrieves characters from the stream stdin until the end-of-file (the user hits <CTRL-D>) or a newline character is encountered and copies them into the memory block pointed to by s. The newline character is discarded and the copied string is terminated with a null character. It returns the value of the pointer s if successful. If an error occurs, it returns a null pointer. If the end-of-file is encountered and no character has been read, it also returns a null pointer. Even though it is often used by beginners, this function should be avoided because it has a dangerous weakness, and C11 removed it from the standard. The memory area provided may be too small to hold the retrieved data, which would cause a buffer overflow. Remember that you have no way to determine the size of a memory area from a pointer alone. In the following example, we provide an array that can hold at most five characters while ten characters (nine letters and the terminating null character) are copied into it!
$ cat io_gets.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char s[5];

  printf("Type characters and press <ENTER>\n");
  gets(s); /* dangerous */
  printf("string read:%s\n", s );
  return EXIT_SUCCESS;
}
$ gcc -o io_gets -std=c99 -pedantic io_gets.c
$ ./io_gets
Type characters and press <ENTER>
abcdefghi
string read:abcdefghi
Use the function fgets() instead to read the standard input, passing stdin as the stream:
#include <stdio.h>
fgets(s, n, stdin);
The previous example should be written as follows:
$ cat io_gets2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char s[5];

  printf("Type characters and press <ENTER>\n");
  fgets(s, 4, stdin);
  printf("%s\n", s );
  return EXIT_SUCCESS;
}
$ gcc -o io_gets2 -std=c99 -pedantic io_gets2.c
$ ./io_gets2
Type characters and press <ENTER>
abcdefghi
abc
X.10.2.3 scanf()
The function call scanf(fmt, …) is equivalent to fscanf(stdin, fmt, …).
X.10.3 Writing
X.10.3.1 putchar()
#include <stdio.h>
int putchar(int c);
The function call putchar(c) is equivalent to putc(c, stdout).
It writes a character to the output stream stdout. The following example writes one character to the standard output:
$ cat io_putchar.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int c = 'A';

  putchar(c);
  printf("\n");
  return EXIT_SUCCESS;
}
$ gcc -o io_putchar -std=c99 -pedantic io_putchar.c
$ ./io_putchar
A
X.10.3.2 puts()
#include <stdio.h>
int puts(const char *s);
The function call puts(s) is equivalent to fputs(s, stdout) followed by the output of a newline character: it writes the string pointed to by s to the stream stdout (standard output) and appends a newline. For example:
$ cat io_puts.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  puts("Hello world");
  return EXIT_SUCCESS;
}
$ gcc -o io_puts -std=c99 -pedantic io_puts.c
$ ./io_puts
Hello world
X.10.3.3 printf()
The call printf(fmt, …) is equivalent to fprintf(stdout, fmt, …).
X.11 Removing a file
#include <stdio.h>
int remove(const char *filename);
The function deletes the file known under the name filename. It returns zero if successful. Otherwise, it returns a non-zero value. The following example removes the file testfile.txt created by a UNIX shell command:
$ cat io_remove.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *myfile = "testfile.txt";

  if ( remove(myfile) ) {
    perror("Cannot remove file");
    return EXIT_FAILURE;
  } else {
    printf("File %s removed\n", myfile);
  }
  return EXIT_SUCCESS;
}
$ echo hello > testfile.txt
$ cat testfile.txt
hello
$ gcc -o io_remove -std=c99 -pedantic io_remove.c
$ ./io_remove
File testfile.txt removed
$ cat testfile.txt
cat: cannot open testfile.txt: No such file or directory
Under a UNIX shell, the command echo hello > testfile.txt creates the file testfile.txt if it does not exist (truncates it if it exists), and writes the word hello to it. The command cat testfile.txt displays the contents of the file.
X.12 Renaming a file
#include <stdio.h>
int rename(const char *filename, const char *new_filename);
The function renames the file identified by the string filename to new_filename. If a file with the name new_filename already exists, the behavior depends on the implementation. It returns zero if successful. Otherwise, it returns a non-zero value. The following example renames the file testfile.txt to testfile2.txt:
$ cat io_rename.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *myfile = "testfile.txt";
  char *myfile_new = "testfile2.txt";

  if ( rename( myfile, myfile_new ) ) {
    perror("Cannot rename file");
    return EXIT_FAILURE;
  } else {
    printf("File %s renamed to %s\n", myfile, myfile_new);
  }
  return EXIT_SUCCESS;
}
$ gcc -o io_rename -std=c99 -pedantic io_rename.c
$ ./io_rename
Cannot rename file: No such file or directory
$ echo hello > testfile.txt
$ cat testfile.txt
hello
$ ./io_rename
File testfile.txt renamed to testfile2.txt
$ cat testfile.txt
cat: cannot open testfile.txt: No such file or directory
$ cat testfile2.txt
hello
X.13 Temporary files
Programmers often need to store data in temporary files that are used for a specific processing step and removed once it is done. Instead of creating a file with fopen(), closing it (with fclose()) and then removing it (with remove()), programmers may resort to the function tmpfile(), declared in stdio.h, which creates a temporary file and returns its associated stream. The file is automatically removed when it is closed or when the program terminates.
#include <stdio.h>
FILE *tmpfile(void);
The temporary file is opened with mode "wb+". If the temporary file cannot be created, the function returns a null pointer. For example:
$ cat io_tmpfile.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *pf = NULL;
  const int array_len = 255;
  char s[ array_len ];

  if ( ( pf = tmpfile() ) == NULL ) {
    perror("Cannot create temp file");
    return EXIT_FAILURE;
  }
  /* temporary file created */
  fprintf(pf, "Temp file created for tests\n");
  rewind( pf );
  fgets(s, array_len, pf );
  printf("String read: %s\n", s);
  fclose ( pf ); /* temporary file removed */
  return EXIT_SUCCESS;
}
$ gcc -o io_tmpfile -std=c99 -pedantic io_tmpfile.c
$ ./io_tmpfile
String read: Temp file created for tests
X.14 Wide and Multibyte I/O functions
Table X‑9 Byte and wide-characters I/O functions
X.14.1 Stream orientation
When a stream is created by fopen(), it has no orientation, but it takes one at its first use. If a wide-character I/O function accesses it, it becomes wide-oriented. If a byte I/O function accesses it, it becomes byte-oriented. The stream orientation does not change unless the function freopen() or fwide() is called: freopen() removes the current orientation, leaving the stream with no orientation, while fwide() can set a specific orientation if the stream has none. The fwide() function has the following prototype:
As of C90 Amendment 1:
#include <stdio.h>
#include <wchar.h>
int fwide(FILE *stream, int mode);
If mode is a positive integer, the stream becomes wide-oriented if the stream has no orientation. If mode is a negative integer, the stream becomes byte-oriented if the stream has no orientation. If mode is zero, the orientation of the stream is left unchanged: this mode is used to query the current orientation of a stream.
The function returns a positive integer if the stream is wide-oriented. The function returns a negative integer if the stream is byte-oriented. The function returns zero if the stream has no orientation. Keep in mind that the function fwide() does not alter the orientation of a stream that is already oriented. The only way to change it is to invoke freopen(), which closes and reopens the stream with no orientation, and then to call fwide() or an I/O function that will set a new orientation. Never use wide-character I/O functions with a byte-oriented stream or byte I/O functions with a wide-oriented stream. Consequently, do not mix byte I/O functions and wide-character functions on the same stream unless you call freopen() to reset the orientation. The following example gets the orientation of the stream stdout before it is used and after it is accessed, and then attempts (unsuccessfully) to modify its orientation by calling fwide():
$ cat io_fwide.c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void) {
  int stream_orientation;

  // Orientation before printing
  stream_orientation = fwide(stdout, 0);
  fprintf(stdout, "Orientation of stdout before accessing stdout: %d\n", stream_orientation);

  // Orientation after printing
  stream_orientation = fwide(stdout, 0);
  fprintf(stdout, "Orientation of stdout after accessing stdout: %d\n", stream_orientation);

  stream_orientation = fwide(stdout, 1);
  fprintf(stdout, "Orientation of stdout after fwide(): %d\n", stream_orientation);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwide -std=c99 -pedantic io_fwide.c
$ ./io_fwide
Orientation of stdout before accessing stdout: 0
Orientation of stdout after accessing stdout: -1
Orientation of stdout after fwide(): -1
X.14.2 Files and encodings
In order to ease the processing of extended characters, wide characters are used internally by a C program as units but are written to files as multibyte characters. Likewise, wide-character input functions do not read wide characters as such from a file: they read multibyte characters. The rationale is that text and binary files are series of multibyte characters [94]. Therefore, wide-character output functions convert wide characters to multibyte characters before sending them to the stream. Conversely, wide-character input functions read multibyte characters from a stream and then convert them to wide characters.
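The conversion performed on output can be observed without any file by calling wcrtomb(), the conversion function the standard uses to describe what wide-character output functions do. The following is a minimal sketch; the helper name encoded_length() is ours, and the result depends on the locale set for the category LC_CTYPE.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes that wc occupies once encoded as
   a multibyte character in the current locale, or -1 if wc cannot be
   represented in that locale. */
int encoded_length(wchar_t wc) {
    char buf[MB_LEN_MAX];
    mbstate_t state;
    size_t n;

    memset(&state, 0, sizeof state);   /* initial conversion state */
    n = wcrtomb(buf, wc, &state);
    return n == (size_t)-1 ? -1 : (int)n;
}
```

In the default "C" locale, encoded_length(L'A') gives 1. In a UTF-8 locale, the Euro sign (code 20AC) would be expected to take three bytes, the sequence E2 82 AC used in the examples of this chapter.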
X.14.3 Formatted wide-character I/O functions
X.14.3.1 fwprintf()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>
int fwprintf(FILE *stream, const wchar_t *fmt, ...);

As of C99:
#include <stdio.h>
#include <wchar.h>
int fwprintf(FILE *restrict stream, const wchar_t *restrict fmt, ...);
The function fwprintf() is the wide-character version of the function fprintf(). There are minor differences summarized in Table X‑10. The function returns the number of wide characters written or a negative integer if an encoding error occurs.
Table X‑10 Differences between fprintf() and fwprintf()
The following program writes the wide string 2000 € to the file wtext1:
$ cat io_fwprintf1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  char *myfile = "wtext1";
  char *mylocale = "en_US.UTF-8";
  FILE *fh;
  int n;

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Cannot set locale %s\n", mylocale);
    exit(EXIT_FAILURE);
  }

  if ( ! (fh = fopen(myfile, "w")) ) {
    perror("open file");
    exit(EXIT_FAILURE);
  }

  n = fwprintf(fh, L"2000 \u20AC\n"); // \u20AC is the symbol of the Euro currency
  printf("Wide characters written: %d\n", n );

  fclose(fh);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwprintf1 -std=c99 -pedantic io_fwprintf1.c
$ ./io_fwprintf1
Wide characters written: 7
$ cat wtext1
2000 €
The following program writes the whole array s, holding the multibyte string "2500 € + 10 € = 2510 €", to the file wtext2 (fwprintf() converts it to wide characters), and then writes only the first six wide characters resulting from that conversion:
$ cat io_fwprintf2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  char *myfile = "wtext2";
  char *mylocale = "en_US.UTF-8";
  FILE *fh;
  int n;
  int blen, wlen;
  char s[] = "2500 \u20AC + 10 \u20AC = 2510 \u20AC"; // multibyte string

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Cannot set locale %s\n", mylocale);
    exit(EXIT_FAILURE);
  }

  if ( ! (fh = fopen(myfile, "w")) ) {
    perror("open file");
    exit(EXIT_FAILURE);
  }

  wlen = mbstowcs(NULL, s, 0); // number of wide chars
  blen = strlen(s);            // number of bytes
  printf("The mb string s has length %d (bytes)\n", blen);
  printf("The mb string s has %d characters\n\n", wlen);

  n = fwprintf(fh, L"%s", s);
  printf("All wide characters converted from s requested to be written. Actually written: %d\n", n );
  fwprintf(fh, L"\n");

  n = fwprintf(fh, L"%.6s", s);
  printf("6 wide characters converted from s requested to be written. Actually written: %d\n", n );
  fwprintf(fh, L"\n");

  fclose(fh);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwprintf2 -std=c99 -pedantic io_fwprintf2.c
$ ./io_fwprintf2
The mb string s has length 28 (bytes)
The mb string s has 22 characters

All wide characters converted from s requested to be written. Actually written: 22
6 wide characters converted from s requested to be written. Actually written: 6
$ cat wtext2
2500 € + 10 € = 2510 €
2500 €
The following example is the same as the previous one except that the array s holds wide characters instead of multibyte characters:
$ cat io_fwprintf3.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  char *myfile = "wtext3";
  char *mylocale = "en_US.UTF-8";
  FILE *fh;
  int n;
  int wlen;
  wchar_t s[] = L"2500 \u20AC + 10 \u20AC = 2510 \u20AC"; // wide string

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Cannot set locale %s\n", mylocale);
    exit(EXIT_FAILURE);
  }

  if ( ! (fh = fopen(myfile, "w")) ) {
    perror("open file");
    exit(EXIT_FAILURE);
  }

  wlen = wcslen(s); // number of wide chars
  printf("The wide string s has length %d\n", wlen);

  n = fwprintf(fh, L"%ls", s);   // write all wide characters of s
  fwprintf(fh, L"\n");           // print newline
  printf("All wide characters from s requested to be written. Written: %d\n", n );

  n = fwprintf(fh, L"%.6ls", s); // write the first 6 wide characters
  printf("6 wide characters from s requested to be written. Written: %d\n", n );
  fwprintf(fh, L"\n");

  fclose(fh);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwprintf3 -std=c99 -pedantic io_fwprintf3.c
$ ./io_fwprintf3
The wide string s has length 22
All wide characters from s requested to be written. Written: 22
6 wide characters from s requested to be written. Written: 6
$ cat wtext3
2500 € + 10 € = 2510 €
2500 €
As you have noticed, the functions fprintf() and fwprintf() convert their arguments before writing the result. Depending on whether the argument is a multibyte character, a wide character, a multibyte string or a wide string, you have to use %c, %lc, %s or %ls in the format string, as summarized by Table X‑11 and Table X‑12.
Table X‑11 Modifier l used with %c in fprintf() and fwprintf()
Table X‑11 shows if the specifier %c is used, the argument of type int is converted to unsigned char by fprintf(), to wchar_t by fwprintf() before being written. Table X‑12 shows how the functions fprintf() and fwprintf() convert an argument that is a
multibyte string or a wide string before writing it.
Table X‑12 Modifier l used with %s in fprintf() and fwprintf()
The example below illustrates Table X‑11 and Table X‑12.
$ cat io_fprintf_fwprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  char multibyte_currency[5] = "\xE2\x82\xAC"; // UTF-8 multibyte char (Euro)
  wchar_t wide_currency = L'\u20AC';           // Wide char (Euro)
  char *mylocale = "en_US.UTF-8";

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Locale %s not available\n", mylocale);
    exit(EXIT_FAILURE);
  }

  printf("Argument is multibyte character\n");
  fprintf(stdout, "1 %s\n", multibyte_currency);    // OK: multibyte string expected
  fprintf(stdout, "2 %ls\n", multibyte_currency);   // KO: wide string expected
  fwprintf(stderr, L"3 %s\n", multibyte_currency);  // OK: multibyte string expected
  fwprintf(stderr, L"4 %ls\n", multibyte_currency); // KO: wide string expected

  printf("---------\n");
  printf("Argument is wide character\n");
  fprintf(stdout, "1 %c\n", wide_currency);    /* KO. Input: int    Output: unsigned char */
  fprintf(stdout, "2 %lc\n", wide_currency);   /* OK. Input: wint_t Output: multibyte char */
  fwprintf(stderr, L"3 %c\n", wide_currency);  /* OK. Input: int    Output: wchar_t */
  fwprintf(stderr, L"4 %lc\n", wide_currency); /* OK. Input: wint_t Output: wchar_t */

  return EXIT_SUCCESS;
}
$ gcc -o io_fprintf_fwprintf -std=c99 -pedantic io_fprintf_fwprintf.c
$ ./io_fprintf_fwprintf
Argument is multibyte character
1 €
3 €
---------
Argument is wide character
1 �
2 €
3 €
4 €

Explanation:
o As wide-character I/O functions and byte I/O functions must not apply to the same stream, wide-character output functions write to the stream stderr and byte output functions write to the stream stdout.
o The array multibyte_currency holds the multibyte character €. We display it using the specifiers %s and %ls in the functions fprintf() and fwprintf(). Two function calls failed because the argument was expected to be a pointer to wchar_t (passing an array of char where %ls is used is actually undefined behavior; here, nothing was printed):
▪ fprintf(stdout, "2 %ls\n", multibyte_currency);
▪ fwprintf(stderr, L"4 %ls\n", multibyte_currency);
o The variable wide_currency holds the wide character €. We display it using the specifiers %c and %lc in the functions fprintf() and fwprintf(). One function call failed because the argument is converted to type unsigned char before being written: fprintf(stdout, "1 %c\n", wide_currency);
o The call fwprintf(stderr, L"3 %c\n", wide_currency) worked on this system because the expected type of the argument is int, which is large enough to represent the value of the object wide_currency. Strictly speaking, %c in fwprintf() converts its int argument as if by calling btowc(), which is meant for single-byte characters, so %lc remains the portable choice for a wide character.

X.14.3.2 vfwprintf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>
int vfwprintf(FILE *stream, const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>
int vfwprintf(FILE *restrict stream, const wchar_t *restrict fmt, va_list arg);
The function vfwprintf() is the wide-character version of the function vfprintf(). It has the same behavior as fwprintf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). As the function does not invoke the macro va_end, the call va_end(arg) should be made after invoking vfwprintf(). The following example writes wide strings to the file logerror:
$ cat io_vfwprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <locale.h>
#include <wchar.h>

#define LOG_FILE "logerror"

void log_error(const wchar_t *fmt, ...) {
  va_list arg;
  static FILE *logfh = NULL;

  if ( ! logfh ) // if logfh is a null pointer, set it to a valid stream
    if ( ! (logfh = fopen(LOG_FILE, "a")) ) { // cannot create logfile
      fprintf( stderr, "cannot create logfile %s\n", LOG_FILE );
      perror("Open logfile");
      logfh = stdout; // use standard output instead
    }

  va_start(arg, fmt);
  vfwprintf(logfh, fmt, arg);
  va_end(arg);
}

int main(void) {
  char *mylocale = "ja_JP.UTF-8"; // Japanese locale
  wchar_t message[] = L"テスト";

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Cannot set locale %s\n", mylocale);
    exit(EXIT_FAILURE);
  }

  log_error(L"情報: %ls\n", message);
  return EXIT_SUCCESS;
}
$ gcc -o io_vfwprintf -std=c99 -pedantic io_vfwprintf.c
$ ./io_vfwprintf
$ cat logerror
情報: テスト
X.14.3.3 swprintf()
Since C90 Amendment 1 (C95):
#include <wchar.h>
int swprintf(wchar_t *s, size_t n, const wchar_t *fmt, ...);

As of C99:
#include <wchar.h>
int swprintf(wchar_t *restrict s, size_t n, const wchar_t *restrict fmt, ...);

The function swprintf() is the wide-character version of the function snprintf(). It has the same behavior as fwprintf() but, instead of writing to a stream, it writes at most n wide characters (including the null wide character) to the memory area pointed to by s. The function appends a null wide character to the array s unless n is zero. It returns the number of wide characters written (excluding the null wide character) or a negative integer if an encoding error occurs or if the number of wide characters to be written, as specified by the format fmt, is greater than or equal to n. It can be used to convert arguments to a wide string, as in the following example:
$ cat io_swprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  char multibyte_currency[5] = "\xE2\x82\xAC"; // UTF-8 multibyte char (Euro)
  wchar_t wide_output[2];
  char *mylocale = "en_US.UTF-8";

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Locale %s not available\n", mylocale);
    exit(EXIT_FAILURE);
  }

  // one wide character plus the appended null wide character fit in wide_output
  swprintf(wide_output, 2, L"%s", multibyte_currency);

  printf("Input is mbs: %s. Output is wcs: %lc (code %X)\n",
         multibyte_currency, (wint_t)wide_output[0], (unsigned int)wide_output[0]);

  return EXIT_SUCCESS;
}
$ gcc -o io_swprintf -std=c99 -pedantic io_swprintf.c
$ ./io_swprintf
Input is mbs: €. Output is wcs: € (code 20AC)
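Since swprintf() returns a negative value when the result does not fit in n wide characters, its return value should be checked before the buffer is used. The sketch below illustrates this; the helper format_amount() and its "amount unit" layout are ours, chosen only for the illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: formats an amount such as "2500 EUR" into buf,
   which can hold n wide characters. Returns the number of wide
   characters stored (null wide character excluded), or -1 if buf is too
   small or an encoding error occurs. */
int format_amount(wchar_t *buf, size_t n, int amount, const wchar_t *unit) {
    int written = swprintf(buf, n, L"%d %ls", amount, unit);

    return written < 0 ? -1 : written;
}
```

Note the difference with snprintf(): on truncation, swprintf() does not tell how many wide characters would have been needed, it only reports a failure.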
X.14.3.4 vswprintf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <wchar.h>
int vswprintf(wchar_t *s, size_t n, const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <wchar.h>
int vswprintf(wchar_t *restrict s, size_t n, const wchar_t *restrict fmt, va_list arg);

The function vswprintf() is the wide-character version of the function vsprintf() (with the size parameter n, as in vsnprintf()). It has the same behavior as swprintf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end, the call va_end(arg) should be made after invoking vswprintf().

X.14.3.5 wprintf()
Since C90 Amendment 1 (C95):
#include <wchar.h>
int wprintf(const wchar_t *fmt, ...);

As of C99:
#include <wchar.h>
int wprintf(const wchar_t *restrict fmt, ...);

The function wprintf() is the wide-character version of the function printf(). It has the same behavior as fwprintf(). Instead of writing to a file, it writes to the standard output (stdout).

X.14.3.6 vwprintf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <wchar.h>
int vwprintf(const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <wchar.h>
int vwprintf(const wchar_t *restrict fmt, va_list arg);

The function vwprintf() is the wide-character version of the function vprintf(). It has the same behavior as wprintf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end, the call va_end(arg) should be made after invoking vwprintf().

X.14.3.7 fwscanf()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>
int fwscanf(FILE *stream, const wchar_t *format, ...);

As of C99:
#include <stdio.h>
#include <wchar.h>
int fwscanf(FILE *restrict stream, const wchar_t *restrict format, ...);
The function fwscanf() is the wide-character version of the function fscanf(). There are minor differences summarized by Table X‑13. The fwscanf() function returns the number of matched elements copied to the objects pointed to by the arguments, or EOF if the end-of-file is reached or an error occurs before any item has been converted. The function returns when one of the following conditions occurs:
o The end-of-file is reached: it returns EOF if no item has been converted yet, otherwise the number of items matched so far.
o An error occurs: it returns EOF if no item has been converted yet, otherwise the number of items matched so far.
o Matching failure: it returns the number of items matched so far (0 if no item matched).
o The whole format has been scanned: it returns the number of items that have been successfully matched.
Table X‑13 Differences between fscanf() and fwscanf()
The functions fscanf() and fwscanf() perform conversions of the multibyte or wide characters read from the input stream and assign the resulting converted characters to the objects pointed to by the arguments. Table X‑14 shows how the functions convert the bytes read from the input stream if the specifier %nc (where n is the width; if n is omitted, it takes the value of 1) is used with or without the length modifier l. For example, if the specifier %lc is used, the function fscanf() reads multibyte characters and converts them to wide characters before copying the resulting wide characters (no null wide character is appended) to the memory area pointed to by the corresponding argument.
Table X‑14 Conversion for %c and %lc performed by fscanf() and fwscanf()
Table X‑15 Conversion for %s and %ls performed by fscanf() and fwscanf()
Table X‑15 shows how the functions convert the matched items (multibyte or wide characters) if the specifier %s is used with or without the length modifier l. For example, if the specifier %ls is used, the function fscanf() reads multibyte characters and converts them to wide characters before copying the resulting wide string to the memory area pointed to by the corresponding argument. The following program reads a file encoded in UTF-8 and displays the elements retrieved:
$ cat io_fwscanf1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

#define NB_EXPECTED_ELT 4 /* number of matching items */

int main(void) {
  FILE *pf = NULL;
  char *myfile = "info_unicode1.dat"; // input file
  int num, nb_elt;
  float val;
  wchar_t name[64];
  wchar_t currency;
  char *mylocale = "en_US.UTF-8";

  if ( ! setlocale(LC_ALL, mylocale) ) {
    printf("Locale %s not available\n", mylocale);
    exit(EXIT_FAILURE);
  }

  if ( ( pf = fopen(myfile, "r") ) == NULL ) {
    printf("Cannot open file %s for reading\n", myfile);
    return EXIT_FAILURE;
  } else
    printf("file %s opened for reading\n", myfile);

  // %63ls: name holds at most 63 wide characters plus the null wide character
  while ( ( nb_elt = fwscanf(pf, L"Amount %d: %10f %lc %63ls\n", &num, &val, &currency, name)) > 0 ) {
    if ( nb_elt != NB_EXPECTED_ELT )
      printf("Input stream badly formed. Matching elements: %d\n", nb_elt);
    else
      printf("ID=%d, value=%f currency=%lc (code %X) name=%ls\n",
             num, val, (wint_t)currency, (unsigned int)currency, name );
  }

  fclose(pf);
  return EXIT_SUCCESS;
}
$ gcc -o io_fwscanf1 -std=c99 -pedantic io_fwscanf1.c

Suppose we feed the following input file info_unicode1.dat into our program:
$ cat info_unicode1.dat
Amount 1: 1000 € (Euro)
Amount 2: 1000 ₹ (Indian_rupee)
Amount 3: 1000 $ (Dollar)

We would get something like this:
$ ./io_fwscanf1
file info_unicode1.dat opened for reading
ID=1, value=1000.000000 currency=€ (code 20AC) name=(Euro)
ID=2, value=1000.000000 currency=₹ (code 20B9) name=(Indian_rupee)
ID=3, value=1000.000000 currency=$ (code 24) name=(Dollar)
X.14.3.8 vfwscanf()
As of C99 (like vfscanf(), the function was introduced by C99):
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>
int vfwscanf(FILE *restrict stream, const wchar_t *restrict fmt, va_list arg);

The function vfwscanf() is the wide-character version of the function vfscanf(). It has the same behavior as fwscanf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end, the call va_end(arg) should be made after invoking vfwscanf().

X.14.3.9 swscanf()
Since C90 Amendment 1 (C95):
#include <wchar.h>
int swscanf(const wchar_t *s, const wchar_t *fmt, ...);

As of C99:
#include <wchar.h>
int swscanf(const wchar_t *restrict s, const wchar_t *restrict fmt, ...);

The function swscanf() is the wide-character version of the function sscanf(). It has the same behavior as fwscanf(). Instead of reading items from a stream, it reads input from the wide string pointed to by s.

X.14.3.10 vswscanf()
As of C99 (like vsscanf(), the function was introduced by C99):
#include <stdarg.h>
#include <wchar.h>
int vswscanf(const wchar_t *restrict s, const wchar_t *restrict fmt, va_list arg);

The function vswscanf() is the wide-character version of the function vsscanf(). It has the same behavior as swscanf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not invoke the macro va_end, the call va_end(arg) should be made after invoking vswscanf().

X.14.3.11 wscanf()
Since C90 Amendment 1 (C95):
#include <wchar.h>
int wscanf(const wchar_t *fmt, ...);

As of C99:
#include <wchar.h>
int wscanf(const wchar_t *restrict fmt, ...);

The function wscanf() is the wide-character version of the function scanf(). It has the same behavior as fwscanf(). Instead of reading from a stream given as argument, it gets data from the standard input (stdin).

X.14.3.12 vwscanf()
As of C99 (like vscanf(), the function was introduced by C99):
#include <stdarg.h>
#include <wchar.h>
int vwscanf(const wchar_t *restrict fmt, va_list arg);

The function vwscanf() is the wide-character version of the function vscanf(). It has the same behavior as wscanf(). Instead of a variable list of arguments, it uses the parameter arg of type va_list, which must be initialized by the macro va_start(). Since the function does not call the macro va_end, the call va_end(arg) should be made after invoking vwscanf().
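To illustrate the scanning functions of this section on a wide string rather than on a stream, here is a minimal sketch built on swscanf(). The structure amount and the function parse_amount() are ours; the line layout is the one of the file info_unicode1.dat used earlier.

```c
#include <wchar.h>

/* Hypothetical record holding the items of one "Amount" line */
struct amount {
    int id;
    float value;
    wchar_t currency;
    wchar_t name[64];
};

/* Hypothetical helper: returns 1 if the four items were all matched in
   line, 0 otherwise (matching failure or premature end of the string). */
int parse_amount(const wchar_t *line, struct amount *out) {
    /* %63ls leaves room for the null wide character in name[64] */
    int n = swscanf(line, L"Amount %d: %f %lc %63ls",
                    &out->id, &out->value, &out->currency, out->name);

    return n == 4;
}
```

Comparing the return value with the number of expected items, rather than just testing that it is positive, rejects lines that are only partially well formed.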
X.14.4 Wide character I/O
X.14.4.1 fgetwc()
#include <stdio.h>
#include <wchar.h>
wint_t fgetwc(FILE *stream);

The function fgetwc() is the wide-character version of fgetc(). It retrieves a wide character of type wchar_t from the input stream, converts it to wint_t, moves the position indicator (offset) to the next wide character and returns the wide character extracted, or WEOF (end-of-file reached or error). WEOF is a macro expanding to an integer indicating that either the end of the file has been reached or an error has occurred. Since the value of WEOF corresponds to no wide character, the return type of the function is wint_t and not wchar_t, so that WEOF can be told apart from a wide character. The value of WEOF is returned if one of the following events occurs:
o The end-of-file is reached.
o An error occurs while reading the stream: the error indicator of the stream is set.
o An encoding error occurs while reading the stream: the global variable errno is then set to EILSEQ.
Otherwise, it returns the wide character read.

X.14.4.2 fgetws()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>
wchar_t *fgetws(wchar_t *s, int n, FILE *stream);

As of C99:
#include <stdio.h>
#include <wchar.h>
wchar_t *fgetws(wchar_t *restrict s, int n, FILE *restrict stream);

The function fgetws() is the wide-character version of fgets(). It reads at most n-1 wide characters from the input stream and places them into the memory area pointed to by s. The function appends the null wide character to the string copied into s. It stops reading the input stream if one of the following events occurs:
o The end-of-file is reached.
o A newline (that is copied to the object pointed to by s) is encountered.
o n-1 wide characters have been read.
o An error occurs while reading the stream.
o An encoding error (invalid multibyte sequence read) occurs.

The fgetws() function returns s or a null pointer. It returns s if no error has occurred. If the end-of-file is encountered and no character has been read, a null pointer is returned and s is left untouched. If a read error or an encoding error occurs while reading, a null pointer is returned and the object pointed to by s has indeterminate contents.

X.14.4.3 fputwc()
#include <stdio.h>
#include <wchar.h>
wint_t fputwc(wchar_t wc, FILE *stream);

The function fputwc() is the wide-character version of the fputc() function. It copies the wide character wc into the output stream (stream). The output stream is a pointer returned by the fopen() function that has opened a file for writing, reading/writing or appending. The function returns the character written unless an error occurs. If a write error occurs, it returns the value of the macro WEOF and sets the error indicator of the stream. If an encoding error occurs (invalid wide character), it returns the value of the macro WEOF and sets the global variable errno to EILSEQ.

X.14.4.4 fputws()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>
int fputws(const wchar_t *s, FILE *stream);

As of C99:
#include <stdio.h>
#include <wchar.h>
int fputws(const wchar_t *restrict s, FILE *restrict stream);

The function fputws() is the wide-character version of fputs(). It copies the wide string pointed to by s to the output stream identified by the parameter stream. The output stream is a pointer returned by the fopen() function that has opened a file for writing, reading/writing or appending. It returns EOF if an error occurs. Otherwise, it returns a nonnegative integer value.

X.14.4.5 getwc()
#include <stdio.h>
#include <wchar.h>
wint_t getwc(FILE *stream);

The function getwc() is the wide-character version of getc(). The function getwc() is equivalent to fgetwc() except that it may be implemented as a macro, in which case its argument stream may be evaluated more than once.

X.14.4.6 getwchar()
#include <wchar.h>
wint_t getwchar(void);

The function getwchar() is the wide-character version of getchar(). The function getwchar() is equivalent to getwc() with the argument stdin.

X.14.4.7 putwc()
#include <stdio.h>
#include <wchar.h>
wint_t putwc(wchar_t wc, FILE *stream);
The function putwc() is the wide-character version of putc(). The function putwc() is equivalent to fputwc() except that it may be implemented as a macro, in which case its argument stream may be evaluated more than once.
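The character-level functions described so far are typically combined in a loop that reads until WEOF is returned. The following is a minimal sketch, assuming both streams are open and either unoriented or wide-oriented; the name copy_wide() is ours.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: copies the stream in to the stream out one wide
   character at a time. Returns the number of wide characters copied,
   or -1 if a read, write or encoding error occurs. */
long copy_wide(FILE *in, FILE *out) {
    long count = 0;
    wint_t wc;   /* wint_t, not wchar_t, so that WEOF can be told apart */

    while ((wc = getwc(in)) != WEOF) {
        if (putwc((wchar_t)wc, out) == WEOF)
            return -1;
        count++;
    }

    /* WEOF means end-of-file or error: ferror() tells which one */
    return ferror(in) ? -1 : count;
}
```

Storing the result of getwc() in a variable of type wchar_t instead of wint_t is a classic mistake: depending on the implementation, the comparison with WEOF may then never succeed.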
X.14.4.8 putwchar()
#include <wchar.h>
wint_t putwchar(wchar_t wc);

The function putwchar() is the wide-character version of putchar(). The function putwchar() is equivalent to putwc() with the argument stdout.

X.14.4.9 ungetwc()
#include <stdio.h>
#include <wchar.h>
wint_t ungetwc(wint_t wc, FILE *stream);

The function ungetwc() pushes the wide character wc back onto the input stream represented by the pointer stream. The file associated with the stream is not modified by the function calls. Pushed-back characters can then be read from the stream in the reverse order they were pushed back. It returns the character that has been put back onto stream or WEOF on error. If the character wc equals WEOF, the function call fails, leaving the input stream untouched. The function ungetwc() allows giving back a character read from the stream as if it had not been read. However, the character you put back onto the stream with the function ungetwc() does not have to be the same as the last character read from the stream. Only a single character is guaranteed to be pushed back onto the input stream. If the function is called several times for the same stream and, between the calls, no pushed-back character has been read from the stream or discarded, the call may fail. A successful call to the function clears the end-of-file indicator of the stream. For a text or binary stream, after a successful call to the function, the file position indicator remains unspecified until the pushed-back characters are read or discarded. Take note that the pushed-back characters are cancelled if the function fsetpos(), rewind() or fseek() is called before the pushed-back characters are read.
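A common use of ungetwc() is to look at the next wide character without consuming it. The sketch below relies only on the single character of push-back that is guaranteed; the name peek_wc() is ours.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: returns the next wide character of stream
   without consuming it, or WEOF at end-of-file or on error. */
wint_t peek_wc(FILE *stream) {
    wint_t wc = fgetwc(stream);

    if (wc != WEOF)
        wc = ungetwc(wc, stream);   /* WEOF here means the push-back failed */
    return wc;
}
```

Because only one pushed-back character is guaranteed, peek_wc() should not be called while a character pushed back by other code is still pending on the same stream.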
X.15 Exercises
Exercise 1. Why must a file be opened before accessing it?
Exercise 2. What are the differences between the open modes r+ and w+?
Exercise 3. What are the differences between the open modes r+ and a?
Exercise 4. Why is the function fgets() safer than gets()?
Exercise 5. Provide the expected type of the argument x in the following function calls:

Call                          Type of argument
----------------------------  ----------------
fscanf(pf, "%u", &x);
fscanf(pf, "%f", &x);
fscanf(pf, "%Lg", &x);
fscanf(pf, "%8lc", &x);
fprintf(pf, "%i", x);
fprintf(pf, "%f", x);
fprintf(pf, "%Lg", x);
fprintf(pf, "%8lc", x);
Exercise 6. What could be the output of the following program?
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x = 10;
  float f = 1.23;

  printf("f=%f\n", f, x++);
  printf("x=%d\n", x );
  return EXIT_SUCCESS;
}

Exercise 7. Write a program that reads a file and prints each line preceded by its number.
Exercise 8. Why is the method consisting in calling fseek(stream, 0, SEEK_END) followed by a call to ftell() not reliable to compute the size of a file?
Exercise 9. I have written a program that writes notifications to a log file, but when I open the log file with a text editor, I can see nothing or the information shown is delayed. Explain why, and how you could overcome the issue.
CHAPTER XI STANDARD C LIBRARY
XI.1 Introduction
The standard C library, also called libc, is a set of header files and a library. The library implements a number of routines that programmers can invoke within their programs. In this chapter, we will talk about the most frequently used functions provided by the standard C library. To ease our descriptions, we will break down the standard C library into several parts corresponding to the header files. The chapter describes the elements most commonly used in C programs: several macros, types and functions will not be broached. As described at the beginning of the book, variables, functions, user-defined types, structures, unions, and macros declared in header files can be used in source files after invoking the directive #include followed by the name of the suitable header file enclosed between chevrons: #include <header.h>.

XI.2 <assert.h>
void assert(int expr);
The macro assert takes one argument expr that is an expression evaluating to true or false. If the expression evaluates to true, the macro does nothing. If it evaluates to false, an error message including the filename, the line number, and the function name is written to the stream stderr, followed by a call to the function abort(). The function abort() terminates the program abnormally. The following program displays an error message and terminates if the number provided by the user does not range from 1 to 9.
$ cat libc_assert.c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int mult_table(int n) {
  int i = 0;

  printf("Multiplication Table of %d:\n", n);

  while ( i < 10 ) {
    printf ("%d x %d = %d\n", i, n, i * n);
    i = i + 1;
  }

  return EXIT_SUCCESS;
}

int main(void) {
  int num;
  const int array_len = 3;
  char input_nb[ array_len ];

  printf("Enter an integer in the range [1,9]: ");
  fgets(input_nb, array_len, stdin);
  num = atoi( input_nb );

  assert (num > 0 && num < 10);

  mult_table(num);
  return EXIT_SUCCESS;
}
$ gcc -o libc_assert -std=c99 -pedantic libc_assert.c
$ ./libc_assert
Enter an integer in the range [1,9]: 20
Assertion failed: num > 0 && num < 10, file libc_assert.c, line 27, function main
Abort (core dumped)

The macro is not commonly used to display error messages for users but for debugging while programming. It is normally disabled, once the program has been fully tested, by defining the macro NDEBUG. This can be done in two ways: you place the directive #define NDEBUG before including the assert.h file, or you pass the option -DNDEBUG to the compiler. The following example disables the assert macro.
$ gcc -o libc_assert -std=c99 -pedantic -DNDEBUG libc_assert.c
$ ./libc_assert
Enter an integer in the range [1,9]: 10
Multiplication Table of 10:
0 x 10 = 0
1 x 10 = 10
2 x 10 = 20
3 x 10 = 30
4 x 10 = 40
5 x 10 = 50
6 x 10 = 60
7 x 10 = 70
8 x 10 = 80
9 x 10 = 90
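Because the expression given to assert is not even evaluated when NDEBUG is defined, it should neither have side effects nor be the only check the program relies on at run time. The following sketch separates the two kinds of checks; the type struct stack and the function stack_push() are ours, introduced only for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 4

/* Hypothetical fixed-size stack of integers */
struct stack {
    int items[STACK_CAPACITY];
    size_t count;
};

/* Hypothetical helper: pushes value onto s.
   Returns 1 on success, 0 if the stack is full. */
int stack_push(struct stack *s, int value) {
    assert(s != NULL);               /* programming error: debug builds only */

    if (s->count == STACK_CAPACITY)  /* run-time condition: always checked */
        return 0;

    s->items[s->count++] = value;
    return 1;
}
```

Had the increment been written inside the assertion, for example assert(s->count++ < STACK_CAPACITY), the program would behave differently once compiled with -DNDEBUG.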
XI.3 <ctype.h>: character handling functions
The ctype.h header file declares macros and functions that deal with characters: they are used for classifying characters and converting them to uppercase or lowercase letters. With the exception of the functions isdigit() and isxdigit(), all the functions described below are affected by the current locale set for the category LC_CTYPE.
XI.3.1 isspace() int isspace(int c);
The function isspace() returns a nonzero value (true) if c is a standard whitespace character or a character pertaining to the character set of the current locale, for which isalnum() returns zero (i.e. not a digit and not a letter). Otherwise, it returns zero (false). A standard whitespace character is one of the following characters: space (‘ ‘), horizontal tab (‘\t’), vertical tab (‘\v’), newline (‘\n’), form-feed (‘\f’), or carriage-return (‘\r’). For the “C” locale, it returns a nonzero value (true) if c is a standard whitespace character
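As an illustration, the sketch below uses isspace() to skip the whitespace at the beginning of a string. The name skip_spaces() is ours. The cast to unsigned char matters: the functions of ctype.h expect a value representable as an unsigned char (or EOF), and a plain char holding an extended character may be negative.

```c
#include <ctype.h>

/* Hypothetical helper: returns a pointer to the first character of s
   that is not a whitespace character (possibly the terminating null
   character). */
const char *skip_spaces(const char *s) {
    while (isspace((unsigned char)*s))
        s++;
    return s;
}
```

The loop stops at the end of the string because isspace() returns zero for the null character.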
XI.3.2 isblank() int isblank(int c);
The function isblank() returns a nonzero value (true) if the character c is a standard blank character or a character of the character set of the current locale, for which isspace() returns a nonzero value and used as a word-separator. Otherwise, it returns 0 (false). A standard blank character is space (‘ ‘) or horizontal tab (‘\t’). For the “C” locale, it returns a nonzero value (true) if c is a standard blank character.
XI.3.3 isdigit() int isdigit(int c);
The function isdigit() returns a nonzero value (true) if c is a decimal digit character. Otherwise, it returns 0 (false).
XI.3.4 isxdigit() int isxdigit(int c);
The function isxdigit() returns a nonzero value (true) if c is a hexadecimal digit character. Otherwise, it returns 0 (false).
XI.3.5 iscntrl() int iscntrl(int c);
The function iscntrl() returns a nonzero value (true) if the character c is a control character. Otherwise, it returns 0 (false). Control characters are commonly used to control the terminal and cannot be printed. In the following example, we test if the character whose code value is 4 (ASCII/Unicode) is a control character (under Linux and UNIX systems, it is also obtained by hitting the d key while pressing the ctrl key):
$ cat libc_isctrl.c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(void) {
  int c = '\x04'; // character sent by ctrl-d

  printf("Is character with ASCII code %d a ctrl character? %s\n", c, iscntrl(c) ? "TRUE" : "FALSE" );
  return EXIT_SUCCESS;
}
$ gcc -o libc_isctrl -std=c99 -pedantic libc_isctrl.c
$ ./libc_isctrl
Is character with ASCII code 4 a ctrl character? TRUE
XI.3.6 isgraph() int isgraph(int c);
The function isgraph() returns a nonzero value (true) if the character c can be printed and is not a space. Otherwise, it returns 0 (false).
XI.3.7 isprint() int isprint(int c);
The function isprint() returns a nonzero value (true) if the character c can be printed. Otherwise, it returns 0 (false).
XI.3.8 ispunct() int ispunct(int c);
The function ispunct() returns a nonzero value (true) if the character c is used for punctuation. Otherwise, it returns 0 (false).
XI.3.9 isupper() int isupper(int c);
The function isupper() returns a nonzero value (true) if the character c is an uppercase letter. Otherwise, it returns 0 (false).
XI.3.10 islower() int islower(int c);
The function islower() returns a nonzero value (true) if the character c is a lowercase letter. Otherwise, it returns 0 (false).
XI.3.11 isalpha() int isalpha(int c);
The function isalpha() returns a nonzero value (true) if c is an alphabetic character. Otherwise, it returns 0 (false).
XI.3.12 isalnum() int isalnum(int c);
The function isalnum() returns a nonzero value (true) if c is an alphabetic character (isalpha() returns a nonzero value) or a decimal digit character (isdigit() returns a nonzero value). Otherwise, it returns 0 (false).
XI.3.13 tolower() int tolower(int c);
The function tolower() converts an uppercase letter to its corresponding lowercase letter. If c is an uppercase letter, the corresponding lowercase letter is returned. Otherwise, c is returned with no conversion.
XI.3.14 toupper() int toupper(int c);
The function toupper() converts a lowercase letter to its corresponding uppercase letter. If c is a lowercase letter, the corresponding uppercase letter is returned. Otherwise, c is returned with no conversion. For example:
$ cat libc_toupper.c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
int main(void) {
  char alist[] = { 'A', 'z', '2' };
  size_t i; /* same type as the value of sizeof */
  for (i=0; i < sizeof alist; i++) {
    if ( isupper( alist[i] ) )
      printf( "%c is an uppercase letter: ", alist[i] );
    else if ( islower( alist[i] ) )
      printf( "%c is a lowercase letter: ", alist[i] );
    else if ( isdigit( alist[i] ) )
      printf( "%c is a digit: ", alist[i] );
    printf( "toupper(%c)=%c\n", alist[i], toupper(alist[i]) );
  }
  return EXIT_SUCCESS;
}
$ gcc -o libc_toupper -std=c99 -pedantic libc_toupper.c
$ ./libc_toupper
A is an uppercase letter: toupper(A)=A
z is a lowercase letter: toupper(z)=Z
2 is a digit: toupper(2)=2
XI.4 errno.h
The global variable errno is declared in the file errno.h. We described it in the previous chapter (Chapter X Section X.7.2). After calling a system or C-library function, the global integer variable errno is set when an error has occurred.
XI.5 math.h
The functions listed in this section are not detailed. We just give an overall description of them. If you have to use them, refer to the man pages or the C standard. When compiling, you may have to link the mathematical library by using the option -lm.
XI.5.1 Trigonometric functions
Function | Description
float acosf(float x); double acos(double x); long double acosl(long double x); | Return the arc cosine of x.
float asinf(float x); double asin(double x); long double asinl(long double x); | Return the arc sine of x.
float atanf(float x); double atan(double x); long double atanl(long double x); | Return the arc tangent of x.
float cosf(float x); double cos(double x); long double cosl(long double x); | Return the cosine of x.
float sinf(float x); double sin(double x); long double sinl(long double x); | Return the sine of x.
float tanf(float x); double tan(double x); long double tanl(long double x); | Return the tangent of x.
XI.5.2 Hyperbolic functions
Function | Description
float acoshf(float x); double acosh(double x); long double acoshl(long double x); | Return the arc hyperbolic cosine of x.
float asinhf(float x); double asinh(double x); long double asinhl(long double x); | Return the arc hyperbolic sine of x.
float atanhf(float x); double atanh(double x); long double atanhl(long double x); | Return the arc hyperbolic tangent of x.
float coshf(float x); double cosh(double x); long double coshl(long double x); | Return the hyperbolic cosine of x.
float sinhf(float x); double sinh(double x); long double sinhl(long double x); | Return the hyperbolic sine of x.
float tanhf(float x); double tanh(double x); long double tanhl(long double x); | Return the hyperbolic tangent of x.
XI.5.3 Exponential functions
Function | Description
float exp2f(float x); double exp2(double x); long double exp2l(long double x); | Return 2^x.
float expm1f(float x); double expm1(double x); long double expm1l(long double x); | Return e^x - 1.
float frexpf(float x, int *exp); double frexp(double x, int *exp); long double frexpl(long double x, int *exp); | Break x into a fraction v and an exponent y such that x = v*2^y, where |v| lies in [0.5, 1[. Return v and assign y to the object pointed to by exp. If x is 0, both v and y are 0.
float ldexpf(float x, int exp); double ldexp(double x, int exp); long double ldexpl(long double x, int exp); | Return x*2^exp.
float scalbnf(float x, int n); double scalbn(double x, int n); long double scalbnl(long double x, int n); | Return x*FLT_RADIX^n. FLT_RADIX is defined in float.h, generally taking the value of 2.
float scalblnf(float x, long int n); double scalbln(double x, long int n); long double scalblnl(long double x, long int n); | Return x*FLT_RADIX^n (the exponent n has type long int).
For example:
$ cat libc_frexp.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(void) {
  double x = 20;
  int exp;
  double significand = frexp(x, &exp);
  printf( "%f=%f*2^%d\n", x, significand, exp );
  return EXIT_SUCCESS;
}
$ gcc -o libc_frexp -std=c99 -pedantic libc_frexp.c -lm
$ ./libc_frexp
20.000000=0.625000*2^5
XI.5.4 Logarithmic functions
Function | Description
int ilogbf(float x); int ilogb(double x); int ilogbl(long double x); | Return the exponent exp of x as an int, where x = v*FLT_RADIX^exp and |v| lies in [1, FLT_RADIX[. The macro FLT_RADIX is defined in float.h, generally taking the value of 2. ilogb() returns the same value as logb() but cast to int.
float logf(float x); double log(double x); long double logl(long double x); | Return ln(x). Compute the natural logarithm (logarithm to base e).
float log10f(float x); double log10(double x); long double log10l(long double x); | Return lg(x). Compute the logarithm to base 10 (log10).
float log1pf(float x); double log1p(double x); long double log1pl(long double x); | Return ln(x+1). Natural logarithm of 1 plus x.
float log2f(float x); double log2(double x); long double log2l(long double x); | Return lb(x). Compute the logarithm to base 2 (binary logarithm, log2(x)).
float logbf(float x); double logb(double x); long double logbl(long double x); | Return the exponent exp of x as a floating-point value, where x = v*FLT_RADIX^exp and |v| lies in [1, FLT_RADIX[. FLT_RADIX is defined in float.h, generally taking the value of 2. If FLT_RADIX is 2, logb(x) is the integer part of the binary logarithm, that is floor(log2(|x|)), and not log2(x) itself.
XI.5.5 Power functions
Function | Description
float cbrtf(float x); double cbrt(double x); long double cbrtl(long double x); | Return the cube root of x.
float hypotf(float x, float y); double hypot(double x, double y); long double hypotl(long double x, long double y); | Return the square root of x^2 + y^2.
float powf(float x, float y); double pow(double x, double y); long double powl(long double x, long double y); | Return x^y.
float sqrtf(float x); double sqrt(double x); long double sqrtl(long double x); | Return the square root of x.
XI.5.6 Miscellaneous
XI.5.6.1 fabs()
float fabsf(float x); double fabs(double x); long double fabsl(long double x);
The functions return |x| (absolute value of x). XI.5.6.2 modf() float modff(float x, float *intg); double modf(double x, double *intg); long double modfl(long double x, long double *intg);
The functions break the argument x into its fractional part, which is returned, and its integer part, which is assigned to the object pointed to by intg.
$ cat libc_modf.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(void) {
  double int_part;
  double fract_part;
  double x = 1.618;
  fract_part = modf(x, &int_part);
  printf( "%f=%f + %f\n", x, int_part, fract_part );
  return EXIT_SUCCESS;
}
$ gcc -o libc_modf -std=c99 -pedantic libc_modf.c -lm
$ ./libc_modf
1.618000=1.000000 + 0.618000
XI.5.7 Rounding functions
Function | Description
float ceilf(float x); double ceil(double x); long double ceill(long double x); | Return the smallest integer not less than x. For example, ceil(2.5) returns 3.0, ceil(2.8) returns 3.0, ceil(1.99) returns 2.0.
float floorf(float x); double floor(double x); long double floorl(long double x); | Return the greatest integer not greater than x. For example, floor(2.5) returns 2.0, floor(2.8) returns 2.0, floor(1.99) returns 1.0.
float roundf(float x); double round(double x); long double roundl(long double x); | Return the nearest integer as a floating-point value (round half away from 0). For example, round(2.5) returns 3.0, round(2.8) returns 3.0, round(1.4) returns 1.0.
long int lroundf(float x); long int lround(double x); long int lroundl(long double x); long long int llroundf(float x); long long int llround(double x); long long int llroundl(long double x); | Return the nearest integer as an integer value (round half away from 0). For example, lround(2.5) returns 3, lround(2.8) returns 3, lround(1.4) returns 1.
float truncf(float x); double trunc(double x); long double truncl(long double x); | Return the integral part of x (round toward 0).
Here is an example:
$ cat libc_rounding.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(void) {
  double list_nb[] = {-0.9, -1.1, -1.2, -1.5, -1.7, 0.9, 1.1, 1.2, 1.5, 1.7};
  int i;
  int array_len = sizeof list_nb/sizeof list_nb[0];
  printf( "%-16s%-16s%-16s%-16s%-16s\n", "value", "ceil", "floor", "round", "trunc");
  for (i=0; i < array_len; i++)
    printf( "% -16.3lf% -16.3lf% -16.3lf% -16.3lf% -16.3lf\n", list_nb[i], ceil(list_nb[i]), floor(list_nb[i]), round(list_nb[i]), trunc(list_nb[i]) );
  return EXIT_SUCCESS;
}
$ gcc -o libc_rounding -std=c99 -pedantic libc_rounding.c -lm
$ ./libc_rounding
value           ceil            floor           round           trunc
-0.900          -0.000          -1.000          -1.000          -0.000
-1.100          -1.000          -2.000          -1.000          -1.000
-1.200          -1.000          -2.000          -1.000          -1.000
-1.500          -1.000          -2.000          -2.000          -1.000
-1.700          -1.000          -2.000          -2.000          -1.000
 0.900           1.000           0.000           1.000           0.000
 1.100           2.000           1.000           1.000           1.000
 1.200           2.000           1.000           1.000           1.000
 1.500           2.000           1.000           2.000           1.000
 1.700           2.000           1.000           2.000           1.000
XI.5.8 isnan() int isnan(real-floating-point f);
It is a macro that returns a nonzero value if its argument is a NaN (Not a Number). Otherwise, it returns 0. For example:
$ cat isnan.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
  double v = 1E900; /* Infinite */
  double u = 1E-900; /* 0 */
  double w = v * 0; /* NaN */
  if ( isnan(w) ) {
    printf("w has a NaN value\n");
  } else {
    printf("w=%f\n", w);
  }
  return EXIT_SUCCESS;
}
$ gcc -o isnan -std=c99 -pedantic isnan.c
isnan.c: In function ‘main’:
isnan.c:6:4: warning: floating constant exceeds range of ‘double’
isnan.c:7:4: warning: floating constant truncated to zero
$ ./isnan
w has a NaN value
XI.5.9 isinf() int isinf(real-floating-point f);
It is a macro that returns a nonzero value if its argument is infinite (positive or negative). Otherwise, it returns 0. For example:
$ cat isinf.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
  double v = 1E900; /* Infinite */
  if ( isinf(v) ) {
    printf("v has an infinite value\n");
  } else {
    printf("v=%f\n", v);
  }
  return EXIT_SUCCESS;
}
$ gcc -o isinf -std=c99 -pedantic isinf.c
isinf.c: In function ‘main’:
isinf.c:6:4: warning: floating constant exceeds range of ‘double’
$ ./isinf
v has an infinite value
XI.6 stdarg.h
void va_start(va_list ap, last_param_name);
type va_arg(va_list ap, type);
void va_copy(va_list dst, va_list src);
void va_end(va_list ap);
These macros were described in Chapter VII Section VII.28. They allow you to create functions with a variable number of parameters.
XI.7 stdbool.h
The header file stdbool.h defines the following macros:
o bool expands to the type _Bool.
o true expands to 1
o false expands to 0
Here is an example:
$ cat libc_bool.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
/* if two arrays hold the same strings, return true. Otherwise, return false */
#define SAME_STRING(s1,s2) (strcmp((s1), (s2)) == 0 ? true : false)
int main(int argc, char **argv) {
  bool b;
  char *string1, *string2;
  if (argc < 3) {
    printf("USAGE: %s \n", argv[0]);
    return EXIT_FAILURE;
  }
  string1 = argv[1];
  string2 = argv[2];
  b = SAME_STRING(string1, string2);
  if ( b == true ) {
    printf("same string\n");
  } else {
    printf("different string\n");
  }
  return EXIT_SUCCESS;
}
$ gcc -o libc_bool -std=c99 -pedantic -lm libc_bool.c
$ ./libc_bool
USAGE: ./libc_bool
$ ./libc_bool Hello Hello
same string
$ ./libc_bool Hello hello
different string
XI.8 stddef.h
XI.8.1 Types
The stddef.h header file defines the following types:
o ptrdiff_t
o size_t
o wchar_t
XI.8.1.1 size_t and ptrdiff_t
The type ptrdiff_t is a signed integer type used for the result of subtracting two pointers. The type size_t is an unsigned integer type used to represent array indexes and the sizes of objects and types. For example, the sizeof operator yields an integer of type size_t.
The natural question that arises is: "why not just use an unsigned int to represent object sizes and an int for array indexes and values of pointers in arithmetic operations?" The rationale is that the widths of int, long, long long and pointers depend on the architecture of the computer and the operating system. Since a pointer holds an address (an integer), it can be represented by int, long, or long long. Likewise, the sizes of objects can be represented by type unsigned int, unsigned long, or unsigned long long. Because the sizes of integer types and pointers vary from system to system according to their data type model (see Table XI‑1), you cannot easily and naturally write a portable C program if you use a specific integer type for object sizes and for values resulting from the subtraction of two pointers. To overcome such issues, the C standard specifies two types: ptrdiff_t and size_t.
Table XI‑1 Some data type models
The data type model of a computer takes the form IaLbLLcPd or IaLbPd where I stands for integer, L for long, LL for long long and P for pointer. The data type model I32LP64 means int is represented by 32 bits, long by 64 bits, and pointers by 64 bits. When LL is not mentioned, long long has the same width as long.
In summary, use the type ptrdiff_t for an object taking the value of the subtraction of two pointers or for indexes in large arrays. Use the type size_t for object sizes, and indexes for large arrays [95]. The largest value for an object of type size_t is defined by the macro SIZE_MAX. The lowest value for ptrdiff_t is defined by the macro PTRDIFF_MIN and the biggest value is defined by the macro PTRDIFF_MAX.
XI.8.1.2 wchar_t
The type wchar_t is an integer type that can represent any wide character of any supported coded character set. That is, an object of type wchar_t can hold the largest code value of any supported extended character code set. In the "C" locale and English-based locales, a character can be coded in one byte (type char) but some locales require more than one byte to store code points of extended characters: wchar_t is used in those cases. As a simple example, the following snippet of code displays the letters é and è in a French locale:
$ cat libc_wchar.c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <locale.h>
int main(void) {
  wchar_t accents[] = L"éè";
  setlocale(LC_ALL, "fr_FR.UTF-8");
  printf("sizeof(wchar_t)=%zu\n", sizeof(wchar_t) );
  printf("accents: %ls\n", accents);
  return EXIT_SUCCESS;
}
$ gcc -o libc_wchar -std=c99 -pedantic libc_wchar.c
$ ./libc_wchar
sizeof(wchar_t)=4
accents: éè
Explanation:
o wchar_t accents[] = L"éè" initializes the array accents with the string literal "éè". The letter L preceding the first double-quote means the string literal is composed of wide characters (of type wchar_t).
o The statement setlocale(LC_ALL, "fr_FR.UTF-8") sets the locale fr_FR.UTF-8.
o The statement printf("accents: %ls\n", accents) displays the wide string held in accents.
XI.8.2 Macros
The stddef.h header file also defines the following macros:
o NULL that expands to the null pointer
o offsetof(structure, member) returns the offset of a member (expressed in bytes) of a structure from the beginning of the structure. It returns a value of type size_t.
For example:
$ cat libc_offsetof.c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
int main(void) {
  struct student {
    char first_name[255];
    char last_name[255];
    int age;
  };
  printf("offsetof(struct student, first_name)=%zu\n", offsetof(struct student, first_name) );
  printf("offsetof(struct student, last_name)=%zu\n", offsetof(struct student, last_name) );
  printf("offsetof(struct student, age)=%zu\n", offsetof(struct student, age) );
  return EXIT_SUCCESS;
}
$ gcc -o libc_offsetof -std=c99 -pedantic -lm libc_offsetof.c
$ ./libc_offsetof
offsetof(struct student, first_name)=0
offsetof(struct student, last_name)=255
offsetof(struct student, age)=512
On our system, the member age starts at offset 512 and not 510: the compiler inserted two padding bytes after last_name so that the int is suitably aligned.
XI.9 stdio.h
The I/O functions declared in stdio.h were described in the previous chapter.
XI.10 stdint.h
The stdint.h header file declares new integer types, defines macros and limits for integers.
XI.10.1 Integer types
XI.10.1.1 Integers and pointers
Since C99, two types are defined to hold the value of a pointer to an object: intptr_t (signed integer) and uintptr_t (unsigned integer). They are more reliable than basic integer types such as int, unsigned int, unsigned long… However, intptr_t and uintptr_t are optional and therefore might not be available on your system. A pointer to void can be converted to intptr_t (to get the address) and then converted back to void *, recovering the original pointer without losing data.
Do not use objects of type intptr_t or uintptr_t to store addresses of pointers to functions.
XI.10.1.2 Exact-width integer types
The types intN_t are optional types representing signed integers fitting exactly in N bits. For example, int16_t is a signed integer type of 16-bit width. The problem with this kind of type is that it depends on the implementation: each system defines its own exact-width integer types (if any), which implies that a program relying on them may not be portable. A system might not define such types at all. Similarly, the types uintN_t are optional types representing unsigned integers fitting in exactly N bits. For example, uint8_t is an unsigned integer type of 8-bit width.
XI.10.1.3 Minimum-width integer types
The types int_leastN_t represent signed integers fitting in at least N bits. The following types are defined:
o int_least8_t
o int_least16_t
o int_least32_t
o int_least64_t
The types uint_leastN_t represent unsigned integers fitting in at least N bits. The following types are defined:
o uint_least8_t
o uint_least16_t
o uint_least32_t
o uint_least64_t
Systems may define additional types.
XI.10.1.4 Fastest minimum-width integer types
The types int_fastN_t represent the fastest signed integers fitting in at least N bits. The following types are defined:
o int_fast8_t
o int_fast16_t
o int_fast32_t
o int_fast64_t
The types uint_fastN_t represent the fastest unsigned integers fitting in at least N bits. The following types are defined:
o uint_fast8_t
o uint_fast16_t
o uint_fast32_t
o uint_fast64_t
Fastest means the most efficient integer type is used depending on the architecture of the processor. For example, on a computer with 32-bit registers, it is likely more efficient to implement int_fast16_t as an integer type fitting in 32 bits. Systems may define additional types.
XI.10.1.5 Maximum-width integer types
The type intmax_t is a signed integer type that can represent any value of any signed integer type, and therefore the largest possible signed integer number. The type uintmax_t is an unsigned integer type that can represent any value of any unsigned integer type. Although most computers nowadays define uintmax_t as unsigned long long, it is very likely that it will become wider in the future, evolving with the architecture of the processor.
XI.10.2 Limits The stdint.h header file defines a set of limits for integer types it defines.
XI.10.3 Macros
XI.11 stdlib.h
XI.11.1 Macros
XI.11.2 Functions
XI.11.2.1 strtod(), strtof(), strtold()
Until C95:
#include <stdlib.h>
double strtod(const char *ptr, char **endptr);
Since C99:
#include <stdlib.h>
double strtod(const char *restrict ptr, char **restrict endptr);
float strtof(const char *restrict ptr, char **restrict endptr);
long double strtold(const char *restrict ptr, char **restrict endptr);
The functions strtod(), strtof() and strtold() convert the sequence of characters pointed to by ptr [96] to double, float and long double respectively. The functions discard leading whitespaces and start parsing when the first non-whitespace character is encountered. They read characters from the string pointed to by ptr to form a floating-point number. When a character cannot be used to build the current floating-point number, the functions stop reading, convert the sequence of characters read so far to a floating-point value, and set the pointer *endptr (if endptr is not a null pointer) to point to the character following the last character of the converted sequence. If no conversion can be done, they set the pointer *endptr to ptr (if endptr is not a null pointer).
If the conversion succeeds, the functions return a floating-point value. If no conversion can be done, they return 0. If the sequence of characters to be converted represents a value too large, the variable errno is set to ERANGE, and the value of the macro HUGE_VAL (by strtod()), HUGE_VALF (by strtof()), or HUGE_VALL (by strtold()) is returned, with the sign of the value.
A valid sequence of characters forming a floating-point number is one of the following:
o Decimal integer: sequence of decimal digits (may be preceded by a minus or plus sign)
o Decimal floating-point number: sequence of decimal digits containing a decimal point (may be preceded by a minus or plus sign). The decimal point depends on the current locale. In the "C" locale, the decimal point is a period (.).
o Hexadecimal integer: sequence of hexadecimal digits, ignoring case, starting with 0x or 0X (may be preceded by a minus or plus sign)
o Hexadecimal floating-point number: sequence of hexadecimal digits, ignoring case, starting with 0x or 0X and containing a decimal point (may be preceded by a minus or plus sign). The decimal point depends on the current locale.
o A decimal floating-point number in scientific notation [±]fe[±]n or [±]fE[±]n where f is a number composed of decimal digits (base 10) and n is a decimal integer: the value is f*10^n
o A hexadecimal floating-point number in scientific notation [±]hp[±]m or [±]hP[±]m where h is a number composed of hexadecimal digits (ignoring case) starting with 0x or 0X, and m is a decimal integer: the value is h*2^m
o Inf or Infinity (ignoring case)
o NAN (ignoring case)
For example:
$ cat strtod.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
int main(void) {
  char *ptr = " NAN INF Infinity 10 3.14.87 -2.8 2e4 0xA.C 10e7987 0xap-2 17PP 18";
  char *endptr = NULL;
  double d;
  printf("Input string \"%s\":\n", ptr);
  d = strtod(ptr, &endptr); // init scanning
  /* Now,
     endptr points to the next item
     ptr points to the item that has just been read
     d holds the first floating-point number */
  while ( ptr != endptr ) {
    int n = endptr-ptr; // number of characters read
    printf("\"%.*s\" converted to ", n, ptr); // current item
    printf("%f", d); // value of the current item
    if (errno == ERANGE) { // value too large
      printf(" (Out of range)");
      errno = 0;
    }
    printf("\n");
    ptr = endptr; // point to the next item
    d = strtod(ptr, &endptr); // convert the next item
  }
}
$ gcc -o strtod -std=c99 -pedantic strtod.c
$ ./strtod
Input string " NAN INF Infinity 10 3.14.87 -2.8 2e4 0xA.C 10e7987 0xap-2 17PP 18":
" NAN" converted to nan
" INF" converted to inf
" Infinity" converted to inf
" 10" converted to 10.000000
" 3.14" converted to 3.140000
".87" converted to 0.870000
" -2.8" converted to -2.800000
" 2e4" converted to 20000.000000
" 0xA.C" converted to 10.750000
" 10e7987" converted to inf (Out of range)
" 0xap-2" converted to 2.500000
" 17" converted to 17.000000
The scan stops after 17 because the characters PP cannot start a number: the last item, 18, is never reached.
XI.11.2.2 strtol(), strtoll(), strtoul(), strtoull()
Until C95:
#include <stdlib.h>
long int strtol(const char *ptr, char **endptr, int b);
unsigned long int strtoul(const char *ptr, char **endptr, int b);
As of C99:
#include <stdlib.h>
long int strtol(const char *restrict ptr, char **restrict endptr, int b);
long long int strtoll(const char *restrict ptr, char **restrict endptr, int b);
unsigned long int strtoul(const char *restrict ptr, char **restrict endptr, int b);
unsigned long long int strtoull(const char *restrict ptr, char **restrict endptr, int b);
The functions strtol(), strtoll(), strtoul() and strtoull() convert the sequence of characters pointed to by ptr to long, long long, unsigned long and unsigned long long respectively. The functions discard leading whitespaces and start parsing when the first non-whitespace character is encountered. They read characters from the string pointed to by ptr to form an integer number expressed in base b. The base b ranges from 2 to 36; it may also be 0, in which case the base is deduced from the prefix of the number (0x or 0X for hexadecimal, 0 for octal, decimal otherwise). When a character cannot be used to build the current integer in base b, the functions stop reading, convert the sequence of characters read so far to an integer value, and set the pointer *endptr (if endptr is not a null pointer) to point to the character immediately following the last character of the converted sequence. If no conversion can be done, they set the pointer *endptr to ptr (if endptr is not a null pointer).
If the conversion succeeds, the functions return an integer value. If no conversion can be done, they return 0. If the sequence of characters to be converted represents a value too large, the variable errno is set to ERANGE, and the value of one of the following macros is returned:
o strtol(): LONG_MIN if the value is negative and LONG_MAX if the value is positive
o strtoul(): ULONG_MAX
o strtoll(): LLONG_MIN if the value is negative and LLONG_MAX if the value is positive
o strtoull(): ULLONG_MAX
A valid sequence of characters forming an integer number is one of the following:
o Decimal integer: sequence of decimal digits (may be preceded by a minus or plus sign)
o Hexadecimal integer: sequence of hexadecimal digits, ignoring case, that may start with 0x or 0X (may be preceded by a minus or plus sign)
o Octal integer: sequence of octal digits (may be preceded by a minus or plus sign)
o Integer number in base b: a sequence of digits ranging from 0 to b-1. If b > 10, letters ranging from a to z (ignoring case) are used as digits.
Example:
$ cat strtol.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
int main(void) {
  char *ptr_16 = " 0xA 0XAC 0xFf 0xf 5.7";
  int base = 16;
  char *endptr = NULL;
  long l;
  printf("Input string \"%s\":\n", ptr_16);
  l = strtol(ptr_16, &endptr, base); // init scanning
  /* Now,
     endptr points to the next item
     ptr_16 points to the item that has just been read
     l holds the first integer */
  while ( ptr_16 != endptr ) {
    int n = endptr-ptr_16; // number of characters read
    printf("\"%.*s\" converted to ", n, ptr_16); // current item
    printf("%ld", l); // value of the current item
    if (errno == ERANGE) { // value too large
      printf(" (Out of range)");
      errno = 0;
    }
    printf("\n");
    ptr_16 = endptr; // point to the next item
    l = strtol(ptr_16, &endptr, base); // convert the next item
  }
}
$ gcc -o strtol -std=c99 -pedantic strtol.c
$ ./strtol
Input string " 0xA 0XAC 0xFf 0xf 5.7":
" 0xA" converted to 10
" 0XAC" converted to 172
" 0xFf" converted to 255
" 0xf" converted to 15
" 5" converted to 5
The scan stops at the period: the characters .7 cannot be part of an integer.
XI.11.2.3 atoi(), atol() and atoll()
Until C95:
#include <stdlib.h>
int atoi(const char *s);
long int atol(const char *s);
As of C99:
#include <stdlib.h>
int atoi(const char *s);
long int atol(const char *s);
long long int atoll(const char *s);
The functions atoi(), atol() and atoll() convert the string pointed to by s to int, long, and long long respectively. Except for the behavior on error, they are equivalent to:
atoi(s): (int)strtol(s, (char **)NULL, 10);
atol(s): strtol(s, (char **)NULL, 10);
atoll(s): strtoll(s, (char **)NULL, 10);
For example:
$ cat libc_string2integer.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  char *s = "2367";
  printf( "atoi(%s)=%d (size=%zu bytes)\n", s, atoi(s), sizeof atoi(s) );
  printf( "atol(%s)=%ld (size=%zu bytes)\n", s, atol(s), sizeof atol(s) );
  printf( "atoll(%s)=%lld (size=%zu bytes)\n", s, atoll(s), sizeof atoll(s) );
  return EXIT_SUCCESS;
}
$ gcc -o libc_string2integer -std=c99 -pedantic libc_string2integer.c
$ ./libc_string2integer
atoi(2367)=2367 (size=4 bytes)
atol(2367)=2367 (size=4 bytes)
atoll(2367)=2367 (size=8 bytes)
XI.11.2.4 atof() double atof(const char *str);
The function atof() converts the string pointed to by str to double. It is equivalent to: strtod(str, (char **)NULL);
For example:
$ cat libc_string2float.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  char *s = "2367.12";
  printf( "atof(%s)=%f\n", s, atof(s) );
  return EXIT_SUCCESS;
}
$ gcc -o libc_string2float -std=c99 -pedantic libc_string2float.c
$ ./libc_string2float
atof(2367.12)=2367.120000
XI.11.2.5 abs(), labs(), llabs() int abs(int j); long int labs(long int j); long long int llabs(long long int j);
The functions abs(), labs() and llabs() return the absolute value of j (i.e. |j|).
XI.11.2.6 rand()
int rand(void);
The function rand() returns a pseudo-random integer within [0-RAND_MAX].
$ cat libc_rand.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  int i;
  for (i =0; i < 3; i++)
    printf( "rand()=%d (within [0-%d])\n", rand(), RAND_MAX );
  return EXIT_SUCCESS;
}
$ gcc -o libc_rand -std=c99 -pedantic -lm libc_rand.c
$ ./libc_rand
rand()=16838 (within [0-32767])
rand()=5758 (within [0-32767])
rand()=10113 (within [0-32767])
Now, what happens if we run the program again?
$ ./libc_rand
rand()=16838 (within [0-32767])
rand()=5758 (within [0-32767])
rand()=10113 (within [0-32767])
We got the same sequence of pseudo-random integers! When you invoke rand(), the very first pseudo-random integer is computed by an algorithm from a special value called seed value. Then, each call to rand() returns a pseudo-random integer based on the previous one. This implies, to get another sequence of pseudo-random integers, you have to change the seed by using the srand() function. If you do not invoke srand() before calling rand(), by default, the seed is set to 1.
The same seed value causes rand() to produce the same sequence of pseudo-random integers.
XI.11.2.7 srand() void srand(unsigned int seed);
The srand() function sets the seed value in order to generate a new sequence of pseudo-random integers. In the following example, the first sequence of pseudo-random numbers is based on the seed value of 1. The second sequence of pseudo-random numbers is based on the seed value 10. The last sequence of pseudo-random numbers is based on the seed value 1: we get the same sequence of integers as the first one.
$ cat libc_srand1.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  int i;
  printf("default seed:\n");
  for (i =0; i < 3; i++)
    printf( "%d\n", rand() );
  srand(10);
  printf("\nseed=10:\n");
  for (i =0; i < 3; i++)
    printf( "%d\n", rand() );
  printf("\ndefault seed (1):\n");
  srand(1);
  for (i =0; i < 3; i++)
    printf( "%d\n", rand() );
  return EXIT_SUCCESS;
}
$ gcc -o libc_srand1 -std=c99 -pedantic libc_srand1.c
$ ./libc_srand1
default seed:
16838
5758
10113

seed=10:
4543
28214
11245

default seed (1):
16838
5758
10113
Programmers often employ the value returned by the function time() as seed:
$ cat libc_srand2.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void) {
  int i;
  time_t t;
  srand(time(&t));
  for (i =0; i < 3; i++)
    printf( "%d\n", rand() );
  return EXIT_SUCCESS;
}
$ gcc -o libc_srand2 -std=c99 -pedantic -lm libc_srand2.c
$ ./libc_srand2
3119
17214
17900
$ ./libc_srand2
4027
18563
18152
The function time(), declared in the header file time.h, returns the number of seconds elapsed since the epoch (00:00:00 UTC, January 1, 1970). The following example displays sequences of pseudo-random integers ranging from 0 to 9:
$ cat libc_srand3.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define RANDOM_MODULUS 10
int main(void) {
  int i;
  time_t t;
  srand(time(&t));
  for (i =0; i < 3; i++)
    printf( "%d\n", rand() % RANDOM_MODULUS);
  return EXIT_SUCCESS;
}
$ gcc -o libc_srand3 -std=c99 -pedantic libc_srand3.c
$ ./libc_srand3
2
3
8
$ ./libc_srand3
1
3
6
XI.11.2.8 abort() void abort(void);
The function abort() triggers an abnormal termination of the running program and raises the signal SIGABRT. The program ends with an unsuccessful exit status (the exit code depends on the implementation). There is no guarantee the program terminates gracefully. That is, if you invoke this function instead of the function exit(), remember that, depending on the implementation, unwritten data in buffered streams may not be written to files, and temporary files may not be removed. Here is an example:
$ cat libc_abort1.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  printf( "Hello\n");
  abort();
  return EXIT_SUCCESS;
}
$ gcc -o libc_abort1 -std=c99 -pedantic libc_abort1.c
$ ./libc_abort1
Hello
Abort (core dumped)
$ echo $?
134
On our system, the program terminates with exit code 134. The following example shows that the signal SIGABRT is actually sent:
$ cat libc_abort2.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
void quit(int sig) {
  printf("Signal %d received. SIGABRT=%d\n", sig, SIGABRT);
}
int main(void) {
  signal(SIGABRT, quit);
  printf( "Hello\n");
  abort();
  return EXIT_SUCCESS;
}
$ gcc -o libc_abort2 -std=c99 -pedantic libc_abort2.c
$ ./libc_abort2
Hello
Signal 6 received. SIGABRT=6
Abort (core dumped)
The function signal() will be described in section XI.14. XI.11.2.9 atexit() int atexit(void (*f)(void));
The function atexit() places the function f() in the list of the functions to be called when the program terminates normally. The functions are called in the reverse order of their registration. The function f() takes no parameter and returns nothing. An implementation must support the registration of at least 32 functions. The function atexit() returns 0 if the registration succeeds and a nonzero value otherwise. The following example calls the functions f1() and f2() at program termination:
$ cat libc_atexit.c
#include <stdio.h>
#include <stdlib.h>

void f1(void) {
    printf("Function f1()\n");
}

void f2(void) {
    printf("Function f2()\n");
}

int main(void) {
    atexit(f1);
    atexit(f2);
    printf("Hello\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_atexit -std=c99 -pedantic libc_atexit.c
$ ./libc_atexit
Hello
Function f2()
Function f1()
XI.11.2.10 exit() void exit(int e);
The function exit() terminates the program normally with the exit status e. Functions registered with atexit() are called, unwritten data in buffered streams are written to files, streams are closed and temporary files are removed. The following program terminates with the exit status 47 if no argument is provided to the program:
$ cat libc_exit.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    if (argc < 2)
        exit(47);
    printf("first arg=%s\n", argv[1]);
    return EXIT_SUCCESS;
}
$ gcc -o libc_exit -std=c99 -pedantic libc_exit.c
$ ./libc_exit
$ echo $?
47
XI.11.2.11 _Exit() void _Exit(int e);
The function _Exit() terminates the program normally with the exit status e. It differs from exit() in two ways. First, there is no guarantee about cleanup: depending on the implementation, unwritten data in buffered streams may or may not be written to files, streams may or may not be closed, and temporary files may or may not be removed. Second, functions registered by atexit() and by signal() are not called.

XI.11.2.12 malloc(), calloc() and realloc()
void *malloc(size_t size);
void *calloc(size_t n_elt, size_t elt_size);
void *realloc(void *p, size_t size);
The functions allocate a memory block and return a pointer to it. We have already studied them thoroughly in Chapter III. XI.11.2.13 free() void free(void *p);
The function releases the memory block pointed to by p, previously allocated by malloc(), calloc() or realloc(). We have already studied it in Chapter III.

XI.11.2.14 getenv()
char *getenv(const char *var);
The function getenv() returns the string assigned to an environment variable named var. A null pointer is returned if var is not found.
$ cat libc_getenv.c
#include <stdio.h>
#include <stdlib.h>

#define CHECK_STRING(s) ( (s) == NULL ? "undefined" : (s) )

int main(void) {
    char *s1 = getenv("HOME");
    char *s2 = getenv("MYMSG");

    printf("HOME=%s\n", CHECK_STRING(s1) );
    printf("MYMSG=%s\n", CHECK_STRING(s2) );
    return EXIT_SUCCESS;
}
$ gcc -o libc_getenv -std=c99 -pedantic libc_getenv.c
$ ./libc_getenv
HOME=/home/david
MYMSG=undefined
$ export MYMSG=Hello
$ ./libc_getenv
HOME=/home/david
MYMSG=Hello
XI.11.2.15 system() int system(const char *s);
The system() function passes the command pointed to by s to the default command interpreter (command line interface, CLI), if one is available, for execution. If s is a null pointer, it returns a nonzero value if a command line interface is available on the system and 0 otherwise. On UNIX and UNIX-based systems, it always returns a nonzero value in that case (the command line interface is a shell). If s is not a null pointer, the behavior of the function and its return value vary from system to system. In the following example, we tell the UNIX shell to run two commands:
$ cat libc_system1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *cmd1 = "echo $SHELL";
    char *cmd2 = "ls myprog";

    if ( system(NULL) ) {
        printf("CLI available on the system:\n");
        printf("Run command \"%s\":\n", cmd1);
        system(cmd1);
        printf("\nRun command \"%s\":\n", cmd2);
        system(cmd2);
    } else {
        printf("CLI not available on the system\n");
    }
    return EXIT_SUCCESS;
}
$ gcc -o libc_system1 -std=c99 -pedantic libc_system1.c
$ ./libc_system1
CLI available on the system:
Run command "echo $SHELL":
/usr/bin/ksh

Run command "ls myprog":
myprog: No such file or directory
If we run the same commands in the shell, we get the same output:
$ echo $SHELL
/usr/bin/ksh
$ ls myprog
myprog: No such file or directory

Now, let us print the shell termination code of the commands:
$ echo $SHELL
/usr/bin/ksh
$ echo $?
0
$ ls myprog
myprog: No such file or directory
$ echo $?
2
Does the system() function allow us to get the exit status of commands on UNIX and UNIX-based systems? On these systems, if s is not a null pointer, the function system() returns a value containing the termination code of the command pointed to by s. The following example, which is not portable and works on POSIX-compliant systems only, completes the previous example by displaying the exit status of the commands. The first command returns a shell exit status of 0 and the second one a nonzero value (indicating a failure):
$ cat libc_system2.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void) {
    char *cmd1 = "echo $SHELL";
    char *cmd2 = "ls myprog";
    int system_val;
    int cmd_exit_status;

    if ( system(NULL) ) {
        printf("CLI available on the system:\n");

        /* First command */
        printf("Run command \"%s\":\n", cmd1);
        system_val = system(cmd1);
        cmd_exit_status = WEXITSTATUS(system_val);
        printf("exit status=%d\n", cmd_exit_status);

        /* Second command */
        printf("\nRun command \"%s\":\n", cmd2);
        system_val = system(cmd2);
        cmd_exit_status = WEXITSTATUS(system_val);
        printf("exit status=%d\n", cmd_exit_status);
    } else {
        printf("CLI not available on the system\n");
    }
    return EXIT_SUCCESS;
}
$ gcc -o libc_system2 -std=c99 -pedantic libc_system2.c
$ ./libc_system2
CLI available on the system:
Run command "echo $SHELL":
/usr/bin/ksh
exit status=0

Run command "ls myprog":
myprog: No such file or directory
exit status=2
To retrieve the exit status of a command from the value returned by system(), we called the macro WEXITSTATUS defined in the POSIX header file sys/wait.h.

XI.11.2.16 qsort()
void qsort(void *p, size_t n, size_t size, int (*cmpfunc)(const void *, const void *));
The qsort() function sorts an array of n objects, each of size bytes, whose first element is pointed to by p. The objects themselves are not altered: only their order within the array changes. The last parameter of qsort() is a function cmpfunc that compares two objects in order to sort them. The function cmpfunc takes two arguments (pointers to const void) and returns an int. The comparison function is of the following form:
int cmpfunc(const void *a, const void *b);

Where:
o If the object pointed to by a is greater than the object pointed to by b, it returns an integer greater than 0.
o If the two objects are equal, it returns 0.
o If the object pointed to by a is less than the object pointed to by b, it returns an integer less than 0.

The following example sorts an array of integers:
$ cat libc_qsort1.c
#include <stdio.h>
#include <stdlib.h>

int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

int main(void) {
    size_t i;
    int list_int[] = { 2, 0, 6, 1 };
    size_t obj_size = sizeof list_int[0];
    size_t nb_elt = sizeof list_int / obj_size;

    printf("Before sorting:\n");
    for (i = 0; i < nb_elt; i++)
        printf("%d ", list_int[i]);

    qsort(list_int, nb_elt, obj_size, cmp);

    printf("\n\nAfter sorting:\n");
    for (i = 0; i < nb_elt; i++)
        printf("%d ", list_int[i]);
    printf("\n\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_qsort1 -std=c99 -pedantic libc_qsort1.c
$ ./libc_qsort1
Before sorting:
2 0 6 1

After sorting:
0 1 2 6
The following example sorts an array of strings:
$ cat libc_qsort2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

int cmp(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

int main(void) {
    size_t i;
    char list_fruit[4][MAX_STRING_LEN] = { "apple", "tomato", "banana", "lichee" };
    size_t obj_size = sizeof list_fruit[0];
    size_t nb_elt = sizeof list_fruit / obj_size;

    printf("Before sorting\n");
    for (i = 0; i < nb_elt; i++)
        printf("%s ", list_fruit[i]);

    qsort(list_fruit, nb_elt, obj_size, cmp);

    printf("\n\nAfter sorting:\n");
    for (i = 0; i < nb_elt; i++)
        printf("%s ", list_fruit[i]);
    printf("\n\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_qsort2 -std=c99 -pedantic libc_qsort2.c
$ ./libc_qsort2
Before sorting
apple tomato banana lichee

After sorting:
apple banana lichee tomato
XI.11.2.17 bsearch() void *bsearch(const void *obj,const void *p,size_t n,size_t size,int (*cmpfunc)(const void *, const void *));
The bsearch() function searches a sorted array of n objects, whose first element is pointed to by p, for an element matching the object pointed to by obj, and returns a pointer to that element if found. It returns a null pointer if no matching element is found. The parameter size indicates the size of an object. The last parameter is a comparison function that compares obj (always passed as the first argument) with elements of the array (second argument). The function cmpfunc takes two arguments (pointers to const void) and returns an int. The comparison function is of the following form:
int cmpfunc(const void *a, const void *b);

Where:
o If the object pointed to by a is greater than the object pointed to by b, it returns an integer greater than 0.
o If the two objects are equal, it returns 0.
o If the object pointed to by a is less than the object pointed to by b, it returns an integer less than 0.

The function bsearch() works properly only if the array pointed to by p has been sorted beforehand, in an order consistent with the comparison function. The function qsort() is usually invoked before calling bsearch(). The following example searches for the integer 6:
$ cat libc_bsearch1.c
#include <stdio.h>
#include <stdlib.h>

int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

int main(void) {
    int list_int[] = { 2, 0, 6, 1 };
    size_t obj_size = sizeof list_int[0];
    size_t nb_elt = sizeof list_int / obj_size;
    int elt = 6;
    int *p_elt;

    qsort(list_int, nb_elt, obj_size, cmp);
    p_elt = bsearch(&elt, list_int, nb_elt, obj_size, cmp);
    if (p_elt != NULL)
        printf("Element %i found\n", *p_elt);
    else
        printf("Element %i not found\n", elt);
    return EXIT_SUCCESS;
}
$ gcc -o libc_bsearch1 -std=c99 -pedantic libc_bsearch1.c
$ ./libc_bsearch1
Element 6 found
The following example searches for the string "banana":
$ cat libc_bsearch2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

int cmp(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

int main(void) {
    char *p_elt;
    char elt[MAX_STRING_LEN] = "banana";
    char list_fruit[][MAX_STRING_LEN] = { "apple", "tomato", "banana", "lichee" };
    size_t obj_size = sizeof list_fruit[0];
    size_t nb_elt = sizeof list_fruit / obj_size;

    qsort(list_fruit, nb_elt, obj_size, cmp);
    p_elt = bsearch(elt, list_fruit, nb_elt, obj_size, cmp);
    if (p_elt != NULL)
        printf("Element %s found\n", p_elt);
    else
        printf("Element %s not found\n", elt);
    return EXIT_SUCCESS;
}
$ gcc -o libc_bsearch2 -std=c99 -pedantic libc_bsearch2.c
$ ./libc_bsearch2
Element banana found
XI.12 <string.h>

XI.12.1 Comparison functions

XI.12.1.1 strcmp()
int strcmp(const char *s1, const char *s2);

The strcmp() function compares the strings pointed to by s1 and s2 and returns:
o An integer greater than 0 if the string pointed to by s1 is greater than the string pointed to by s2
o 0 if the string pointed to by s1 equals the string pointed to by s2
o An integer less than 0 if the string pointed to by s1 is less than the string pointed to by s2

The function was described in Chapter III Section III.4.4.5.

XI.12.1.2 strncmp()
int strncmp(const char *s1, const char *s2, size_t n);
The strncmp() function compares at most n characters of the strings pointed to by s1 and s2 and returns:
o An integer greater than 0 if the string pointed to by s1 is greater than the string pointed to by s2
o 0 if the string pointed to by s1 equals the string pointed to by s2
o An integer less than 0 if the string pointed to by s1 is less than the string pointed to by s2
The function was described in Chapter III Section III.4.4.5. XI.12.1.3 memcmp() int memcmp(const void *p1, const void *p2, size_t n);
The memcmp() function compares the first n bytes of the objects pointed to by p1 and p2 and returns:
o An integer greater than 0 if the object pointed to by p1 is greater than the object pointed to by p2
o 0 if the object pointed to by p1 equals the object pointed to by p2
o An integer less than 0 if the object pointed to by p1 is less than the object pointed to by p2
You can call it, of course, to compare strings but also other kinds of objects such as structures. The following example compares two structures:
$ cat libc_memcmp.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_ARRAY_LEN 10

struct array_int {
    int *a;
    size_t nb_elt;
    size_t len;
};

int main(void) {
    struct array_int a1, a2;

    a1.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a1.a);
    a2.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a2.a);
    if ( a1.a == NULL || a2.a == NULL ) {
        free(a1.a);
        free(a2.a);
        return EXIT_FAILURE;
    }

    a1.a[0] = 1;
    a1.a[1] = 2;
    a1.len = DEFAULT_ARRAY_LEN;
    a1.nb_elt = 2;

    a2.a[0] = 1;
    a2.a[1] = 2;
    a2.len = DEFAULT_ARRAY_LEN;
    a2.nb_elt = 2;

    if ( ! memcmp(&a1, &a2, sizeof a1) ) {
        printf("a1 same as a2\n");
    } else {
        printf("a1 different from a2\n");
    }

    free(a2.a);
    printf("After statement a2 = a1\n");
    a2 = a1;
    if ( ! memcmp(&a1, &a2, sizeof a1) ) {
        printf("a1 same as a2\n");
    } else {
        printf("a1 different from a2\n");
    }

    free(a1.a);   /* a2.a now refers to the same block */
    return EXIT_SUCCESS;
}
$ gcc -o libc_memcmp -std=c99 -pedantic libc_memcmp.c
$ ./libc_memcmp
a1 different from a2
After statement a2 = a1
a1 same as a2

The first comparison reports a difference because the members a1.a and a2.a hold different addresses, even though the arrays they point to have the same contents: memcmp() compares the bytes of the structures themselves, not the data their pointers refer to.
XI.12.2 Copy functions

XI.12.2.1 strcpy()
Until C95:
#include <string.h>
char *strcpy(char *dest, const char *src);

As of C99:
#include <string.h>
char *strcpy(char * restrict dest, const char * restrict src);
The strcpy() function copies the string pointed to by src, including the terminating null character, into the memory area pointed to by dest. The copy stops once the null character has been copied. It returns dest. The behavior is undefined if the source and destination areas overlap (see Chapter VII Section VII.18.2). The function was described fully in Chapter III.

XI.12.2.2 strncpy()
Until C95:
#include <string.h>
char *strncpy(char *dest, const char *src, size_t n);

As of C99:
#include <string.h>
char *strncpy(char * restrict dest, const char * restrict src, size_t n);
The strncpy() function performs the same task as strcpy() except that it copies at most n characters. The copy stops when a null character is encountered: the characters following it in the source are not copied. If the length of the string pointed to by src is less than n, extra null characters are appended until the total number of characters written reaches n. If the length of the string pointed to by src is greater than or equal to n, the memory area pointed to by dest is not terminated by a null character. The behavior is undefined if the source and destination areas overlap (see Chapter VII Section VII.18.2). The function was described in Chapter III.

XI.12.2.3 memset()
#include <string.h>
void *memset(void *p, int c, size_t n);
The memset() function sets the first n bytes of the object pointed to by p to the value c converted to unsigned char. It returns the pointer p.
The following example copies the character 'A' into the first five characters of the string s:
$ cat libc_memset1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

int main(void) {
    char s[MAX_STRING_LEN] = "Hello world";

    memset(s, 'A', 5);
    printf("%s\n", s);
    return EXIT_SUCCESS;
}
$ gcc -o libc_memset1 -std=c99 -pedantic libc_memset1.c
$ ./libc_memset1
AAAAA world

It is often called to initialize memory areas with the null character:
$ cat libc_memset2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

int main(void) {
    char s[MAX_STRING_LEN];

    memset(s, '\0', sizeof s);
    return EXIT_SUCCESS;
}
XI.12.2.4 memcpy()
Until C95:
#include <string.h>
void *memcpy(void *dest, const void *src, size_t n);

As of C99:
#include <string.h>
void *memcpy(void * restrict dest, const void * restrict src, size_t n);

The function memcpy() copies n characters (bytes) from the memory area pointed to by src to the memory area pointed to by dest. It returns the pointer dest. The pointers src and dest can point to any object, including strings. The behavior is undefined if the two areas overlap (see Chapter VII Section VII.18.2). We described the function with examples in Chapter VI Section VI.3.10.

XI.12.2.5 memmove()
#include <string.h>
void *memmove(void *dest, const void *src, size_t n);
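The function memmove() copies n characters (bytes) from the memory area pointed to by src to the memory area pointed to by dest. It returns the pointer dest. Unlike memcpy(), it works properly when the two areas overlap: the copy takes place as if the n characters were first copied into a temporary area and then copied from there to dest (an implementation does not have to actually allocate such an area). For that reason, it may be slightly less efficient than memcpy() (see Chapter VII Section VII.18.2).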
XI.12.3 Concatenation functions

XI.12.3.1 strcat()
Until C95:
#include <string.h>
char *strcat(char *dest, const char *src);

As of C99:
#include <string.h>
char *strcat(char * restrict dest, const char * restrict src);

The function strcat() appends the string pointed to by src to the end of the string pointed to by dest and returns dest. The null character of the string pointed to by src is also copied while the null character of the string pointed to by dest is overwritten. You have to ensure the object pointed to by dest is large enough to hold the resulting string. The function was described in Chapter III Section III.4.4.4.

XI.12.3.2 strncat()
Until C95:
#include <string.h>
char *strncat(char *dest, const char *src, size_t n);

As of C99:
#include <string.h>
char *strncat(char * restrict dest, const char * restrict src, size_t n);

The function strncat() performs the same task as strcat() except that it copies at most the first n characters from the string pointed to by src. A null character is always appended to the result, so up to n + 1 characters may be written. You have to ensure the object pointed to by dest is large enough to hold the resulting string. The function was described in Chapter III Section III.4.4.4.
XI.12.4 Look up functions

XI.12.4.1 strchr()
#include <string.h>
char *strchr(const char *s, int c);

The function strchr() searches the string pointed to by s for the character c converted to char and returns a pointer to the first character matching c. It returns a null pointer if the character is not found. The following example searches the string s for the characters 'x', 'y' and 'u':
$ cat libc_strchr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "w=5 x=6 y=7 z=8";
    char *p;
    char var;

    var = 'x';
    printf("\nSearch for %c:\n", var);
    p = strchr(s, var);
    if ( p != NULL )
        printf("strchr(\"%s\", '%c') returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", '%c') returns NULL\n", s, var);

    var = 'y';
    printf("\nSearch for %c:\n", var);
    p = strchr(s, var);
    if ( p != NULL )
        printf("strchr(\"%s\", '%c') returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", '%c') returns NULL\n", s, var);

    var = 'u';
    printf("\nSearch for %c:\n", var);
    p = strchr(s, var);
    if ( p != NULL )
        printf("strchr(\"%s\", '%c') returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", '%c') returns NULL\n", s, var);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strchr -std=c99 -pedantic libc_strchr.c
$ ./libc_strchr

Search for x:
strchr("w=5 x=6 y=7 z=8", 'x') returns x=6 y=7 z=8

Search for y:
strchr("w=5 x=6 y=7 z=8", 'y') returns y=7 z=8

Search for u:
strchr("w=5 x=6 y=7 z=8", 'u') returns NULL
XI.12.4.2 strrchr()
#include <string.h>
char *strrchr(const char *s, int c);

The function strrchr() searches the string pointed to by s for the character c converted to char and returns a pointer to the last character matching c. It returns a null pointer if the character is not found. The following example searches the string s for the character '5', first with strchr() and then with strrchr():
$ cat libc_strrchr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "a=5 b=6 c=5 d=8";
    char *p;
    char var;

    var = '5';
    printf("Search for %c:\n", var);
    p = strchr(s, var);
    if ( p != NULL )
        printf("strchr(\"%s\", '%c') returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", '%c') returns NULL\n", s, var);

    var = '5';
    printf("\nSearch for %c:\n", var);
    p = strrchr(s, var);   /* search in reverse order */
    if ( p != NULL )
        printf("strrchr(\"%s\", '%c') returns %s\n", s, var, p);
    else
        printf("strrchr(\"%s\", '%c') returns NULL\n", s, var);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strrchr -std=c99 -pedantic libc_strrchr.c
$ ./libc_strrchr
Search for 5:
strchr("a=5 b=6 c=5 d=8", '5') returns 5 b=6 c=5 d=8

Search for 5:
strrchr("a=5 b=6 c=5 d=8", '5') returns 5 d=8
XI.12.4.3 strpbrk()
#include <string.h>
char *strpbrk(const char *s1, const char *s2);

The function strpbrk() searches the string pointed to by s1 for any of the characters of the string pointed to by s2 and returns a pointer to the first character within s1 matching one of them. It returns a null pointer if none of them is found. The following example searches the string s1 for the characters 6, 7, 8, and 9:
$ cat libc_strpbrk.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s1[] = "w=5 x=6 y=7 z=8";
    char s2[] = "6789";
    char *p;

    printf("Search for characters %s:\n", s2);
    p = strpbrk(s1, s2);
    if ( p != NULL )
        printf("strpbrk(\"%s\", '%s') returns %s\n", s1, s2, p);
    else
        printf("strpbrk(\"%s\", '%s') returns NULL\n", s1, s2);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strpbrk -std=c99 -pedantic libc_strpbrk.c
$ ./libc_strpbrk
Search for characters 6789:
strpbrk("w=5 x=6 y=7 z=8", '6789') returns 6 y=7 z=8
XI.12.4.4 strstr()
#include <string.h>
char *strstr(const char *s1, const char *s2);

The function strstr() searches the string pointed to by s1 for the first occurrence of the sub-string pointed to by s2 and returns a pointer to it if found. Otherwise, it returns a null pointer. In the following example, the function returns a pointer to the sub-string "src" within the string held in the array s:
$ cat libc_strstr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "base=/opt/project src=/opt/project/src lib=/opt/project/lib";
    char *p;
    char *field;

    field = "src";
    p = strstr(s, field);
    printf("Search %s for %s:\n", s, field);
    printf("strstr() returns %s\n", p);
    return EXIT_SUCCESS;
}
$ gcc -o libc_strstr -std=c99 -pedantic libc_strstr.c
$ ./libc_strstr
Search base=/opt/project src=/opt/project/src lib=/opt/project/lib for src:
strstr() returns src=/opt/project/src lib=/opt/project/lib
XI.12.4.5 strtok()
Until C95:
#include <string.h>
char *strtok(char *s1, const char *sep);

As of C99:
#include <string.h>
char *strtok(char * restrict s1, const char * restrict sep);

The function strtok() splits the string pointed to by s1 into sub-strings according to the characters contained in the string pointed to by sep. Each character of the string pointed to by sep is treated as a delimiter separating two sub-strings within s1. The function modifies the string pointed to by s1 (the delimiter that ends a sub-string is overwritten with a null character), so s1 must not point to a string literal.

The first call to strtok() reads the string s1 character by character and ignores the leading characters of s1 that are also contained in the string pointed to by sep (as if they were not present). Then:
o If s1 contains only delimiter characters, a null pointer is returned.
o If no delimiter is found after the leading delimiters, the function returns a pointer to the remaining characters, which form the only sub-string.
o Otherwise, the function splits the string pointed to by s1 into two sub-strings: the first one is composed of the characters preceding the delimiter and the second one is composed of the rest of the string following the delimiter character. The function returns a pointer to the first sub-string and registers a pointer to the second sub-string for the next calls.

The second call, taking a null pointer as first argument, performs the same process as the first call by breaking the second sub-string previously registered into two new sub-strings according to the delimiter characters passed as second argument, and so on. A typical usage is shown below:
o First call: p = strtok(s1, sep_list1);
o Second call: p = strtok(NULL, sep_list2);
o Third call: p = strtok(NULL, sep_list3);
o Etc.

The strings holding the delimiter characters (sep_list1, sep_list2, and so on) may be identical. In our first example, we will use the same delimiters #, % and - for all the calls:
$ cat libc_strtok1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "#%lib%src#include";
    char *p;

    /* split into:
       sub-string1=lib that is returned
       sub-string2=src#include that is registered */
    p = strtok(s, "#%-");
    printf("%s\n", p);

    /* split into:
       sub-string1=src that is returned
       sub-string2=include that is registered */
    p = strtok(NULL, "#%-");
    printf("%s\n", p);

    /* split into:
       sub-string1=include that is returned
       sub-string2=NULL : end of processing */
    p = strtok(NULL, "#%-");
    printf("%s\n", p);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strtok1 -std=c99 -pedantic libc_strtok1.c
$ ./libc_strtok1
lib
src
include
Explanation:
o The array s holds the string "#%lib%src#include".
o The first call p = strtok(s, "#%-"), ignoring the leading delimiters # and % within s, splits s into two sub-strings separated by %: a pointer to the first sub-string "lib" is returned and a pointer to the second sub-string "src#include" is stored for the next call.
o The second call p = strtok(NULL, "#%-") splits "src#include" into two sub-strings separated by #: a pointer to the first sub-string "src" is returned and a pointer to the rest of the string, "include", is stored for the next call.
o The third call p = strtok(NULL, "#%-") finds no delimiter in "include": a pointer to the sub-string "include" is returned and nothing is left to process.

Here is a second example:
$ cat libc_strtok2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "##init#x=5#y=7#z=8";
    char *p;

    printf("Split %s. Delimiters are: %s\n", s, "#%");

    /* return init. Register x=5#y=7#z=8 */
    p = strtok(s, "#%");
    printf("Processing: %s\n", p);

    /* return x. Register 5#y=7#z=8 */
    p = strtok(NULL, "=");
    printf("name=%s ", p);

    /* return 5. Register y=7#z=8 */
    p = strtok(NULL, "#");
    printf("value=%s\n", p);

    /* return y. Register 7#z=8 */
    p = strtok(NULL, "=");
    printf("name=%s ", p);

    /* return 7. Register z=8 */
    p = strtok(NULL, "#");
    printf("value=%s\n", p);

    /* return z. Register 8 */
    p = strtok(NULL, "=");
    printf("name=%s ", p);

    /* return 8. */
    p = strtok(NULL, "#");
    printf("value=%s\n", p);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strtok2 -std=c99 -pedantic libc_strtok2.c
$ ./libc_strtok2
Split ##init#x=5#y=7#z=8. Delimiters are: #%
Processing: init
name=x value=5
name=y value=7
name=z value=8
The third example is a bit more complex. It reads a configuration file and retrieves section names and fields with their values. A configuration file has the following form:
[section name]
field=value
field=value
…
[section name]
field=value
field=value
…

The configuration file our program is going to scan is given below:
$ cat config.ini
[LOCATION]
base=/opt/project/proj1
bin=/opt/project/proj1/bin
lib=/opt/project/proj1/lib
src=/opt/project/proj1/src
header=/opt/project/proj1/include
log=/opt/project/proj1/log
[LOG]
nb_logfiles=90
max_days_logfile=31
[TEST]
DEBUG=yes
The program that extracts section names and their fields with their values is given below:
$ cat libc_strtok3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CONFIG_FILE "config.ini"
#define MSG_LEN 255
#define LINE_LEN 255

int read_config_file(char *filename) {
    FILE *pf = NULL;
    char err_msg[MSG_LEN];   /* contains error messages */
    char line[LINE_LEN];     /* contains a line from the input file */
    char *section_name = NULL;
    char *field = NULL;
    char *value = NULL;
    int line_number = 0;

    if ( filename == NULL ) {
        fprintf(stderr, "filename is NULL\n");
        return 0;
    }

    if ( ( pf = fopen(filename, "r") ) == NULL ) {
        /* snprintf() never writes beyond the end of err_msg */
        snprintf(err_msg, sizeof err_msg, "File %s", filename);
        perror(err_msg);
        return 0;
    }

    /* read the input stream line by line */
    while ( fgets(line, sizeof line, pf) != NULL ) {
        line_number++;

        /* Test if it is a section. A section name is enclosed between [ and ] */
        if ( strchr(line, '[') ) {
            /* get section name */
            if ( ( section_name = strtok(line, "[]") ) != NULL )
                printf("\nSection %s:\n", section_name);
        } else {
            /* This is a field or a blank line */
            field = value = NULL;
            if ( strchr(line, '=') ) {
                char *p = NULL;

                /* get field name */
                field = strtok(line, "=");

                /* get value of the field */
                if ( field != NULL ) {
                    value = strtok(NULL, "=");
                    /* remove newline character from value */
                    if ( value != NULL && ( p = strchr(value, '\n') ) != NULL )
                        *p = '\0';
                }

                if ( ! value || ! field )
                    printf("Line %d badly formed\n", line_number);
                else
                    printf("Field %s: %s\n", field, value);
            } else {
                /* ignore this line: it does not contain a field */
                continue;
            }
        }
    }
    fclose(pf);
    return 1;
}

int main(void) {
    read_config_file(CONFIG_FILE);
    return EXIT_SUCCESS;
}
Let us run it. We get this:
$ gcc -o libc_strtok3 -std=c99 -pedantic libc_strtok3.c
$ ./libc_strtok3

Section LOCATION:
Field base: /opt/project/proj1
Field bin: /opt/project/proj1/bin
Field lib: /opt/project/proj1/lib
Field src: /opt/project/proj1/src
Field header: /opt/project/proj1/include
Field log: /opt/project/proj1/log

Section LOG:
Field nb_logfiles: 90
Field max_days_logfile: 31

Section TEST:
Field DEBUG: yes
XI.12.4.6 memchr()
#include <string.h>
void *memchr(const void *s, int c, size_t n);

The function memchr() searches the first n bytes of the memory area pointed to by s for the character c, first converted to unsigned char, and returns a pointer to the first byte matching c. It returns a null pointer if the character c has not been found. The pointer s does not need to point to a string. The following example searches the string s for the characters 'x', 'y' and 'u':
$ cat libc_memchr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "w=5 x=6 y=7 z=8";
    char *p;
    char var;

    var = 'x';
    printf("\nSearch for %c:\n", var);
    p = memchr(s, var, strlen(s));
    if ( p != NULL )
        printf("memchr(\"%s\", '%c', %zu) returns %s\n", s, var, strlen(s), p);
    else
        printf("memchr(\"%s\", '%c', %zu) returns NULL\n", s, var, strlen(s));

    var = 'y';
    printf("\nSearch for %c:\n", var);
    p = memchr(s, var, strlen(s));
    if ( p != NULL )
        printf("memchr(\"%s\", '%c', %zu) returns %s\n", s, var, strlen(s), p);
    else
        printf("memchr(\"%s\", '%c', %zu) returns NULL\n", s, var, strlen(s));

    var = 'u';
    printf("\nSearch for %c:\n", var);
    p = memchr(s, var, strlen(s));
    if ( p != NULL )
        printf("memchr(\"%s\", '%c', %zu) returns %s\n", s, var, strlen(s), p);
    else
        printf("memchr(\"%s\", '%c', %zu) returns NULL\n", s, var, strlen(s));

    return EXIT_SUCCESS;
}
$ gcc -o libc_memchr -std=c99 -pedantic libc_memchr.c
$ ./libc_memchr

Search for x:
memchr("w=5 x=6 y=7 z=8", 'x', 15) returns x=6 y=7 z=8

Search for y:
memchr("w=5 x=6 y=7 z=8", 'y', 15) returns y=7 z=8

Search for u:
memchr("w=5 x=6 y=7 z=8", 'u', 15) returns NULL

Note that the value returned by strlen() has type size_t, which is printed with the conversion specification %zu.
XI.12.5 Error management function

XI.12.5.1 strerror()
#include <string.h>
char *strerror(int errnum);

The strerror() function returns a pointer to the error message associated with the error number errnum. The function was described in Chapter X Section X.7.3.
XI.12.6 String length
#include <string.h>
size_t strlen(const char *s);

The strlen() function returns the number of characters (i.e. bytes) in the string pointed to by s, not counting the terminating null character.
XI.13 <time.h>

XI.13.1 Types
The header file time.h defines the types clock_t, time_t and struct tm. The types clock_t and time_t are arithmetic types used to store time (on most systems, they are integer types). The structure tm has at least the following members:
o int tm_sec: seconds in the integer interval [0-60]
o int tm_min: minutes in the integer interval [0-59]
o int tm_hour: hours in the integer interval [0-23]
o int tm_mday: day of the month in the integer interval [1-31]
o int tm_wday: day number of the week in the integer interval [0-6]. Sunday is represented by 0, Monday by 1…
o int tm_mon: month number of the year in the integer interval [0-11]. January is denoted by 0, February by 1…
o int tm_year: number of years since 1900. The value stored in this member added to 1900 yields the complete year.
o int tm_yday: day of the year counted from January 1, in the integer interval [0-365]
o int tm_isdst: DST flag (Daylight Saving Time). If the member holds the value 0, DST is not in effect. If it holds a positive value, DST is in effect. If it holds a negative value, the information is not available.
XI.13.2 Functions
XI.13.3 time()
#include <time.h>
time_t time(time_t *p_time);

The function time() returns the current calendar time, if available, and also assigns it to the object pointed to by p_time if p_time is not a null pointer. If p_time is a null pointer, the value is only returned. If the current time is not available, it returns (time_t)-1. The encoding of the returned value depends on the implementation. On UNIX and UNIX-based systems (Linux, BSD systems), more generally on POSIX-compliant systems, and on Microsoft operating systems, it returns the number of seconds elapsed since the Epoch (00:00:00 UTC, January 1, 1970). The following example displays, on a Linux system, the number of seconds elapsed since the Epoch:
$ cat libc_time1.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

int main(void) {
    time_t t;

    if ( ( t = time(NULL) ) != (time_t)-1 )
        printf("%ju seconds elapsed since the Epoch\n", (uintmax_t)t);
    return EXIT_SUCCESS;
}
$ gcc -o libc_time1 -std=c99 -pedantic libc_time1.c
$ ./libc_time1
1449678851 seconds elapsed since the Epoch

If we run it again about 10 s later:
$ ./libc_time1
1449678862 seconds elapsed since the Epoch
The following example displays, on our Linux system, the number of seconds elapsed between two calls to the function time():
$ cat libc_time2.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>   /* sleep() */

int main(void) {
  time_t t1, t2;

  t1 = time(NULL);
  sleep(10);
  t2 = time(NULL);

  /* not portable: assumes time_t counts seconds */
  printf("%ld seconds elapsed between the two calls\n", (long)(t2 - t1));
  return EXIT_SUCCESS;
}
$ gcc -o libc_time2 -std=c99 -pedantic libc_time2.c
$ ./libc_time2
10 seconds elapsed between the two calls
XI.13.4 difftime()
#include <time.h>
double difftime(time_t t2, time_t t1);

The function difftime() returns the number of seconds elapsed between t1 and t2 (that is, t2 minus t1), however the values of t2 and t1 are encoded. On UNIX systems (and POSIX-compliant systems), it yields the same value as t2-t1. The previous example, libc_time2.c, should have been written as follows (portable code):
$ cat libc_difftime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>   /* sleep() */

int main(void) {
  time_t t1, t2;

  t1 = time(NULL);
  sleep(10);
  t2 = time(NULL);

  printf("%f seconds elapsed between the two calls\n", difftime(t2,t1) );
  return EXIT_SUCCESS;
}
$ gcc -o libc_difftime -std=c99 -pedantic libc_difftime.c
$ ./libc_difftime
10.000000 seconds elapsed between the two calls
XI.13.5 localtime()
#include <time.h>
struct tm *localtime(const time_t *p_time);

The function localtime() takes a pointer to an object of type time_t and returns a pointer to an object of type struct tm whose members are filled, according to the local time zone, with the values corresponding to the provided time (pointed to by p_time). If the function cannot convert the object pointed to by p_time, it returns a null pointer. In the following example, the function localtime() fills the tm structure, expressed as local time, corresponding to the current time returned by time(), and the program displays its contents:
$ cat libc_localtime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
  char *wdays[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
  time_t tm_now = time(NULL);

  if ( tm_now != (time_t)-1 ) {
    struct tm *p_tm_now = localtime(&tm_now);

    if (p_tm_now != NULL)
      printf( "%02d/%02d/%d (%s, %s) %02d:%02d:%02d\n",
              p_tm_now->tm_mon+1,        /* month of the year [0-11] */
              p_tm_now->tm_mday,         /* day of the month [1-31] */
              p_tm_now->tm_year+1900,    /* years since 1900 */
              months[p_tm_now->tm_mon],  /* month of the year [0-11] */
              wdays[p_tm_now->tm_wday],  /* day of the week [0-6] */
              p_tm_now->tm_hour,
              p_tm_now->tm_min,
              p_tm_now->tm_sec );
  }
  return EXIT_SUCCESS;
}
$ gcc -o libc_localtime -std=c99 -pedantic libc_localtime.c
$ ./libc_localtime
12/09/2015 (Dec, Wed) 19:28:02
XI.13.6 gmtime()
#include <time.h>
struct tm *gmtime(const time_t *p_time);

The function gmtime() takes a pointer to an object of type time_t and returns a pointer to an object of type struct tm whose members are filled with the values, expressed in UTC (Coordinated Universal Time), corresponding to the given time (pointed to by p_time). If the function cannot convert the object pointed to by p_time, it returns a null pointer. In the following example, the function gmtime() fills the tm structure, expressed as UTC time, corresponding to the current time returned by time(), and the program displays its contents:
$ cat libc_gmtime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
  char *wdays[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
  time_t tm_now = time(NULL);

  if ( tm_now != (time_t)-1 ) {
    struct tm *p_tm_now = gmtime(&tm_now);

    if (p_tm_now != NULL)
      printf( "%02d/%02d/%d (%s, %s) %02d:%02d:%02d\n",
              p_tm_now->tm_mon+1,        /* month of the year [0-11] */
              p_tm_now->tm_mday,         /* day of the month [1-31] */
              p_tm_now->tm_year+1900,    /* years since 1900 */
              months[p_tm_now->tm_mon],  /* month of the year [0-11] */
              wdays[p_tm_now->tm_wday],  /* day of the week [0-6] */
              p_tm_now->tm_hour,
              p_tm_now->tm_min,
              p_tm_now->tm_sec );
  }
  return EXIT_SUCCESS;
}
$ gcc -o libc_gmtime -std=c99 -pedantic libc_gmtime.c
$ ./libc_gmtime
12/09/2015 (Dec, Wed) 18:28:39
XI.13.7 asctime()
#include <time.h>
char *asctime(const struct tm *p_tm);

The function asctime() translates the object pointed to by p_tm into a string. The string ends with a newline character. For example:
$ cat libc_asctime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  time_t tm_now = time(NULL);

  if ( tm_now != (time_t)-1 ) {
    struct tm *p_tm_now = localtime(&tm_now);

    if (p_tm_now != NULL)
      printf( "%s", asctime(p_tm_now) );  /* the string already ends with \n */
  }
  return EXIT_SUCCESS;
}
$ gcc -o libc_asctime -std=c99 -pedantic libc_asctime.c
$ ./libc_asctime
Wed Dec 9 19:29:15 2015
XI.13.8 ctime()
#include <time.h>
char *ctime(const time_t *p_time);

The function ctime() converts the object pointed to by p_time into local time before translating it into a string. It is equivalent to:
asctime(localtime(p_time));

For example:
$ cat libc_ctime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  time_t tm_now = time(NULL);

  if ( tm_now != (time_t)-1 )
    printf( "%s", ctime(&tm_now) );  /* the string already ends with \n */

  return EXIT_SUCCESS;
}
$ gcc -o libc_ctime -std=c99 -pedantic libc_ctime.c
$ ./libc_ctime
Wed Dec 9 19:29:35 2015
XI.13.9 mktime()
#include <time.h>
time_t mktime(struct tm *p_tm);

The function mktime() converts the local time stored in the object pointed to by p_tm into a value of type time_t, which it returns. You do not have to set the members tm_wday and tm_yday: their initial values are ignored, and the function sets them itself from the other members. If the conversion cannot be done, it returns (time_t)-1. In the following example, we let the function compute the day of the week (member tm_wday) and the day of the year (member tm_yday):
$ cat libc_mktime1.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  char *wdays[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
  struct tm loc_time;

  /* set time to 07/04/1961 23:11 */
  loc_time.tm_sec = 00;
  loc_time.tm_min = 11;
  loc_time.tm_hour = 23;
  loc_time.tm_mday = 4;
  loc_time.tm_mon = 6;              /* July = 6 */
  loc_time.tm_year = 1961 - 1900;
  loc_time.tm_isdst = -1;           /* let mktime() find out whether DST applies */

  if ( mktime(&loc_time) != (time_t)-1 )
    printf("Day of the week=%s. It is the %d th day of the year\n",
           wdays[loc_time.tm_wday], loc_time.tm_yday+1 );

  return EXIT_SUCCESS;
}
$ gcc -o libc_mktime1 -std=c99 -pedantic libc_mktime1.c
$ ./libc_mktime1
Day of the week=Tue. It is the 185 th day of the year
The following example computes the number of seconds elapsed between two given dates, 02/02/1980 00:00:00 and 02/03/1980 00:00:00:
$ cat libc_mktime2.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  struct tm loc_time;
  time_t t1, t2;
  double nb_seconds;

  /* set time to 02/02/1980 00:00:00 */
  loc_time.tm_sec = 00;
  loc_time.tm_min = 00;
  loc_time.tm_hour = 00;
  loc_time.tm_mday = 2;
  loc_time.tm_mon = 1;              /* February = 1 */
  loc_time.tm_year = 1980 - 1900;
  loc_time.tm_isdst = -1;           /* must be set: let mktime() decide */
  t1 = mktime(&loc_time);

  /* set time to 02/03/1980 00:00:00 */
  loc_time.tm_sec = 00;
  loc_time.tm_min = 00;
  loc_time.tm_hour = 00;
  loc_time.tm_mday = 3;
  loc_time.tm_mon = 1;              /* February = 1 */
  loc_time.tm_year = 1980 - 1900;
  loc_time.tm_isdst = -1;
  t2 = mktime(&loc_time);

  nb_seconds = difftime(t2, t1);
  printf("Nb seconds elapsed: %f\n", nb_seconds );
  printf("Nb hours elapsed: %f\n", nb_seconds/3600 );

  return EXIT_SUCCESS;
}
$ gcc -o libc_mktime2 -std=c99 -pedantic libc_mktime2.c
$ ./libc_mktime2
Nb seconds elapsed: 86400.000000
Nb hours elapsed: 24.000000
XI.13.10 strftime()
Until C95:
#include <time.h>
size_t strftime(char *s, size_t n, const char *fmt, const struct tm *p_tm);

As of C99:
#include <time.h>
size_t strftime(char * restrict s, size_t n, const char * restrict fmt, const struct tm * restrict p_tm);
The strftime() function converts the time stored in the object pointed to by p_tm into a string according to the format fmt, and stores it in the memory area pointed to by s. No more than n characters, including the terminating null character, are written to s; if the result does not fit, the function returns 0. The function is affected by the LC_TIME category of the locale. Each conversion specification in fmt is introduced by the character %, followed by an optional modifier E or O, followed by a conversion specifier. Table XI‑2 lists the conversion specifiers you can use. The following example shows the output for each conversion specifier:
$ cat libc_strftime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  size_t i;
  struct tm *p_tm_now;
  char s[255];
  time_t t_now = time(NULL);
  char *fmt[] = { "%A", "%a", "%B", "%b", "%c", "%D", "%d", "%e", "%F",
                  "%g", "%G", "%h", "%H", "%I", "%j", "%m", "%M", "%n",
                  "%p", "%R", "%r", "%S", "%T", "%t", "%U", "%u", "%V",
                  "%W", "%w", "%X", "%x", "%Y", "%y", "%Z", "%z", "%%" };
  size_t fmt_len = sizeof fmt/sizeof fmt[0];

  if ( t_now != (time_t)-1 ) {
    p_tm_now = localtime(&t_now);

    if ( p_tm_now != NULL ) {
      for (i=0; i < fmt_len; i++) {
        strftime(s, sizeof s, fmt[i], p_tm_now);
        printf("%s yields %s\n", fmt[i], s);
      }
    }
  }
  return EXIT_SUCCESS;
}
$ gcc -o libc_strftime -std=c99 -pedantic libc_strftime.c
$ ./libc_strftime
%A yields Thursday
%a yields Thu
%B yields December
%b yields Dec
%c yields Thu Dec 10 11:34:12 2015
%D yields 12/10/15
%d yields 10
%e yields 10
%F yields 2015-12-10
%g yields 15
%G yields 2015
%h yields Dec
%H yields 11
%I yields 11
%j yields 344
%m yields 12
%M yields 31
%n yields

%p yields AM
%R yields 11:31
%r yields 11:31:12 AM
%S yields 12
%T yields 11:31:12
%t yields
%U yields 49
%u yields 4
%V yields 50
%W yields 49
%w yields 4
%X yields 11:31:12
%x yields 12/10/15
%Y yields 2015
%y yields 15
%Z yields CET
%z yields +0100
%% yields %
Conversion specifier   Description
%A    Name of the day of the week such as “Thursday”
%a    Name of the day of the week in abbreviated form such as “Thu”
%B    Name of the month such as “December”
%b    Name of the month in abbreviated form such as “Dec”
%c    Date and time with a format depending on the locale such as “Thu Dec 10 11:34:12 2015”
%D    Same as %m/%d/%y such as “12/10/15”
%d    Day of the month (in [01-31]) such as “10”
%e    Day of the month (in [1-31]). If composed of a single digit, a leading space is added such as “ 1”
%F    Same as %Y-%m-%d (ISO 8601 representation) such as “2015-12-10”
%g    Last two digits of the week-based year (in [00-99]) such as “15” (ISO 8601)
%G    Week-based year such as “2015” (ISO 8601)
%h    Same as %b such as “Dec”
%H    Hour in 24-hour clock format (in [00-23]) such as “15”
%I    Hour in 12-hour clock format (in [01-12]) such as “03”
%j    Day of the year (in [001-366]) such as “344”
%M    Minutes (in [00-59]) such as “31”
%m    Month of the year (in [01-12]) such as “12”
%n    Newline character (’\n’)
%p    AM or PM according to the 12-hour clock
%R    Same as %H:%M such as “11:31”
%r    Time in 12-hour clock format such as “11:31:12 AM”
%S    Seconds (in [00-60]) such as “01”
%T    Same as %H:%M:%S (ISO 8601 representation) such as “14:36:09”
%t    Horizontal tab (’\t’)
%U    Week number of the year (in [00-53]) such as “02”. Week one starts with the first Sunday of the year.
%u    Day of the week as specified by ISO 8601 (in [1-7]). Monday is denoted by 1.
%V    Week number of the year (in [01-53]) according to ISO 8601 such as “03”.
%W    Week number of the year (in [00-53]) such as “03”. Week one starts with the first Monday of the year.
%w    Day number of the week (in [0-6]). Sunday is denoted by 0, Monday by 1…
%X    Time according to the locale
%x    Date according to the locale
%Y    Year such as “2015”
%y    Last two digits of the year (in [00-99]) such as “15”
%Z    Name of the time zone such as “CET”. If the time zone cannot be determined, an empty string is output.
%z    Offset from UTC using the ISO 8601 representation +HHMM or -HHMM such as “+0100” meaning UTC + 01:00 while “-0230” means UTC - 02:30.
%%    Output %
Table XI‑2 Conversion specifiers for strftime()
Figure XI‑1 ISO 8601 Week
You have noticed that the week generated by %U starts on Sunday, while the week produced by %W starts on Monday. The rationale is that, depending on the country, a week may start on Monday or on Sunday. Week one is therefore the week containing the first Sunday (%U) or the first Monday (%W) of the year. To overcome the discrepancies between the representations of date and time and the meaning of a week, the standard ISO 8601 was created. ISO 8601, created in 1988 and based on the Gregorian calendar, describes a standard format for date and time.
You have also noticed that the week specified by ISO 8601 (%V) differs from its usual meaning. An ISO 8601 week starts on Monday, and the very first week of the year (week one) is the week that contains the first Thursday of the year. Therefore, since the first Thursday occurs between January 1st and January 7th, ISO 8601 week one always contains January 4th and starts between December 29th of the previous year and January 4th. It also implies that the last week of the year (52 or 53) is the week that contains the last Thursday of the year, ending between December 28th and January 3rd (see Figure XI‑1).
The modifiers E and O can be used with some specifiers. They select a specific format depending on the LC_TIME category of the locale. The E modifier selects the alternative representation of the date and time defined by the current locale, while the O modifier uses the alternative numeric symbols of the current locale. If there is no alternative, E and O are ignored.

Modifier   Specifiers
E          %c %C %x %X %y %Y
O          %d %e %H %I %m %M %S %u %U %V %w %W %y
Figure XI‑2 E and O modifiers used by strftime()
XI.14
A signal is a basic way for processes to communicate with each other. The set of available signals is limited and defined by your system. Within a system, a signal is identified by a number, known as the signal ID, and by a macro, named after the signal, that expands to the signal ID. For example, on UNIX and UNIX-based systems, a process can send a signal to another process to stop it or to terminate it. The C language does not specify the way processes can communicate with each other (which can be done through system calls) but defines how a C program receiving a signal (from another process or the system itself) can handle it.
A signal can be sent to the running process by the system (hardware or the kernel) when an event occurs (such as an attempt to access an invalid memory address, I/O interruptions…), by the program itself or by another process. A signal may be synchronous or asynchronous. A synchronous signal is generated when an error occurs or an instruction cannot be performed while a statement is being processed. An asynchronous signal can be received by the running program at any time, in an unforeseeable way.
The C language specifies only the following macros representing signals (that can also be sent by another process):
o SIGABRT: signal sent by the function abort().
o SIGFPE: sent by the system when an error in an arithmetic operation occurs (division by 0 or overflow).
o SIGILL: sent by the system when an instruction cannot be executed (illegal instruction): instruction not allowed, unknown instruction…
o SIGINT: signal sent by an interactive device such as a terminal. On UNIX systems and UNIX-based systems, it is sent when you press the interrupt key (usually Ctrl-C) while the program is running in the foreground.
o SIGSEGV: sent by the system when the program attempts to access an invalid memory address.
o SIGTERM: signal requesting the termination of the program.
Generally, programmers do not use the C functions for dealing with signals because more powerful functions are required. In POSIX operating systems (such as UNIX systems) and UNIX-based systems, there are many more signals defined, along with more appropriate functions such as kill(), sigaction(), sigprocmask(), sigemptyset(), sigfillset(), sigpending()… Because the C language is supposed to be usable on any operating system, only two functions manipulating signals are specified: signal() and raise().

XI.14.1.1 raise()
#include <signal.h>
int raise(int signum);

The raise() function generates the signal whose ID is specified by signum. The signal is not posted to another process but to the running program itself: the running program sends a signal to itself! The function returns 0 on success and a nonzero value otherwise.

XI.14.1.2 signal()
#include <signal.h>
void (*signal(int signum, void (*handler)(int)))(int);

The function signal() registers the function handler() that will be called when the running program receives the signal whose ID is signum. The function signal() has two parameters and returns a pointer to a function. The first parameter signum is a signal ID and the second parameter is a pointer to a function, known as a signal handler, having the following prototype:
void handler(int signum);

On success, the signal() function returns the handler that was previously registered for signum. It returns the value of the macro SIG_ERR if the function cannot be registered. The macro SIG_DFL denotes the default handling of a signal. Most of the time, the default handling terminates the program. Another macro, SIG_IGN, can be used to ignore a signal. When a signal is ignored, it has no effect on the program.
In the following example, the function raise() generates the signal SIGINT that is handled by the function catch_int():
$ cat libc_signal1.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_int(int sig) {
  printf("Signal %d received\n", sig);
  switch (sig) {
    case SIGINT:
      printf("SIGINT received\n");
      break;
    default:
      printf("Signal unknown\n");
  }
}

int main(void) {
  if ( signal(SIGINT, catch_int) == SIG_ERR ) { /* cannot handle signal */
    printf("Cannot register catch_int to handle SIGINT signal\n");
  } else { /* handle signal */
    printf("catch_int registered to handle SIGINT signal\n");
  }

  raise(SIGINT); /* generates SIGINT signal */
  printf("Leaving program…\n");
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal1 -std=c99 -pedantic libc_signal1.c
$ ./libc_signal1
catch_int registered to handle SIGINT signal
Signal 2 received
SIGINT received
Leaving program…
In the following example, the function raise() generates the signal SIGINT that is ignored:
$ cat libc_signal2.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

int main(void) {
  if ( signal(SIGINT, SIG_IGN) == SIG_ERR ) {
    printf("Cannot ignore signal SIGINT\n");
  } else { /* ignore signal */
    printf("SIGINT signal will be ignored\n");
  }

  raise(SIGINT); /* generates SIGINT signal */
  printf("Leaving program…\n");
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal2 -std=c99 -pedantic libc_signal2.c
$ ./libc_signal2
SIGINT signal will be ignored
Leaving program…
In the following example, the function raise() generates the signal SIGINT that is handled by the default handler SIG_DFL:
$ cat libc_signal3.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

int main(void) {
  if ( signal(SIGINT, SIG_DFL) == SIG_ERR ) { /* cannot handle signal */
    printf("Cannot register the default handler for SIGINT signal\n");
  } else { /* handle signal */
    printf("Default handler registered for SIGINT signal\n");
  }

  raise(SIGINT); /* generates SIGINT signal */
  printf("Leaving program…\n");
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal3 -std=c99 -pedantic libc_signal3.c
$ ./libc_signal3
Default handler registered for SIGINT signal

We can see the default handler produces no output and just terminates the program when invoked. Therefore, the message “Leaving program…” was not displayed. If the signal() function registers functions handling signals, what happens if no function is registered to handle a signal? Let us try:
$ cat libc_signal4.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

int main(void) {
  raise(SIGINT); /* generates SIGINT signal */
  printf("Leaving program…\n");
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal4 -std=c99 -pedantic libc_signal4.c
$ ./libc_signal4
We got the same behavior as if we had registered the signal handler SIG_DFL. Can we conclude that the default handler for all signals is always SIG_DFL? No! The handling in effect when the program starts depends on the implementation. For some signals, SIG_IGN may be set; for the others, the default handling SIG_DFL is set.
Ignoring signals will not make your program more reliable. For example, if your program receives the signal SIGSEGV after attempting to access an invalid memory area, you may ignore it but it is not a good idea: such a signal indicates your program has made an invalid memory access, and it has to terminate to avoid performing unreliable actions. Some signals may be ignored because they do not carry important information for your program. For example, in UNIX and UNIX-based systems, when a child process terminates, the signal SIGCHLD is sent to the parent. Such a signal may be ignored with no consequence.
Not all signals can be ignored. Depending on the system, some signals cannot be ignored at all. For example, on UNIX and UNIX-based systems, the signal SIGKILL can neither be caught nor ignored.
When a signal handler returns, the execution of the program resumes at the point where it was interrupted by the signal. The signals SIGFPE, SIGSEGV, SIGILL, and any other signal defined by the system that reports a computational error, are exceptions: if the handler of such a signal returns, the behavior is undefined.
What happens if the running program is sent a signal while executing a signal handler? It depends. The implementation defines how new signals are managed while the signal handler function is executing.
Does the signal handler remain active after it has been executed? It also depends on the implementation. After receiving the signal signum, the implementation may choose to call the function signal(signum, SIG_DFL) before actually executing the handler. In the following example, on our operating system Oracle Solaris, the default handler is automatically set before the handler catch_int() is executed:
$ cat libc_signal5.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_int(int sig) {
  printf("Signal %d received\n", sig);
  switch (sig) {
    case SIGINT:
      printf("SIGINT received\n");
      break;
    default:
      printf("Signal unknown\n");
  }
}

int main(void) {
  if ( signal(SIGINT, catch_int) == SIG_ERR ) {
    printf("Cannot register catch_int to handle SIGINT signal\n");
  } else {
    printf("catch_int registered to handle SIGINT signal\n");
  }

  printf("\nFirst call to raise()\n");
  raise(SIGINT);

  printf("\nSecond call to raise()");
  raise(SIGINT);

  printf("Leaving program…\n");
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal5 -std=c99 -pedantic libc_signal5.c
$ ./libc_signal5
catch_int registered to handle SIGINT signal

First call to raise()
Signal 2 received
SIGINT received

Second call to raise()
We can see the first signal SIGINT sent was indeed handled by catch_int() but the second one was not. If you wish to keep the same handler for subsequent signals, you have to register it again, within the handler function, with the function signal(), as in the following example:
$ cat libc_signal6.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_int(int sig) {
  printf("Signal %d received\n", sig);
  switch (sig) {
    case SIGINT:
      printf("SIGINT received\n");
      break;
    default:
      printf("Signal unknown\n");
  }

  /* register the handler again for the next SIGINT */
  if ( signal(SIGINT, catch_int) == SIG_ERR ) {
    printf("Cannot register catch_int to handle SIGINT signal\n");
  } else {
    printf("catch_int registered to handle SIGINT signal\n");
  }
}

int main(void) {
  if ( signal(SIGINT, catch_int) == SIG_ERR ) {
    printf("Cannot register catch_int to handle SIGINT signal\n");
  } else {
    printf("catch_int registered to handle SIGINT signal\n");
  }

  printf("\nFirst call to raise()");
  raise(SIGINT);

  printf("\nSecond call to raise()");
  raise(SIGINT);

  printf("\nLeaving program…\n");
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal6 -std=c99 -pedantic libc_signal6.c
$ ./libc_signal6
catch_int registered to handle SIGINT signal

First call to raise()Signal 2 received
SIGINT received
catch_int registered to handle SIGINT signal

Second call to raise()Signal 2 received
SIGINT received
catch_int registered to handle SIGINT signal

Leaving program…
In the following example, we do not use the raise() function to generate a signal but trigger an error in an operation (the notorious division by 0):
$ cat libc_signal7.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void handle_sig(int sig) {
  printf("Sig %d received\n", sig);
  switch (sig) {
    case SIGABRT:
      printf("SIGABRT received\n");
      break;
    case SIGFPE:
      printf("SIGFPE received\n");
      break;
    case SIGILL:
      printf("SIGILL received\n");
      break;
    case SIGINT:
      printf("SIGINT received\n");
      break;
    case SIGSEGV:
      printf("SIGSEGV received\n");
      break;
    case SIGTERM:
      printf("SIGTERM received\n");
      break;
    default:
      printf("Signal unknown\n");
  }
}

int main(void) {
  char x;

  if ( signal(SIGFPE, handle_sig) == SIG_ERR ) {
    printf("Cannot register handle_sig to catch SIGFPE signal\n");
  } else {
    printf("handle_sig registered to catch SIGFPE signal\n");
  }
  printf("\n");

  x = 1/0; /* division by 0 */
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal7 -std=c99 -pedantic libc_signal7.c
libc_signal7.c: In function ‘main’:
libc_signal7.c:48:9: warning: division by zero
$ ./libc_signal7
handle_sig registered to catch SIGFPE signal
handle_sig registered to catch SIGSEGV signal
Sig 8 received
SIGFPE received
Arithmetic Exception (core dumped)
In the following example, the signal SIGSEGV is raised when we attempt to copy a value into an invalid memory area (address 0):
$ cat libc_signal8.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_sig(int sig) {
  printf("Sig %d received\n", sig);
  switch (sig) {
    case SIGABRT:
      printf("SIGABRT received\n");
      break;
    case SIGFPE:
      printf("SIGFPE received\n");
      break;
    case SIGILL:
      printf("SIGILL received\n");
      break;
    case SIGINT:
      printf("SIGINT received\n");
      break;
    case SIGSEGV:
      printf("SIGSEGV received\n");
      break;
    case SIGTERM:
      printf("SIGTERM received\n");
      break;
    default:
      printf("Signal unknown\n");
  }
}

int main(void) {
  char *p = NULL;

  if ( signal(SIGSEGV, catch_sig) == SIG_ERR ) {
    printf("Cannot register catch_sig to catch SIGSEGV signal\n");
  } else {
    printf("catch_sig registered to catch SIGSEGV signal\n");
  }
  printf("\n");

  *p = 10; /* illegal access to memory address */
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal8 -std=c99 -pedantic libc_signal8.c
$ ./libc_signal8
catch_sig registered to catch SIGSEGV signal
Sig 11 received
SIGSEGV received
Segmentation Fault (core dumped)
In the following program, the user is prompted to type anything. If the word quit is typed, the signal SIGTERM is raised and handled by handle_term(), which terminates the program. The signal SIGINT is first handled by handle_int() and then ignored:
$ cat libc_signal9.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>

#define TERMINATE "quit"

/* handler for SIGTERM */
void handle_term(int sig) {
  printf("Sig %d received. Termination requested\n", sig);
  exit(0);
}

/* handler for SIGINT */
void handle_int(int sig) {
  printf("\n (sig %d) received but ignored\n", sig);
  if ( signal(SIGINT, SIG_IGN) == SIG_ERR ) {
    printf("Cannot ignore SIGINT signal\n");
  } else { /* SIGINT is ignored */
    printf("Sorry SIGINT signal ignored\n");
  }
}

int main(void) {
  int s_len = 64;
  char s[s_len];

  if ( signal(SIGTERM, handle_term) == SIG_ERR ) {
    printf("Cannot register handle_term to handle SIGTERM signal\n");
  } else {
    printf("handle_term registered to handle SIGTERM signal\n");
  }

  if ( signal(SIGINT, handle_int) == SIG_ERR ) {
    printf("Cannot register handle_int to handle SIGINT signal\n");
  } else {
    printf("handle_int registered to handle SIGINT signal\n");
  }
  printf("\n");

  while ( 1 ) {
    printf("Type anything or type quit to end the program: ");
    if ( fgets(s, s_len, stdin) == NULL ) {
      if ( feof(stdin) )
        break;            /* end of input */
      clearerr(stdin);    /* read interrupted (by a signal for example) */
      continue;
    }
    printf("String typed=%s\n", s);

    if ( strncmp(s, TERMINATE, strlen(TERMINATE)) == 0 )
      raise(SIGTERM);
  }

  printf("Leaving program\n");
  return EXIT_SUCCESS;
}
$ gcc -o libc_signal9 -std=c99 -pedantic libc_signal9.c
Let us run it: we type the string hello and then quit, which ends the program:
$ ./libc_signal9
handle_term registered to handle SIGTERM signal
handle_int registered to handle SIGINT signal

Type anything or type quit to end the program: hello
String typed=hello

Type anything or type quit to end the program: quit
String typed=quit

Sig 15 received. Termination requested

Let us run it again: when we hit the interrupt key (usually Ctrl-C), the signal SIGINT is sent. The first signal SIGINT is handled by handle_int(), which prints a message; subsequent SIGINT signals are ignored. To terminate the program, we finally type quit:
$ ./libc_signal9
handle_term registered to handle SIGTERM signal
handle_int registered to handle SIGINT signal

Type anything or type quit to end the program:
 (sig 2) received but ignored
Sorry SIGINT signal ignored
Type anything or type quit to end the program:
String typed=

Type anything or type quit to end the program: quit
String typed=quit

Sig 15 received. Termination requested
Unfortunately, the C library is not appropriate for managing signals: too many aspects of signal handling are left to the implementation. For this reason, programmers generally prefer the functions provided by the system itself to control signals, even though they are not portable. To ensure a certain level of portability, programmers write system-specific code for each system on which their programs are supposed to run.
XI.15
#include <setjmp.h>
int setjmp(jmp_buf env);
void longjmp(jmp_buf env, int val);

The function setjmp() saves the current environment of the program (registers of the CPU) in the object env (the type jmp_buf is an opaque type). It returns 0 when it is explicitly called, and returns val after longjmp() is called. When the function longjmp() is called, it restores the environment that was stored in env by setjmp() and the program execution returns to the point where setjmp() was invoked: in this case, the function setjmp() returns the value val, which is the second argument passed to longjmp(). Take note that if longjmp() is called with the argument val holding the value 0, the setjmp() function returns 1. This allows differentiating a direct call to setjmp() from a return caused by longjmp(). Let us explain them with a basic example.
In the following example, the first time setjmp() is encountered (line 9), the explicit call to the function saves the current state of the program in the object env and returns 0. Next, in line 17, after longjmp() is called, the program execution goes back to line 9, restoring the program environment, and, as if it had actually been invoked again, setjmp() returns the value 1 passed as the second argument to longjmp().
$ cat -n libc_setjmp1.c
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <setjmp.h>
 4
 5 int main(void) {
 6   jmp_buf env;
 7   int val;
 8
 9   if ( (val = setjmp(env)) == 0 ) {
10     printf("setjmp() called\n");
11   } else {
12     printf("longjump() called with value %d\n", val);
13     exit(EXIT_SUCCESS);
14   }
15
16   printf("Call to longjmp()\n");
17   longjmp(env, 1); /* goto to setjmp() that saves environment in env */
18
19   printf("This line will never be printed\n");
20   return EXIT_SUCCESS;
21 }
$ gcc -o libc_setjmp1 -std=c99 -pedantic libc_setjmp1.c
$ ./libc_setjmp1
setjmp() called
Call to longjmp()
longjump() called with value 1
Explanation:
o Lines 9-10: the setjmp() function is called. It records the program state in the object env. The first time line 9 is processed, the function setjmp() returns 0, causing the statement in line 10 to be executed, which displays the message “setjmp() called”.
o Lines 11-13: if longjmp() is called, the execution goes to line 9, causing setjmp() to return the value passed as the second argument to longjmp(). The message “longjump() called with value…” is displayed and the program terminates with the call to the exit() function.
o Line 17: the function longjmp() is called with the arguments env and 1, causing the execution to go to the point where the setjmp() function was called to save the program state into the object env: it goes to line 9. Lines 19-20 will never be executed.
The following example is similar except that the second argument passed to longjmp() is given by the user:
$ cat libc_setjmp2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <setjmp.h>

int main(void) {
  jmp_buf env;
  int val, v;
  const int s_len = 64;
  char s[s_len];

  if ( (val = setjmp(env)) == 0 ) {
    printf("setjmp() called\n");
  } else {
    printf("longjump() was called with value %d\n", val);
  }

  printf("\nType a digit or q to quit:");
  if ( fgets(s, s_len, stdin) == NULL )  /* end of input or read error */
    exit(EXIT_SUCCESS);

  if ( ! strncmp(s, "q", 1) )
    exit(EXIT_SUCCESS);

  v = atoi(s);
  longjmp(env, v); /* goto to setjmp() that saves environment in env */

  return EXIT_SUCCESS;
}
$ gcc -o libc_setjmp2 -std=c99 -pedantic libc_setjmp2.c
Let us run it. Type an integer or the letter q to terminate the program. Below, the digits 2 and 5 and then the letter q are typed:
$ ./libc_setjmp2
setjmp() called

Type a digit or q to quit:2
longjump() was called with value 2

Type a digit or q to quit:5
longjump() was called with value 5

Type a digit or q to quit:q
You may think the pair setjmp()/longjmp() is similar to the goto statement. This is true insofar as it also performs an unconditional branch, but it is much more powerful in that it can perform a non-local branch. That is, it can jump back to a point previously recorded by setjmp() anywhere in the program[97], not only within the same function. Here is an example showing a non-local branch:
$ cat libc_setjmp3.c
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

jmp_buf pg_state;

void f2(void) {
   printf(" Within function f2(). I call longjmp()\n");
   longjmp(pg_state, 1);
   printf(" Within function f2(), after longjmp(). Never printed\n");
}

void f1(void) {
   printf(" Within function f1(). I call f2()\n");
   f2();
   printf(" Within function f1(), after f2(). Never printed\n");
}

int main(void) {
   int val;

   if ( (val = setjmp(pg_state)) == 0 ) { /* save program state */
      printf("In main(), setjmp() called\n");
   } else { /* longjmp() invoked */
      printf("Come back to main(), longjump() was called with value %d\n", val);
      exit(EXIT_SUCCESS);
   }

   f1();
   printf("Line never printed\n");
   return EXIT_SUCCESS;
}
$ gcc -o libc_setjmp3 -std=c99 -pedantic libc_setjmp3.c
$ ./libc_setjmp3
In main(), setjmp() called
 Within function f1(). I call f2()
 Within function f2(). I call longjmp()
Come back to main(), longjump() was called with value 1
In our program, we called setjmp() to save the state of the program. Then, we called the function f1() that in turn called f2(), which finally called longjmp(). When longjmp() was called (within f2()), the execution never came back to the function f1() that called f2() but went directly to the point where the program environment was saved: it came back to the setjmp() call in main().

Generally, setjmp() and longjmp() are used to emulate exceptions, as found in languages such as C++, C# or Ada. An exception is an event (such as a signal) indicating an error or any unexpected condition to be taken into consideration. The C language does not support exceptions; it can only emulate them through setjmp()/longjmp().

In the following example, we emulate exceptions. We check if the arguments passed to the program are integers. If no argument is passed to the program, longjmp() is called with the value EXC_NOARG. If one or more arguments are not integers, longjmp() is called with the value EXC_BADARG.
$ cat libc_setjmp4.c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <setjmp.h>

enum exceptions { EXC_SUCCESS=1, EXC_NOARG, EXC_BADARG };
jmp_buf pg_env;

/*
FUNCTION: isinteger()
DESCRIPTION: check if a string is an integer number
PARAMETERS:
 . s: string to test
RETURN:
 . 0: not an integer
 . 1: is an integer
*/
int isinteger(char *s) {
   char *p;

   if ( s == NULL )
      return 0; /* no string: not an integer */

   for (p=s; *p != '\0'; p++)
      if ( ! isdigit((unsigned char)*p) ) /* not a digit */
         return 0;

   return 1; /* only digits */
}

/*
FUNCTION: check_args()
DESCRIPTION: check if the arguments passed to the program are integers
 Use longjmp() to simulate exceptions:
 . EXC_NOARG: no argument passed
 . EXC_BADARG: one or more arguments are not integers
 . EXC_SUCCESS: successful
PARAMETERS:
 . n: number of arguments
 . s: list of arguments to test
RETURN: No return value
*/
void check_args(int n, char **s) {
   int i;

   if (n < 2 ) /* arguments expected */
      longjmp(pg_env, EXC_NOARG);

   for (i=1; i < n ; i++) {
      if (! isinteger( s[i] ) ) /* argument is not an integer */
         longjmp(pg_env, EXC_BADARG);
   }

   longjmp( pg_env, EXC_SUCCESS );
}

int main(int argc, char *argv[]) {
   int excep;

   excep = setjmp(pg_env); /* save current environment */
   if ( excep == 0 ) { /* return from setjmp() */
      /* Exceptions raised by check_args() */
      check_args(argc, argv);
   } else { /* return from longjmp() */
      /* Exceptions managed here */
      switch (excep) {
         case EXC_NOARG:
            printf("Exception raised EXC_NOARG. Arguments missing\n");
            break;
         case EXC_BADARG:
            printf("Exception raised EXC_BADARGS. Arguments must be integers\n");
            break;
         case EXC_SUCCESS:
            printf("Processing successful\n");
            break;
         default:
            printf("Unknown value from longjmp()\n");
      }
   }
}
$ gcc -o libc_setjmp4 -std=c99 -pedantic libc_setjmp4.c
$ ./libc_setjmp4
Exception raised EXC_NOARG. Arguments missing
$ ./libc_setjmp4 ABC
Exception raised EXC_BADARGS. Arguments must be integers
$ ./libc_setjmp4 123
Processing successful
Before using setjmp()/longjmp(), remember the following rules:
o Before calling longjmp(), free the allocated memory areas that will no longer be used, and flush or close streams if required.
o The function that called setjmp() must not have terminated at the time longjmp() is called. Otherwise, the behavior is undefined.
o Automatic variables of the function calling setjmp() that are modified between the setjmp() and longjmp() calls have indeterminate values after the jump unless they are declared volatile, which means the values held in such variables are not reliable. Accordingly, if you need the values of automatic variables that change between setjmp() and longjmp() calls, declare them as volatile.

The following example has undefined behavior because the function f1() that calls setjmp() has terminated at the time longjmp() is called:
$ cat libc_setjmp_undef_behafior.c
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

jmp_buf pg_state;

void f2(void) {
   printf(" Within function f2(). I call longjmp()\n");
   longjmp(pg_state, 1);
}

void f1(void) {
   printf("Within function f1(). I call setjmp()\n");
   if ( ! setjmp(pg_state) ) {
      printf("f1(): return from setjmp()\n");
   } else {
      printf("f1(): return from longjmp()\n");
   }
}

int main(void) {
   f1(); /* setjmp() called in the function */
   f2(); /* error: longjmp() called after f1() has returned */
   return EXIT_SUCCESS;
}
XI.16 <wctype.h>: wide character handling functions
The wctype.h header file declares macros and functions used for classifying wide characters and for converting them to uppercase or lowercase letters. With the exception of the functions iswdigit() and iswxdigit(), all the functions described below are affected by the current locale set for the category LC_CTYPE.

XI.16.1 iswspace()
#include <wctype.h>
int iswspace(wint_t wc);

The function iswspace() returns a nonzero value (true) if wc is a whitespace wide character of the current locale. Otherwise, it returns zero (false). If the "C" locale is used, the function returns a nonzero value if wc is a standard whitespace character, that is, one of the following characters: space (L' '), horizontal tab (L'\t'), vertical tab (L'\v'), newline (L'\n'), form-feed (L'\f') or carriage-return (L'\r').

XI.16.2 iswblank()
Since C99:
#include <wctype.h>
int iswblank(wint_t wc);

The function iswblank() returns a nonzero value (true) if the wide character wc is a standard blank wide character, or a wide character of the current locale for which iswspace() returns a nonzero value and that is used as a word separator. Otherwise, it returns 0 (false). A standard blank wide character is space (L' ') or horizontal tab (L'\t'). For the "C" locale, it returns a nonzero value (true) only if wc is a standard blank wide character.
XI.16.3 iswdigit()
#include <wctype.h>
int iswdigit(wint_t wc);

The function iswdigit() returns a nonzero value (true) if wc is a decimal digit character. Otherwise, it returns 0 (false).

XI.16.4 iswxdigit()
#include <wctype.h>
int iswxdigit(wint_t wc);

The function iswxdigit() returns a nonzero value (true) if wc is a hexadecimal digit character. Otherwise, it returns 0 (false).
XI.16.5 iswcntrl()
#include <wctype.h>
int iswcntrl(wint_t wc);

The function iswcntrl() returns a nonzero value (true) if the wide character wc is a control character. Otherwise, it returns 0 (false).

XI.16.6 iswgraph()
#include <wctype.h>
int iswgraph(wint_t wc);

The function iswgraph() returns a nonzero value (true) if the wide character wc can be printed (iswprint(wc) returns a nonzero value) and is not a whitespace (iswspace(wc) returns 0). Otherwise, it returns 0 (false).

XI.16.7 iswprint()
#include <wctype.h>
int iswprint(wint_t wc);

The function iswprint() returns a nonzero value (true) if the wide character wc can be printed. Otherwise, it returns 0 (false).

XI.16.8 iswpunct()
#include <wctype.h>
int iswpunct(wint_t wc);

The function iswpunct() returns a nonzero value (true) if the wide character wc is a printable punctuation character, that is, one for which both iswspace(wc) and iswalnum(wc) return zero. Otherwise, it returns 0 (false).
XI.16.9 iswupper()
#include <wctype.h>
int iswupper(wint_t wc);

The function iswupper() returns a nonzero value (true) if the wide character wc is an uppercase letter of the basic character set or an uppercase letter of the character set of the current locale. Otherwise, it returns 0 (false).

XI.16.10 iswlower()
#include <wctype.h>
int iswlower(wint_t wc);

The function iswlower() returns a nonzero value (true) if the wide character wc is a lowercase letter of the basic character set or a lowercase letter of the character set of the current locale. Otherwise, it returns 0 (false).

XI.16.11 iswalpha()
#include <wctype.h>
int iswalpha(wint_t wc);

The function iswalpha() returns a nonzero value (true) if wc is a letter of the basic character set or a letter of the character set of the current locale. Otherwise, it returns 0 (false).

XI.16.12 iswalnum()
#include <wctype.h>
int iswalnum(wint_t wc);

The function iswalnum() returns a nonzero value (true) if iswalpha(wc) or iswdigit(wc) returns a nonzero value. Otherwise, it returns 0 (false).
XI.16.13 towlower()
#include <wctype.h>
wint_t towlower(wint_t wc);

The function towlower() converts an uppercase letter to its corresponding lowercase letter. If wc is an uppercase letter, the corresponding lowercase letter is returned. Otherwise, wc is returned with no conversion.

XI.16.14 towupper()
#include <wctype.h>
wint_t towupper(wint_t wc);

The function towupper() converts a lowercase letter to its corresponding uppercase letter. If wc is a lowercase letter, the corresponding uppercase letter is returned. Otherwise, wc is returned with no conversion.
XI.17 <wchar.h>
XI.17.1 Wide string numeric conversion functions
The following functions convert wide strings to numeric values. The equivalent functions processing characters have names starting with str instead of wcs. They have the same behavior.
#include <wchar.h>
double wcstod(const wchar_t * restrict ptr, wchar_t ** restrict endptr);
float wcstof(const wchar_t * restrict ptr, wchar_t ** restrict endptr);
long double wcstold(const wchar_t * restrict ptr, wchar_t ** restrict endptr);
long int wcstol(const wchar_t * restrict ptr, wchar_t ** restrict endptr, int base);
long long int wcstoll(const wchar_t * restrict ptr, wchar_t ** restrict endptr, int base);
unsigned long int wcstoul(const wchar_t * restrict ptr, wchar_t ** restrict endptr, int base);
unsigned long long int wcstoull(const wchar_t * restrict ptr, wchar_t ** restrict endptr, int base);
XI.17.2 Search functions
The following functions are search functions working with wide strings. The equivalent functions dealing with characters have names starting with str instead of wcs. They have similar behaviors.
#include <wchar.h>
wchar_t *wcschr(const wchar_t *s, wchar_t c);
size_t wcscspn(const wchar_t *s1, const wchar_t *s2);
wchar_t *wcspbrk(const wchar_t *s1, const wchar_t *s2);
wchar_t *wcsrchr(const wchar_t *s, wchar_t c);
size_t wcsspn(const wchar_t *s1, const wchar_t *s2);
wchar_t *wcsstr(const wchar_t *s1, const wchar_t *s2);
wchar_t *wcstok(wchar_t * restrict s1, const wchar_t * restrict s2, wchar_t ** restrict ptr);
wchar_t *wmemchr(const wchar_t *s, wchar_t c, size_t n);
XI.17.3 Time functions
The function wcsftime() is the wide-character version of strftime():
#include <time.h>
#include <wchar.h>
size_t wcsftime(wchar_t * restrict s, size_t maxsize, const wchar_t * restrict fmt, const struct tm * restrict timeptr);
XI.17.4 Copy, concatenation, conversion, and miscellaneous functions
The following functions were studied in Chapter IX.
#include <wchar.h>
size_t wcslen(const wchar_t *s);
wchar_t *wmemset(wchar_t *s, wchar_t c, size_t n);
wchar_t *wcscpy(wchar_t * restrict tgt, const wchar_t * restrict src);
wchar_t *wcsncpy(wchar_t * restrict tgt, const wchar_t * restrict src, size_t n);
wchar_t *wmemcpy(wchar_t * restrict tgt, const wchar_t * restrict src, size_t n);
wchar_t *wmemmove(wchar_t *tgt, const wchar_t *src, size_t n);
wchar_t *wcscat(wchar_t * restrict s1, const wchar_t * restrict s2);
wchar_t *wcsncat(wchar_t * restrict s1, const wchar_t * restrict s2, size_t n);
int wcscmp(const wchar_t *s1, const wchar_t *s2);
int wcscoll(const wchar_t *s1, const wchar_t *s2);
int wcsncmp(const wchar_t *s1, const wchar_t *s2, size_t n);
size_t wcsxfrm(wchar_t * restrict s1, const wchar_t * restrict s2, size_t n);
int wmemcmp(const wchar_t *s1, const wchar_t *s2, size_t n);
wint_t btowc(int c);
int wctob(wint_t wc);
int mbsinit(const mbstate_t *ps);
size_t mbrlen(const char * restrict s, size_t n, mbstate_t * restrict ps);
size_t mbrtowc(wchar_t * restrict pwc, const char * restrict mbc, size_t n, mbstate_t * restrict ps);
size_t wcrtomb(char * restrict mbc, wchar_t wc, mbstate_t * restrict ps);
size_t mbsrtowcs(wchar_t * restrict wcs, const char ** restrict mbs, size_t len, mbstate_t * restrict ps);
size_t wcsrtombs(char * restrict mbs, const wchar_t ** restrict wcs, size_t len, mbstate_t * restrict ps);
CHAPTER XII C11
XII.1 Introduction
The C11 standard, also known as ISO/IEC 9899:2011, was officially published in 2011 and is, at the time of writing, the most recent version of the C standard. The gcc compiler supports the C11 standard as of version 4.6, but early versions do not support all the features of the standard. For example, the _Alignas specifier is not supported by version 4.6 but is available as of version 4.7, and the type-generic expression feature is supported since version 4.9. C11 brings new features including:
o Multi-threading support
o Conditional features
o New floating-point macros
o Specifying and querying the alignment of objects
o Static assertions
o Anonymous structures and unions
o No-return functions
o New macros for the complex types
o Exclusive access to files
o Removal of the notorious gets() function
o Bounds-checking functions
o Type-generic expressions (generic selection)
o Improved Unicode support
For the sake of simplicity, and to avoid writing an imposing book, only some of them are described in this chapter.
XII.2 Generic selection
C11 introduces the keyword _Generic, which selects an expression from a list of associations according to the type of its first argument, an expression called the controlling expression: this is known as a generic selection. A generic selection is an expression taking the following form:
_Generic(ctrl_expr, association_list)

Where association_list is a list of associations, separated by commas, of the form
type: expr_type

Or
default: expr_default

Where expr_type, ctrl_expr and expr_default are expressions and type is a type name. The controlling expression ctrl_expr is not evaluated: only its type is matched against the type of each association in association_list. If the type of an association is compatible with the type of the controlling expression, the expression associated with that type is selected. If no matching type is found, the default expression expr_default is selected. If no default association is provided, there must be a compatible type in an association. A generic selection of the form:
_Generic(ctrl_expr, type1:expr1, type2:expr2,…,default:expr_default)
Could be written in pseudo-code like this:
type_controlling_expression = type of ctrl_expr
if (type_controlling_expression is compatible with type1)
   select expr1
else if (type_controlling_expression is compatible with type2)
   select expr2
else
   select expr_default
Generic selections are naturally used with macros as in the following example:
$ cat gen_select1.c
#include <stdio.h>
#include <stdlib.h>

void print_float(void) {
   printf("float selected\n");
}

void print_int(void) {
   printf("int selected\n");
}

void print_other(void) {
   printf("other selected\n");
}

#define show_type(t) _Generic( (t), float: print_float(), \
                                    int: print_int(), \
                                    default: print_other() )

int main(void) {
   show_type(12);      // generates print_int()
   show_type(12.6F);   // generates print_float()
   show_type("Hello"); // generates print_other()
   return EXIT_SUCCESS;
}
$ gcc -o gen_select1 -std=c11 -pedantic gen_select1.c
$ ./gen_select1
int selected
float selected
other selected
The following example displays the type of objects.
$ cat gen_select2.c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

#define gettype(t) _Generic( (t), char: "char", \
                     signed char: "signed char", \
                     unsigned char: "unsigned char", \
                     int: "int", unsigned int: "unsigned int", \
                     long: "long", unsigned long: "unsigned long",\
                     long long: "long long", \
                     unsigned long long: "unsigned long long",\
                     float: "float",\
                     double: "double",\
                     default: "other type" )

int main(void) {
   char c;
   float f;
   size_t s;
   void *q;
   ptrdiff_t p;

   printf("type of char c: %s\n", gettype(c) );
   printf("type of float f: %s\n", gettype(f) );
   printf("type of size_t s: %s\n", gettype(s) );
   printf("type of void *q: %s\n", gettype(q) );
   printf("type of ptrdiff_t p: %s\n", gettype(p) );
   printf("type of 0: %s\n", gettype(0) );
   printf("type of 0L: %s\n", gettype(0L) );
}
$ gcc -o gen_select2 -std=c11 -pedantic gen_select2.c
$ ./gen_select2
type of char c: char
type of float f: float
type of size_t s: unsigned long long
type of void *q: other type
type of ptrdiff_t p: long long
type of 0: int
type of 0L: long

The types displayed for size_t and ptrdiff_t depend on the platform.
XII.3 Exclusive open mode
C11 defines a new open mode x that can be combined with the modes w, wb, w+ and wb+. It changes the original meaning of those modes: if the file already exists, the call to fopen() fails and a null pointer is returned instead of the file being truncated. Moreover, the file is opened with exclusive (non-shared) access, to the extent that the underlying system supports it: the file is not modified by another program while used. With this mode, you are certain to create a new file.

Mode | Meaning | Starting offset
wx | Create a new text file, with non-shared access, for writing. If the file exists, a null pointer is returned. | Beginning of the file
wbx | Create a new binary file, with non-shared access, for writing. If the file exists, a null pointer is returned. | Beginning of the file
w+x | Create a new text file, with non-shared access, for reading and writing. If the file exists, a null pointer is returned. | Beginning of the file
wb+x or w+bx | Create a new binary file, with non-shared access, for reading and writing. If the file exists, a null pointer is returned. | Beginning of the file
Table XII‑1 C11 new open modes
Why introduce a new open mode? The rationale is that, until C11, to check whether a file already existed before creating it, a programmer had to try to open it for reading, as in the code snippet below:
FILE *pf;
char *myfile = "info2.txt";

if ( (pf = fopen(myfile, "r")) == NULL ) {
   pf = fopen(myfile, "w");
   …
}

Such code contains a security issue: between the first call to fopen() and the second one, another program may create the file, which would then be silently overwritten. To overcome this issue, C11 introduces the x open mode, which checks for the file and creates it in a single operation.
XII.4 Anonymous unions and structures
Before C11, within a structure or union, you could declare a nested structure or union without a tag only if the member of that type was given a name, as in the following example:
$ cat anon_struct1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   struct myNumber {
      int type;
      union {
         int i;
         float f;
      } n;
   };
   struct myNumber x;

   x.n.i = 123;
   printf("%d\n", x.n.i);
   return EXIT_SUCCESS;
}
$ gcc -o anon_struct1 -std=c99 -pedantic -Wall anon_struct1.c
$ ./anon_struct1
123

C11 allows inserting anonymous structures and unions inside another structure or union without declaring a member of that type. The members of the anonymous structure or union are then accessed as if they were members of the enclosing one. C11 permits writing the following code, equivalent to the previous program:
$ cat anon_struct2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   struct myNumber {
      int type;
      union {
         int i;
         float f;
      };
   };
   struct myNumber x;

   x.i = 123;
   printf("%d\n", x.i);
   return EXIT_SUCCESS;
}

With gcc, if we compile it with the options -std=c99 -pedantic, we get a warning:
$ gcc -o anon_struct2 -std=c99 -pedantic anon_struct2.c
anon_struct2.c: In function ‘main’:
anon_struct2.c:10:8: warning: ISO C99 doesn’t support unnamed structs/unions [-Wpedantic]
 };

If we compile it with the option -std=c11, it compiles cleanly:
$ gcc -o anon_struct2 -std=c11 -pedantic -Wall anon_struct2.c
$ ./anon_struct2
123
XII.5 Static assertion
The keyword _Static_assert is similar to assert() but, unlike the latter, it is not checked at runtime but at compile time, after the preprocessing phase. For example, if your code requires an int to be 4 bytes wide, you could write something like this:
$ cat static_assertion1.c
#include <stdio.h>
#include <stdlib.h>

_Static_assert( sizeof(int) == 4, "The program cannot run on this platform. Type 4-byte int is required");

int main(void) {
   printf("Static assertion example\n");
   printf("sizeof(int)=%zu\n", sizeof(int));
   return EXIT_SUCCESS;
}

On a platform working with 4-byte int, the compilation succeeds:
$ gcc -o static_assertion1 -std=c11 -pedantic static_assertion1.c
$ ./static_assertion1
Static assertion example
sizeof(int)=4

Otherwise, the compilation fails with an error message:
$ gcc -o static_assertion1 -std=c11 -pedantic static_assertion1.c
static_assertion1.c:4:1: error: static assertion failed: "The program cannot run on this platform. Type 4-byte int is required"
 _Static_assert( sizeof(int) == 4, "The program cannot run on this platform. Type 4-byte int is required");

The static assertion takes the form:
_Static_assert(expr, string);

Where expr is an integer constant expression and string is a string literal. If expr evaluates to 0 (false), the compilation fails and the message string is displayed. Otherwise, nothing happens.
XII.6 No-return functions
The new keyword _Noreturn is used with functions to tell the compiler they will never return to their callers, which allows the compiler to perform optimizations. In the following example, the function quit() will never return to its caller:
$ cat noreturn.c
#include <stdio.h>
#include <stdlib.h>

_Noreturn void quit(void) {
   printf("Exiting the program…\n");
   exit(EXIT_SUCCESS);
}

int main(void) {
   quit();
   return EXIT_SUCCESS;
}

In C11, some library functions have new declarations using the _Noreturn keyword. For example:
_Noreturn void abort(void);
_Noreturn void exit(int status);

This does not change their behavior.
XII.7 Complex
#include <complex.h>
double complex CMPLX(double x, double y);
float complex CMPLXF(float x, float y);
long double complex CMPLXL(long double x, long double y);

Before C11, a complex number was defined like this:
$ cat complex1.c
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>

int main(void) {
   double complex z = 10.1 + 2.1*I;

   printf("Real part: %f\n", creal(z));
   printf("Imaginary part: %f\n", cimag(z));
   return EXIT_SUCCESS;
}
$ gcc -o complex1 -std=c99 -pedantic complex1.c
$ ./complex1
Real part: 10.100000
Imaginary part: 2.100000

The C11 macros CMPLX, CMPLXF and CMPLXL, defined in the header file complex.h, let you define a complex number in another way. The previous example is equivalent to the following:
$ cat complex2.c
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>

int main(void) {
   double complex z = CMPLX(10.1, 2.1);

   printf("Real part: %f\n", creal(z));
   printf("Imaginary part: %f\n", cimag(z));
   return EXIT_SUCCESS;
}
$ ./complex2
Real part: 10.100000
Imaginary part: 2.100000
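The macros CMPLX, CMPLXF and CMPLXL are provided by the complex.h header file of the C library, not by the gcc compiler itself, so older libraries may not define them. The gcc compiler provides instead the built-in function __builtin_complex (as of version 4.7), on which such macros can be built.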
XII.8 Alignment
The new operator _Alignof of the C11 standard returns the alignment requirement of a type:
_Alignof(type)

The following example displays the alignment requirements for int, float and void *:
$ cat align1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   printf("sizeof(int):%zu Alignment=%zu\n", sizeof(int), _Alignof(int));
   printf("sizeof(float):%zu Alignment=%zu\n", sizeof(float), _Alignof(float));
   printf("sizeof(void *):%zu Alignment=%zu\n", sizeof(void *), _Alignof(void *));
   return EXIT_SUCCESS;
}
$ gcc -o align1 -std=c11 -pedantic align1.c
$ ./align1
sizeof(int):4 Alignment=4
sizeof(float):4 Alignment=4
sizeof(void *):4 Alignment=4

If you include the header file stdalign.h, you can use the macro alignof that expands to _Alignof. The C11 standard goes further: it allows specifying alignment constraints for an object or a member of a structure or union. Changing the alignment requirement for an object takes the following form:
_Alignas(expr) obj_decl

Where expr is an integer constant expression and obj_decl is a declaration of an object. The specified alignment must be supported by the implementation. It has a second form:
_Alignas(align_type) obj_decl

In this form, the object is declared with the same alignment as the type align_type. The following example changes the alignment of the member i of the structure str1 and compares it with the structure str2, which keeps the default alignment:
$ cat align2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   struct str1 {
      char c;             // 1 byte
      _Alignas(8) int i;  // Aligned on an 8-byte boundary
   };

   struct str2 {
      char c;  // 1 byte
      int i;   // Aligned on a 4-byte boundary, on this computer
   };

   printf("sizeof(str1):%zu Alignment=%zu\n", sizeof(struct str1), _Alignof(struct str1));
   printf("sizeof(str2):%zu Alignment=%zu\n", sizeof(struct str2), _Alignof(struct str2));
   return EXIT_SUCCESS;
}
$ gcc -o align2 -std=c11 -pedantic align2.c
$ ./align2
sizeof(str1):16 Alignment=8
sizeof(str2):8 Alignment=4

If you include the header file stdalign.h, you can use the macro alignas that expands to _Alignas. C11 also specifies the function aligned_alloc(), declared in stdlib.h, that allocates memory with a specific alignment:
void *aligned_alloc(size_t align, size_t size);

The requested alignment align must be supported by the implementation. The requested size of the piece of memory to be allocated must be a multiple of align.
XII.9 Bounds-checking functions
C11 defines in Annex K several alternatives to the traditional functions that store data into arrays. Before C11, functions expected programmers to provide arrays large enough to hold the data written into them. Thus, traditional C functions did not check the bounds of the given arrays, which caused a great many bugs that were tricky to detect when more characters than expected by programmers were copied into arrays. Not only did this cause programs to crash but, even worse, it created security vulnerabilities allowing malicious code to be executed. To overcome such issues, C11 provides, in addition to the traditional functions, new functions that check the bounds of arrays and never write beyond their capacity. Annex K is optional: an implementation that supports it defines the macro __STDC_LIB_EXT1__, and a program requests the declarations by defining the macro __STDC_WANT_LIB_EXT1__ to 1 before including the header files. Below, we list the new functions without describing them, letting you explore them by yourself.
errno_t tmpfile_s(FILE * restrict * restrict ptream); errno_t tmpnam_s(char *s, rsize_t maxsize); errno_t fopen_s(FILE * restrict * restrict ptream, const char * restrict filename, const char * restrict mode); errno_t freopen_s(FILE * restrict * restrict npstream, const char * restrict filename, const char * restrict mode, FILE * restrict stream); int fprintf_s(FILE * restrict stream, const char * restrict fmt, …); int fscanf_s(FILE * restrict stream, const char * restrict fmt, …);
int printf_s(const char * restrict fmt, …); int scanf_s(const char * restrict fmt, …); int snprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, …); int sprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, …); int sscanf_s(const char * restrict s, const char * restrict fmt, …); int vfprintf_s(FILE * restrict stream, const char * restrict fmt,va_list arg); int vfscanf_s(FILE * restrict stream, const char * restrict fmt, va_list arg); int vprintf_s(const char * restrict fmt, va_list arg); int vscanf_s(const char * restrict fmt, va_list arg); int vsnprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, va_list arg); int vsprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, va_list arg); int vsscanf_s(const char * restrict s, const char * restrict fmt, va_list arg); char *gets_s(char *s, rsize_t n); void abort_handler_s(const char * restrict msg, void * restrict ptr, errno_t error); void ignore_handler_s(const char * restrict msg, void * restrict ptr, errno_t error); errno_t getenv_s(size_t * restrict len,char * restrict value, rsize_t maxsize, const char * restrict name); void *bsearch_s(const void *key, const void *base, rsize_t nmemb, rsize_t size, int (*compar)(const void *k, const void *y, void *context), void *context); errno_t qsort_s(void *base, rsize_t nmemb, rsize_t size, int (*compar)(const void *x, const void *y, void *context), void *context); errno_t wctomb_s(int * restrict status, char * restrict s, rsize_t smax, wchar_t wc); errno_t mbstowcs_s(size_t * restrict retval, wchar_t * restrict dst, rsize_t dstmax, const char * restrict src, rsize_t len); errno_t wcstombs_s(size_t * restrict retval, char * restrict dst, size_t dstmax, const wchar_t * restrict src, rsize_t len); errno_t memcpy_s(void * restrict s1, rsize_t s1max, const void * restrict s2, rsize_t n);
errno_t memmove_s(void *s1, rsize_t s1max, const void *s2, rsize_t n); errno_t strcpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2); errno_t strncpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2, rsize_t n); errno_t strcat_s(char * restrict s1, rsize_t s1max, const char * restrict s2); errno_t strncat_s(char * restrict s1, rsize_t s1max, const char * restrict s2, rsize_t n); char *strtok_s(char * restrict s1, rsize_t * restrict s1max, const char * restrict s2, char ** restrict ptr); errno_t memset_s(void *s, rsize_t smax, int c, rsize_t n); errno_t strerror_s(char *s, rsize_t maxsize, errno_t errnum); size_t strerrorlen_s(errno_t errnum); size_t strnlen_s(const char *s, size_t maxsize); errno_t asctime_s(char *s, rsize_t maxsize, const struct tm *ptime); errno_t ctime_s(char *s, rsize_t maxsize,const time_t *ptimer); struct tm *gmtime_s(const time_t * restrict ptimer, struct tm * restrict result); struct tm *localtime_s(const time_t * restrict ptimer, struct tm * restrict result); int fwprintf_s(FILE * restrict stream, const wchar_t * restrict fmt, …); int fwscanf_s(FILE * restrict stream, const wchar_t * restrict fmt, …); int snwprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, …); int swprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, …); int swscanf_s(const wchar_t * restrict s, const wchar_t * restrict fmt, …); int vfwprintf_s(FILE * restrict stream, const wchar_t * restrict fmt, va_list arg); int vfwscanf_s(FILE * restrict stream, const wchar_t * restrict fmt, va_list arg); int vsnwprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, va_list arg);
int vswprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, va_list arg); int vswscanf_s(const wchar_t * restrict s, const wchar_t * restrict fmt, va_list arg); int vwprintf_s(const wchar_t * restrict fmt, va_list arg); int vwscanf_s(const wchar_t * restrict fmt, va_list arg); int wprintf_s(const wchar_t * restrict fmt, …); int wscanf_s(const wchar_t * restrict fmt, …); errno_t wcscpy_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2); errno_t wcsncpy_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2, rsize_t n); errno_t wmemcpy_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2, rsize_t n); errno_t wmemmove_s(wchar_t *s1, rsize_t s1max, const wchar_t *s2, rsize_t n); errno_t wcscat_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2); errno_t wcsncat_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2, rsize_t n); wchar_t *wcstok_s(wchar_t * restrict s1, rsize_t * restrict s1max, const wchar_t * restrict s2, wchar_t ** restrict ptr); size_t wcsnlen_s(const wchar_t *s, size_t maxsize); errno_t wcrtomb_s(size_t * restrict retval, char * restrict s, rsize_t smax, wchar_t wc, mbstate_t * restrict ps); errno_t mbsrtowcs_s(size_t * restrict retval, wchar_t * restrict dst, rsize_t dstmax, const char ** restrict src, rsize_t len, mbstate_t * restrict ps); errno_t wcsrtombs_s(size_t * restrict retval, char * restrict dst, rsize_t dstmax, const wchar_t ** restrict src, rsize_t len, mbstate_t * restrict ps);
PART II TOOLS
CHAPTER XIII COMPILING C PROGRAMS
XIII.1 Introduction
A programming language is a set of symbols, keywords and rules representing actions that programmers would like the computer to perform. The problem is that processors can only understand and execute machine language, which depends on hardware. Machine language, or machine code, is a series of 0s and 1s (each processor has its own machine language). Although, in theory, programmers could read and write machine code, they could not write complex programs using such a language. Consequently, programming languages were created to ease programming.

The first step toward readable code was the assembly language (second-generation language). Instead of manipulating zeroes and ones, programmers worked with names. For example, machine instructions were given names such as load and move in place of the series of 0s and 1s representing them. Assemblers were used to translate assembly language into binary code (machine code). Each processor had its own assembly language, meaning that assembly language was hardware-dependent. Programmers had to write new programs for each processor architecture. More generally, assembly languages had major drawbacks:
o They were not suitable for complex programs
o Programmers required in-depth knowledge of the underlying hardware
o They were processor-dependent; that is, assembly code was not portable. Code is said to be portable if it can run on several different systems

To cope with all those constraints, high-level languages closer to the human language were developed. Besides understanding them easily and learning them quickly, programmers could write, debug and modify their programs in a more convenient way. Because processors could not execute them directly, compilers were also designed to convert them into machine code.

Today, most programmers employ high-level languages. They work with a text editor to write source modules and then a compiler driver to generate binary code (i.e. machine language) that the system can execute. Some software applications (known as IDE[98]) offer a complete programming environment including text editor, compiler driver, Makefile…

The aim of this chapter is to explain the major concepts behind compilation. Throughout the chapter, we will use gcc to perform all compilation steps. This chapter is intended for programmers wishing to learn how to compile C programs in the UNIX and UNIX-like operating systems. Even though each UNIX variant has its own compiler, we propose here an introduction to the topic by using the very popular GNU compiler gcc. You can download, install and use it freely provided you respect the terms of the GNU license called GPL.
XIII.2 Compilation Phases
Figure XIII‑1 Compilation Phases
The compilation process translates high-level code, such as a C program, into machine code. It can be broken down into four main stages that we are going to discover throughout the chapter. First, let us briefly describe them:
o Preprocessing: the preprocessor performs substitutions and inclusions of files in source files.
o Compilation: it consists of:
▪ Lexical analysis (also called scanning and tokenizing): the compiler splits the source files into language units called tokens. It ensures that the "vocabulary" used in the source files is correct. For example, it recognizes reserved words, variable names, symbols, constants, and so on.
▪ Syntax analysis (also called parsing): the compiler checks the syntax of the source files. It checks the way in which the tokens have been combined to form valid statements. It checks whether the "grammar" is correct.
▪ Semantic analysis: ensures that the statements are meaningful. For example, it checks the types of variables and functions and binds each function name and external variable identifier to its definition.
▪ Intermediate code generation
▪ Optimization: optimizes the code generated during the previous phase.
▪ Assembly code generation: generates assembly code
o Assembly: the assembler builds object modules from assembly language modules by translating assembly language into binary code
o Linking: the link-editor (also called linker) builds executables from object modules.
The word link-editor is more explicit than linker. It avoids confusion between the linking performed at compile time by the link-editor and the dynamic linking performed by the dynamic linker (known as a loader). The link-editor builds executable files and shared libraries, while the loader places executables and shared libraries into memory and transfers control to the program.
XIII.3 Preprocessing
The C preprocessor, known as cpp, replaces every macro it finds with its value. A macro is an alias for a sequence of characters. Two kinds of macros are available: predefined and user-defined macros. Predefined macros can be used within a source file without needing to define them. User-defined macros are set by the directive #define within a source or header file:
#define macro_name macro_text
When the preprocessor encounters such a line, it knows that each time it finds the word macro_name, it has to replace it with the character sequence macro_text. The preprocessor performs replacement of macro names with their values, inclusions of file contents and other basic tasks. The C preprocessor has only a few directives (statements of a preprocessor) telling it what to do. A C preprocessor directive begins with the number sign # and terminates at the end of the line. This means that a directive must be held on a single line. However, to ease readability, programmers may have to break it into multiple lines. In this case, before hitting the Enter key, they just have to type the backslash character (\) so that the newline loses its special meaning.
The C preprocessor also has conditional directives (#ifdef, #ifndef, #if, #else, #elif and #endif) that we will talk about as well. In the following examples, we will use the GNU C preprocessor provided with gcc: it is invoked with the -E option of gcc. In the following sections, we review what we learned about the C preprocessor.
Unlike C statements, C preprocessor directives do not end with a semicolon but with the newline character (generated by the Enter key).
XIII.3.1 Comments
Comments, that is, text following // up to the end of the line and text enclosed between /* and */, are ignored: the preprocessor strips them.
$ cat comment.c
int main() {
  /* This comment is ignored */
  printf("The preceding comment is ignored");
  return 0;
}
$ gcc -E comment.c
int main() {
  printf("The preceding comment is ignored");
  return 0;
}
XIII.3.2 Macro substitutions
In the following C program, we create the macro MSG_TEXT that will be replaced by its value when encountered:
$ cat preproc_1.c
#define MSG_TEXT "This is my first macro"
int main(int argc, char **argv) {
  printf("MSG_TEXT=%s\n", MSG_TEXT);
  return 0;
}
$ gcc -E preproc_1.c
…
int main(int argc, char **argv) {
  printf("MSG_TEXT=%s\n", "This is my first macro");
  return 0;
}
Notice that macro expansion does not occur when a macro name appears between double-quotes (or single-quotes): the MSG_TEXT inside the format string is left untouched. You define macros with arguments as shown below:
$ cat preproc_2.c
#define display_message(msg) printf("%s\n", msg)
int main(int argc, char **argv) {
  display_message("This line is altered by the preprocessor");
  return 0;
}
The #define directive allows the programmer to create the macro display_message whose argument is enclosed between parentheses. The preprocessor will replace it with its definition as shown below:
$ gcc -E preproc_2.c
int main(int argc, char **argv) {
  printf("%s\n", "This line is altered by the preprocessor");
  return 0;
}
XIII.3.3 File Inclusion
Consider the following file:
$ cat h.h
const int MAX = 512;
In the following C program the file h.h is inserted by using the directive #include:
$ cat preproc_3.c
#include "h.h"
int main(int argc, char **argv) {
  printf("%s\n", "This line is altered by the preprocessor");
  return 0;
}
The preprocessor will produce:
$ gcc -E preproc_3.c
const int MAX = 512;
int main(int argc, char **argv) {
  printf("%s\n", "This line is altered by the preprocessor");
  return 0;
}
XIII.3.4 Predefined macros
Standard C has several predefined macros. You just refer to them by name. For example:
o __FILE__ expands to the name of the file that contains it
o __LINE__ expands to the number of the line on which it appears
For example:
$ cat predef_mac.c
int main(int argc, char **argv) {
  printf("%s %d\n", __FILE__, __LINE__);
  return 0;
}
$ gcc -E predef_mac.c
…
int main(int argc, char **argv) {
  printf("%s %d\n", "predef_mac.c", 2);
  return 0;
}
XIII.3.5 Conditional Directives
The directive #ifdef checks if a macro has been defined. If so, it outputs all the text between the directives #ifdef and #endif within the source file.
$ cat cond_dir1.c
#define ADD_INFO 1

int main(int argc, char **argv) {
#ifdef ADD_INFO
  printf("%s %d\n", __FILE__, __LINE__);
#endif
  return 0;
}
$ gcc -E cond_dir1.c
…
int main(int argc, char **argv) {
  printf("%s %d\n", "cond_dir1.c", 5);
  return 0;
}
Now, if you remove the line defining the macro, the call to printf() disappears:
$ cat cond_dir2.c
int main(int argc, char **argv) {
#ifdef ADD_INFO
  printf("%s %d\n", __FILE__, __LINE__);
#endif
  return 0;
}
$ gcc -E cond_dir2.c
…
int main(int argc, char **argv) {
  return 0;
}
You could also define a macro on the command line with the option -D:
$ cat cond_dir2.c
int main(int argc, char **argv) {
#ifdef ADD_INFO
  printf("%s %d\n", __FILE__, __LINE__);
#endif
  return 0;
}
$ gcc -E cond_dir2.c -D ADD_INFO=1
…
int main(int argc, char **argv) {
  printf("%s %d\n", "cond_dir2.c", 3);
  return 0;
}
Conversely, the directive #ifndef checks if a macro has not been defined. If the macro is undefined, it outputs all the text between the directives #ifndef and #endif within the source file:
$ cat cond_dir2.c
int main(int argc, char **argv) {
#ifndef ADD_INFO
  printf("%s %d\n", __FILE__, __LINE__);
#endif
  return 0;
}
$ gcc -E cond_dir2.c -D ADD_INFO=1
…
int main(int argc, char **argv) {
  return 0;
}
$ gcc -E cond_dir2.c
…
int main(int argc, char **argv) {
  printf("%s %d\n", "cond_dir2.c", 3);
  return 0;
}
The C preprocessor also allows you to add an alternative text block:
#ifdef macro_name
text1
#else
text2
#endif
If the macro macro_name is defined, the text block text1 is output. Otherwise, the alternative text block text2 is generated. For example:
$ cat cond_dir3.c
int main(int argc, char **argv) {
#ifdef ADD_INFO
  printf("%s %d\n", __FILE__, __LINE__);
#else
  printf("The macro ADD_INFO is undefined\n");
#endif
  return 0;
}
$ gcc -E cond_dir3.c
…
int main(int argc, char **argv) {
  printf("The macro ADD_INFO is undefined\n");
  return 0;
}
XIII.4 Lexical analysis
The lexical analysis step breaks the program into recognized words called tokens. The processing is also known as tokenization. The set of tokens can be viewed as the vocabulary of the programming language. In a programming language, statements, describing the actions that the processor has to execute, are composed of tokens in the same way as the words of a vocabulary form sentences. For example, in a C program, the compiler will break the statement int I = 200; into five tokens:
o First token: the reserved word int
o Second token: the identifier I
o Third token: the symbol =
o Fourth token: the integer constant 200
o Fifth token: the semicolon symbol ;
Each token has a particular meaning and is specific to the programming language used. As a comparison with the English language, the sentence a duck is an animal is composed of five tokens: a, duck, is, an and animal. All these tokens are meaningful in the English language.
The tokens of a programming language are typically composed of:
o Keywords such as if, else, while, int, float…
o Symbols such as ?, =, +, -, /, %, ++, *, (, ), {, }…
o Identifiers such as variable names and function names
o Constants such as 200
XIII.5 Syntax analysis
Each programming language has a grammar, that is, a set of rules defining the permitted combinations of tokens: it defines the syntax of the language. Syntax analysis consists of checking whether a sequence of tokens forming statements is grammatically correct. For example, in the C language, the statement if x > 6 x++; is not grammatically correct while if (x>6) x++; has a valid syntax. As a comparison with the English language, the sentence a is animal duck an is not an English sentence because it does not follow English grammar.
XIII.6 Semantic analysis
In the English language, the sentence a tape recorder is made up of flowers, buildings and cows is grammatically correct but has little meaning. In the same manner, in programming languages, the compiler has to check the meaning of the statements. For example, if p is a pointer to void and f() a function, the statement p = f; is wrong even though the syntax is valid: standard C does not allow a pointer to a function to be assigned to a pointer to void. The compiler will issue a diagnostic (gcc does so when invoked with the -pedantic option). This can be done because, at this stage, every function and variable identifier is associated with its definition.
XIII.7 Assembly code
At the end of the compiler step, assembly code is generated. The compiler translates a high-level language, independent of the hardware, such as the C language, into an assembly language that is processor-dependent. That is, each type of processor has its own assembly language. Thus, the C compiler will produce different assembly code on different machines even from the same C program. The assembly language is just a more convenient representation of the processor language: names are used to refer to machine instructions and registers. It relieves programmers from remembering the binary representations of processor operations (called opcodes) and registers. It is easier to work with names than with a series of bits. Each source file, written in a high-level language, is translated into an assembly code file that will in turn be translated into machine code by the assembler.
XIII.8 Assembly
The role of the assembler is to translate assembly code into machine code. Programmers can directly write an assembly program but it is not portable: a new program has to be written for each processor type. Instead, it is better to write a program in a high-level language and then compile it for the target processors.
XIII.9 Linking
Programmers often write programs composed of several source files. Each source file is processed by the preprocessor, the compiler and then the assembler to generate an object file. An object file, containing machine code, cannot be executed in that format: it can only be used as a building block for executables. Object files are combined by the link-editor to produce an executable that can then be run. Thus, the object files can be reused to generate different executables. For example, suppose you have written, in the file average.c, a function that calculates the average of a series of numbers provided as arguments. You can compile it to generate an object file (average.o) and then use it in several programs without recompiling it or rewriting the function. You just need to link it with other object files as in the example below that builds three executables based on the object file average.o:
$ gcc -o list_users average.o users.o
$ gcc -o proc_stat average.o process.o
$ gcc -o stat_file average.o file.o
In the example, the executables list_users, proc_stat and stat_file use the object file average.o.
XIII.10 Compilers and Interpreters
Figure XIII‑2 Interpreter
An interpreter is a program that reads statements from files, called scripts, and executes them directly as they appear. It does not generate executable files. For example, the Bourne shell and awk are interpreters.
Figure XIII‑3 Compiler
The interpreter performs the following stages:
o Lexical analysis
o Syntax analysis
o Semantic analysis
o Intermediate code
o Execution
The compilation process works differently. It translates input files, known as source files, written in a high-level language into object files (also called target files or target modules) that contain machine code. Then, it combines object files to produce executable files that the operating system can execute as depicted in Figure XIII‑3. The compilation process generates machine code that can be used:
o On the machine on which it was produced (called native compilation)
o On other machines (called cross-compilation)
Since an executable contains machine code, it generally runs faster than a script. However, it is faster to write a script than a compiled program. If you alter source files, you must recompile your program to reflect the changes. The following sections explain how to compile C programs.
Some languages use both compilers and interpreters as shown in Figure XIII‑4. They produce an intermediate code executed by a program called a virtual machine. It is faster than a script. Intermediate code is very useful, as it features the advantages of both compiling and interpreting: it is fast and independent of the hardware, thereby increasing the portability of applications. Some languages, such as Java, compile source code into intermediate code and optimize it before interpreting it.
Figure XIII‑4 Virtual Machine
XIII.11 Compiler Driver
Figure XIII‑5 Gcc steps
A compiler driver is a program that controls the different compilation phases. Users generally do not invoke individually the preprocessor, compiler, optimizer, and assembler. Instead, they tell the compiler driver to generate object modules and then they link them to produce executable files. Thus, the compiler driver relieves the user of having to perform all compilation phases separately. Compiler drivers have options enabling the preprocessor, compiler, optimizer, assembler and linker to be invoked on an individual basis. The compiler driver depends on the programming language used.
XIII.12 Compiling C Programs
Throughout this chapter, we will use GNU gcc as a compiler driver. Irrespective of the compiler used, the same concepts are involved and only the utility options change. The gcc utility performs all compilation stages (see Figure XIII‑5) as described in the following sections.
XIII.13 GNU gcc
First, let us analyze the general syntax of the command gcc (compiler driver):
gcc [-o output_file] [options] source_files
The -o option lets you name the output file. The suffix of the source files determines how gcc is to process them. Several suffixes are recognized, but in this chapter, only the following are dealt with:
o source_file.c: C program
o source_file.i: preprocessed C code
o source_file.s: assembler code
o source_file.o: object code
If the output_file argument is not provided, gcc will name the output file according to the suffix of the input file.
XIII.13.1 Preprocessor (cpp)
You can invoke the cpp preprocessor by specifying the -E option as follows:
gcc [-o output_file] [options] -E source_file
Where source_file is a file containing a C program with the .c suffix. When you invoke gcc with the -E option, it stops after the preprocessing stage. If you omit the -o option, the output appears on the screen. Traditionally, preprocessed files have the .i suffix. For example:
$ gcc -o preproc_1.i -E preproc_1.c
$ cat preproc_1.i
…
int main(int argc, char **argv) {
  printf("MSG_TEXT=%s\n", "This is my first macro");
  return 0;
}
XIII.13.2 Compiler (cc1)
The second compilation stage consists of translating C code into assembly code. To tell gcc to perform all stages up to the assembly code, use the following syntax:
gcc [-o output_file] [options] -S source_file
Where source_file is a file containing C code with the .c suffix or a preprocessed file with the .i suffix. It generates an assembly code file. If output_file is not supplied, it generates an output file with the same name as source_file but with the .s suffix. For example, gcc -S main.c will produce a file called main.s. The -S option invokes the cpp preprocessor followed by the C compiler called cc1 if source_file is a C file. Otherwise, if source_file is a preprocessed file, only the C compiler is invoked. For example:
$ gcc -o preproc_1.S -S preproc_1.c
$ cat preproc_1.S
.file "preproc_1.c"
.section .rodata
…
Which is equivalent to:
$ gcc -o preproc_1.S -S preproc_1.i
XIII.13.3 Optimizer
This optional stage consists of optimizing the code generated by the compiler cc1. To tell gcc to perform all stages up to this point, use the following syntax:
gcc [-o output_file] [options] -O -S source_module
The -O option attempts to reduce the code size and execution time. Combined with -S, it produces assembly code.
XIII.13.4 Assembler (as)
The assembly stage consists of translating assembly code into machine code. To tell gcc to perform all stages up to the generation of machine code, use the following syntax:
gcc [-o output_file] [options] -c source_file
Where source_file is one of the following:
o A C program with the .c suffix. In this case, the preprocessor (cpp), the compiler (cc1) and the assembler (as) are invoked in sequence.
o A preprocessed file with the .i suffix. In this case, the compiler (cc1) and the assembler (as) are invoked in sequence.
o An assembler program with the .s suffix. In this case, only the assembler (as) is invoked.
It produces target object files. If output_file is omitted, it generates a file with the same name as the input file, but with the .o suffix. For example, gcc -c main.c will produce a file called main.o.
By default, the optimizer does not execute. You can tell gcc to perform the optimization stage as follows:
gcc [-o output_file] [options] -O -c source_module
The assembler generates object files (with the .o extension) containing the binary code. It also adds information to all object files that the link-editor will extract to generate executables. The following three commands produce the same object file preproc_1.o:
$ gcc -o preproc_1.o -c preproc_1.c
$ gcc -o preproc_1.o -c preproc_1.S
$ gcc -o preproc_1.o -c preproc_1.i
XIII.13.5 Link-Editor (collect2/ld)
The last compilation stage consists of producing an executable file from the object files produced by the assembler as. To tell gcc to combine object files and thereby produce an executable, use the following syntax:
gcc [-o output_file] [options] input_file_list
Where input_file_list is a list of object files (with the .o suffix) separated by blanks. The gcc utility will invoke the GNU link-editor called collect2 to combine them to produce an executable. If output_file is omitted, the executable will be named a.out. As explained earlier, the utility gcc can generate an executable from preprocessed files, C code files or assembly code files. In fact, input_file_list may be a list of input files separated by blanks having the .i, .s or .c suffix. That is, the actual compilation commands that will be invoked (preprocessor, compiler…) depend on the extension of the source files:
o If the source files have the .c suffix, the preprocessor, compiler, assembler and linker will be invoked in sequence
o If the source files have the .i suffix, the compiler, assembler and linker will be invoked in sequence
o If the source files have the .s suffix, the assembler and linker will be invoked in sequence
o If the source files have the .o suffix, only the linker will be invoked
If you execute the command gcc with no options, it will invoke the cpp preprocessor, the cc1 C compiler, the as assembler and then the collect2 link-editor, which is a wrapper for the system link-editor ld. In general, when you compile your source files, you do not tell gcc to produce preprocessed code and assembler code, but only object files and executables. The following commands are four ways to produce the same executable preproc_1:
$ gcc -o preproc_1 preproc_1.o
$ gcc -o preproc_1 preproc_1.c
$ gcc -o preproc_1 preproc_1.S
$ gcc -o preproc_1 preproc_1.i
The binary file generated by the link-editor can be executed:
$ ./preproc_1
MSG_TEXT=This is my first macro
XIII.14 Writing Source Files
Consider the following C program:
$ cat main.c
#include <stdio.h>
float avg(float x, float y) {
  return ( (x + y)/2 );
}
float square(float x) {
  return ( x * x );
}
int main(void) {
  float z = 1.2;
  float w = 3.4;
  printf("avg(%g,%g)=%g", z, w, avg(z,w));
  return 0;
}
Source files are text files written in the C language with the .c suffix. If we call prog the executable that we wish to build, the main.c source file is compiled as follows:
$ gcc -o prog -std=c99 -pedantic main.c
$ ./prog
avg(1.2,3.4)=2.3
Writing an entire C program in one file imposes various limitations:
o It is very difficult for several programmers to work together on the same project
o Maintaining a small source file is quite easy, but it becomes impractical when the file contains several thousand lines
o If you wish to reuse functions in another project, you have to copy their definitions and then insert them into your source files. This method is prone to errors and does not constitute a good way to manage a project.
For this reason, programmers prefer modular programming: C code is split into several files called modules. This approach provides the following benefits:
o Source modules can be developed and tested separately. This allows several programmers to work together.
o It facilitates maintenance, which means programmers can easily alter and test their programs.
o Modules can be reused.
o It allows separate compilation.
o It provides a better design for building programs: the encapsulation technique can be used.
XIII.14.1 Modules
Programmers break large programs into several more maintainable units called source files (with the .c extension). Related functions are put into the same source file. Remember that source files contain the code written by programmers while object files are generated by the compiler from source files. Both contain the same information but expressed in two different languages: one understandable by human beings and the other by the computer. Modular programming allows sharing object files without providing the source files. Instead, programmers may supply only header files and object files. This means that you do not need the source files developed by someone else to use their functions: you just need to be supplied with the object file implementing them and the header files stating the declarations.
A module consists of a header file acting as an interface and a file implementing the "services" declared by the interface. A source module is then composed of a header file and a source file. Likewise, an object module is composed of the same header file and an object file generated by the compiler from the source file. For example, if you write a C source file that calls a function defined in another module that someone else has written, you simply include the header file in your source file and then specify the object module name at the linking stage. You do not need to know how a function is implemented but only the arguments that you have to pass to it along with the value it returns as specified by the prototype in the header file. As we learned earlier in the book, this also implies that the implementation can be hidden. Users do not need to know how objects are actually designed; they may have access only to the public information in the header files: the technique is known as encapsulation.
For us, throughout the chapter, unless otherwise stated, the word module will be a synonym for file. Thus, the word module with no qualifier means both object module and source module. In context, both are valid.
Now, suppose that you wish to put the avg() and square() functions in a separate file called calc.c. The source file calc.c contains the definitions of the avg() and square() functions:
$ cat calc.c
#include "calc.h"
float avg(float x, float y) {
  return ( (x + y)/2 );
}
float square(float x) {
  return ( x * x );
}
The very first line integrates the header file calc.h into calc.c to avoid any mismatches between the declarations in the header file and the definitions in the source file. The header file calc.h contains the prototypes of the avg() and square() functions defined in calc.c:
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__
extern float avg(float , float);
extern float square(float);
#endif /* __CALC_H__ */
Header files, ending with the .h suffix by convention, contain the declarations of global identifiers (sharable between modules). To tell the preprocessor to include header files in source files, C programmers use the preprocessor directive #include. To prevent the contents of a header file from being processed several times, programmers use the #ifndef, #define and #endif directives. The first time the header file is included, NAME is undefined, so the declarations are processed and NAME becomes defined; on any later inclusion, the preprocessor skips everything up to #endif. A header file looks like this:
#ifndef NAME
#define NAME
Declarations
#endif
In order to create an executable, a single module, the main module, must define the function main(). The system will give control to the program by calling the function main(). The main source file, containing the main() function that calls the function avg(), could be written as follows:
$ cat main.c
#include <stdio.h>
#include "calc.h"
int main(void) {
  float z = 1.2;
  float w = 3.4;
  printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
  return 0;
}
This is equivalent to the following code:
$ cat main.c
#include <stdio.h>
extern float avg(float , float);
extern float square(float);
int main(void) {
  float z = 1.2;
  float w = 3.4;
  printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
  return 0;
}
Every external identifier must be declared before being used. Since the function avg() (defined in the module calc.c) is referenced in the main source file main.c, you have to inform the compiler of the function prototype, so that type checking can be performed. Instead of explicitly inserting the function prototype float avg(float, float) in the source file, programmers would use the preprocessor directive #include "calc.h", which brings in the header file containing it. The following executable prog is built from the source files calc.c and main.c:
$ gcc -o prog -std=c99 -pedantic main.c calc.c
$ ./prog
avg(1.2,3.4)=2.3
Building an executable this way works perfectly but if you alter a source file, you have to recompile all the source files. Compiling two small source files does not take a long time, but if you have to compile a great number of source files, it may take hours. Separate compilation, described in section XIII.16, overcomes this issue: each source file is compiled independently.
XIII.15 Header Files
In modular programming, programmers develop several source files that are compiled individually. An identifier having external linkage, defined in a source file (and then in an object file), can be shared with other modules that can reference it without defining it (they declare it with the storage-class specifier extern). Such an identifier is commonly declared in a header file (for example, the variable errno). Header files are used in modular programming as interfaces to modules. Typically, header files contain:
o Structures and unions. For example:
struct string { char *s; int len; };
o Function prototypes. For example:
float avg(float, float);
o Typedef names. For example:
typedef struct string string;
o Global variables. For example:
extern int max_retry;
o Macros. For example:
#define ABS(x) ( (x) > 0 ? (x):-(x) )
Thus, declarations stored in header files are separated from their implementations, located in source files. Each source file should be accompanied by its header file. There are two kinds of header files:
o Standard header files, such as stdio.h, provided by the system or the compiler. On UNIX and UNIX-like systems, they are usually located in the /usr/include and /usr/include/sys directories.
o User-defined header files
You have two ways to include header files in your source files:
o The header file is surrounded by quotation marks:
#include "filename"
When you compile source files containing a line like this, the compiler will look for filename in the directories listed below in sequential order:
▪ Directory of the file containing the #include directive (with gcc; this is the current directory when you compile from the directory holding your source files)
▪ Directory list specified by the compiler's -I option
▪ Default search directories (usually /usr/include)
Programmers tend to use this method to include their own header files, because the directory holding the source files is also searched for header files during the compilation phase. For example:
▪ #include "calc.h"
▪ #include "../include/calc.h"
o The header file is enclosed between < and >:
#include <filename>
When you compile source files containing a line like this, the compiler will look for filename in the directories listed below in the following order:
▪ Directory list specified by the compiler's -I option
▪ Default search directories (typically /usr/include)
Programmers tend to use the latter method to include standard header files. You can employ the gcc -I option to add a directory to the list of directories that will be searched for header files:
gcc -c source_file_list -Iinc_dir1 -Iinc_dir2…
Where:
o source_file_list is the list of source files (with the .c suffix) separated by blanks
o inc_dir1, inc_dir2… are the directories that will be searched for the header files included in the source files.
For example:
$ gcc -std=c99 -pedantic -c main.c calc.c -I../include
XIII.16 Separate compilation
Separate compilation consists of compiling source files individually, which produces one object file per source file. In our example, we have two source files, main.c and calc.c. First, we compile them to produce object files and then we invoke the link-editor to combine them and generate a binary file as explained below:
o Step 1. Building object files: The following example builds the main.o and calc.o object files from the main.c and calc.c source files:
$ gcc -std=c99 -pedantic -c main.c
$ gcc -std=c99 -pedantic -c calc.c
o Step 2. Linking: After building the object modules main.o and calc.o, you can tell gcc to combine them to generate the executable file called prog as follows:
$ gcc -std=c99 -pedantic -o prog main.o calc.o
Finally, you can execute it:
$ ./prog
Now, suppose you alter the main.c file as follows: $ cat main.c #include #include “calc.h”
int main(void) { float z = 5; float w = 5.2; printf(“avg(%g,%g)=%g\n”, z, w, avg(z,w)); return 0; }
You just need to recompile the main.c source file and then call the link-editor: $ gcc -std=c99 -pedantic -c main.c $ gcc -std=c99 -pedantic -o prog main.o calc.o
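The contents of calc.h and calc.c are not repeated in this section. As a reminder, here is a minimal sketch of what the two files might contain, shown as a single listing. The signature of avg() (two float parameters, float result) is an assumption deduced from the calls in main.c, and the include guard name CALC_H is hypothetical.

```c
/* ---- what calc.h might contain (assumed, not shown in this section) ---- */
#ifndef CALC_H
#define CALC_H

/* Declaration only: tells the compiler how avg() must be called. */
float avg(float a, float b);

#endif /* CALC_H */

/* ---- what calc.c might contain (assumed) ---- */
/* In the real calc.c, the declaration would come from: #include "calc.h" */

/* External definition: this is the code that ends up in calc.o. */
float avg(float a, float b)
{
    return (a + b) / 2.0f;
}
```

With such a layout, main.c only needs the declaration from calc.h at compile time. The definition in calc.o is required at link time only.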
XIII.16.1 Sharing identifiers amongst modules We have already discussed at length the concepts of definition, declaration, linkage, scope, and storage duration. Here, we just give a brief reminder of what we have learned.
Figure XIII‑6 Linking Object Files
Separate compilation supposes that an identifier can be defined in one module and referenced through the same name in other modules. An identifier can be used only if it is defined in some module. For an identifier to be used outside the module in which it is defined, it must have external linkage. That is, it has an external definition (i.e. file scope) and has been declared without the storage-class specifier static. A reference to an external identifier outside the object module in which it is defined is known as an external reference. An identifier with file scope (i.e. external identifier) can be shared amongst modules or visible only within the module in which it is defined. Declared with the storage-class specifier static, an external identifier can be accessed only within the module in which it is defined: it has internal linkage. Such an identifier cannot be referenced outside its object module: it is “private”. The link-editor matches external references to external definitions and then merges input object files into a single binary file (executable) that can be executed as shown in Figure XIII‑6. Suppose that you attempt to build the prog executable as follows:
$ gcc -o prog -std=c99 -pedantic main.o
Undefined first referenced
 Symbol in file
avg main.o
ld: fatal: Symbol referencing errors. No output written to prog
Linking failed because the main.o object file contains a reference to an identifier (the avg function) that has not been defined in any of the files supplied. You may have noticed that when we compiled the source file main.c to yield the object file main.o, no error was produced: external references are resolved at the linking stage. If external identifiers are referenced but not defined in any of the object files or libraries being linked, the link-editor generates an error. Each external reference must match one external definition. Take note that in compiler terminology, a symbol is a synonym for an identifier.
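The difference between external and internal linkage can be illustrated with a short sketch of a module. Only avg() comes from our example; the names call_count and half are hypothetical and serve only as an illustration.

```c
/* Sketch of a module such as calc.c (call_count and half are hypothetical). */

/* External linkage: another module may access it with
   "extern int call_count;" and the link-editor resolves the reference. */
int call_count = 0;

/* Internal linkage: static makes this helper private to the module.
   No other object file can reference it. */
static float half(float x)
{
    return x / 2.0f;
}

/* External linkage: main.o may reference avg(); the link-editor matches
   that external reference to this external definition. */
float avg(float a, float b)
{
    call_count++;
    return half(a + b);
}
```

If another module attempted to call half(), it would compile (given a declaration) but the link-editor would report half as an undefined symbol, exactly as it did for avg above.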
XIII.17 Warning Messages When compiling, you should use the –Wall option of gcc, which turns on a large set of commonly useful warning messages (despite its name, not every warning: the –Wextra option enables more). It will help you to correct many mistakes when compiling. Unless your program conforms to all of C90, C99 and C11, you should also specify the standard you want your program to conform to with the option –std=c90, –std=c99 or –std=c11, together with the option –pedantic.
XIII.18 Libraries A library is an indexed file containing binary code. You could think of libraries as “service” providers. If you wish to use a particular service, you just need to give the link-editor the name of the library implementing the service. For example, if you wish to use the power math function pow() in your program, you have two choices: o Coding the subroutine yourself o Resorting to an existing library implementing it
According to the UNIX convention, library names have the following format: libname.x
Where: o name is a word identifying the library. You have to provide it for the link-editor when using the –l option. o x is a suffix identifying the type of the library: so for shared libraries and a for archive libraries
On your system, the extension for shared libraries might not be .so. For example, on HP-UX, the .sl extension (shared library) is used.
The UNIX and UNIX-like systems have several ready-to-use libraries. For example, the standard C library, libc, and the math library, libm, can be exploited in your programs. Thus, if you wish to invoke the power function, you simply have to inform the linker that it has to search the math library for the pow() function. Every library is associated with header files containing declarations for shared identifiers (functions, types, variables, macros, structures…). Header files should be included in the source files invoking functions defined in the libraries. The link-editor’s task is to combine object modules and libraries to produce binary executables as shown in Figure XIII‑7. If, in your object modules or libraries, there is any undefined external identifier, the linker will not be able to generate the executable file. Keep in mind that header files are different from libraries and object modules. The former contain the declarations while the latter contain the implementations (code). For example, the sqrt() math function is defined in the libm math library and declared in the math.h header file (normally located in /usr/include). The directive #include <math.h> tells the preprocessor (invoked by the compiler driver) to place the contents of math.h into the file containing the directive. During the linking phase, you have to supply the libm math library explicitly to the link-editor if the compiler driver does not specify it automatically. Otherwise, an error will be generated.
Figure XIII‑7 Building an executable
There are two kinds of libraries: static libraries and shared libraries. First, let us discuss static libraries.
XIII.18.1 Using Libraries Suppose that you wish to use the power math function in the main.c source file. The libm math library defines the pow() function, hence you do not need to implement it. In the main.c
source file, we can call it as follows:
$ cat main.c
#include <stdio.h>
#include <math.h>
#include "calc.h"

int main(void) {
  float z = 5;
  float w = 5.2;
  printf("pow( avg(%g,%g), 2 )=%g\n", z, w, pow( avg(z,w), 2 ) );
  return 0;
}
The math.h header file contains the prototype of the pow() function. If you compile the source files and then link the resulting object files, you may obtain the following output:
$ gcc -std=c99 -pedantic -c main.c calc.c
$ gcc -std=c99 -pedantic -o prog main.o calc.o
Undefined first referenced
 Symbol in file
pow main.o
ld: fatal: Symbol referencing errors. No output written to prog
Linking failed because the pow() function was defined in neither main.o nor calc.o. As some compiler drivers do not automatically add the math library, you have to tell the link-editor to get the pow() function definition from the math library by using the –l option as shown below: $ gcc -std=c99 -pedantic -o prog main.o calc.o -lm
More generally, the option –l links libraries with object modules: gcc [-o output_file] [options] object_module_list –lname
Where: o output_file is the name of the executable file to be generated o options are options of the utility gcc. o object_module_list is a list of object files separated by blanks o name is the short name for the library whose file name is of the form libname.x. If x is so, it is a shared library, if x is a, it is a static library.
Of course, you can specify more than one library. For each library that you wish to exploit, precede it with the –l option: gcc … –lname1 –lname2 –lname3…
Where: o name1, name2… are the short names for the libraries libname1.x, libname2.x… The link-editor will search a default library path for libname1.x, libname2.x… Usually, the default library search path includes the /usr/lib directory. The default library search path is indicated in the manual page of the ld link-editor (type man ld). If the libraries that you wish to draw on are not in the default library search path, you must explicitly give their location to the link-editor by specifying the –L option as follows: gcc … –Ldir1 –Ldir2…
Where: o dir1, dir2… are directories that will be added to the default library search directories. They will be searched before considering the default library search path. If you employ the standard C library or system libraries, such as the math library, you do not need to specify their location, since they are in the default search directories. Furthermore, the standard C library, libc, is automatically linked with your object files even though you do not specify it with the option –lc.
Figure XIII‑8 Using a Static Library
XIII.18.2 C Library The C library in the UNIX systems, called libc, is a superset of the standard C library. It also defines a number of functions complying with SUS (Single UNIX Specification) and other specifications varying with the systems. The GNU C library (commonly found on Linux systems), called glibc, is also an extension of the standard C library. It conforms to ANSI C, POSIX standards, BSD interface and SYSTEM V specifications (SVID) and includes other features such as internationalization.
The compiler drivers always include the standard C library. On GNU/Linux systems, gcc links your programs with glibc.
XIII.18.3 Static Libraries A static library (or archive library) is an archive file containing a collection of object modules created and maintained by the ar utility. All or some of them are placed into the executable file, when needed, at link time. By convention, a static library name has the .a suffix and starts with lib. For example, libnumber.a, libm.a (math library) and libc.a (C library) are archive libraries. The link-editor will merge all object files supplied on the command line and object files retrieved from archive libraries into a single executable file. That is why archive libraries are also called static libraries. Remember that only the object files needed are copied from archive libraries into the executable file and not the entire archive libraries. For example, if you invoke the avg() function defined in the calc.o module stored in the libnumber.a archive library, only the calc.o object file will be extracted as shown in Figure XIII‑8. It means that each program has its own copy of the object module calc.o. It follows that the same code may be loaded into memory several times as shown in Figure XIII‑9.
Figure XIII‑9 Three Processes Using the Same Functions
XIII.18.3.1 Available archive libraries There may be several archive libraries, such as libm.a (math library). They are usually located in /usr/lib. However, in practice, static libraries are seldom used when shared libraries are available. The C library is automatically included by the compiler drivers at linking stage. Therefore, you do not need to specify it with the option –lc, but if you use other libraries, you must inform the link-editor by providing their short name with the option –l.
XIII.18.3.2 Creating Archive Libraries The following example creates the archive library number (file libnumber.a) from the module calc.o:
$ ar rv libnumber.a calc.o
a - calc.o
ar: creating libnumber.a
If you wish to add other modules at a later stage, launch the same command as shown below: $ ar rv libnumber.a str.o date.o a – str.o a – date.o
The ar command with the option t lists the object modules stored in an archive library: $ ar t libnumber.a calc.o str.o date.o
You can also specify the v option to display additional details: $ ar tv libnumber.a
The command ar with the d option removes the object file date.o from the archive library libnumber.a: $ ar d libnumber.a date.o
XIII.18.3.3 Header Files When you build archive libraries, you should also install somewhere the corresponding header files. For example, if you build the library libnumber.a comprising calc.o, str.o and date.o, you should put the header files calc.h, str.h and date.h in an accessible directory.
Figure XIII‑10 Example of Project Organization
XIII.18.3.4 Using Archive Libraries Suppose, as shown in Figure XIII‑10, that you place your libraries in a directory called lib, the header files in a directory called include and the source files in a directory called src. Now, suppose that you had created the archive library libnumber.a containing the object files str.o, date.o and calc.o and you wish to use the function avg() (defined in the object file calc.o) in your program as in the following example: $ cat main.c
#include <stdio.h>
#include "calc.h"

int main(void) {
  float a = 19.19;
  float b = 21.21;
  printf( "avg(%g,%g)=%g\n", a, b, avg(a,b) );
  return 0;
}
First, you have to compile the main.c source file: $ gcc -std=c99 -pedantic -c main.c main.c:2:17: calc.h: No such file or directory
It failed because the header file calc.h was not present in the current directory. Therefore, either copy the header file calc.h to the current directory or indicate its location to the compiler by specifying the option –I: $ gcc -std=c99 -pedantic -c main.c -I../include
You do not need to specify the directory location for standard header files (such as stdio.h).
You must then link object modules and archive libraries to produce an executable. The object module main.o references the function avg() defined in the module calc.o (included in the archive library libnumber.a). To inform the linker that it has to search the library libnumber.a for the external definition of the symbol avg(), two methods are available: o Specify the pathname of the library as follows: $ gcc -o prog_arch main.o $HOME/project/lib/libnumber.a
o Use the –l and –L options as follows: $ gcc -o prog_arch main.o -L$HOME/project/lib -lnumber
XIII.18.3.5 Static executable The link-editor can build static or dynamic executables. When it generates a static executable, it copies the code and data from object files, including those extracted from
archive libraries, into a complete executable file in which all references are resolved before running it. The executable file needs no further information to be loaded into memory when executed (all required data and routines are in the executable file). It means that the size of such a program may be large. When you tell the compiler driver to link your object files and libraries to produce an executable, gcc invokes the link-editor with a large number of arguments. If you wish to view them, use the option –v: $ gcc -v -o prog main.o $HOME/project/lib/libnumber.a
The GNU compiler utility uses the GNU link-editor collect2, which in turn invokes the system link-editor ld with several arguments. If your system supports shared libraries (also called dynamically linked libraries), it will use them instead of archive libraries. To tell gcc to build a static executable (using static libraries), you have to specify the –static option: $ gcc -static -o prog_stat main.o -L$HOME/project/lib -lnumber
This works only if the archive libraries used are also available on the system. This is not always the case. XIII.18.3.6 Linking Order Contrast: $ gcc -o prog_arch main.o $HOME/project/lib/libnumber.a
With:
$ gcc -o prog_arch $HOME/project/lib/libnumber.a main.o
Undefined first referenced
 symbol in file
avg main.o
ld: fatal: Symbol referencing errors. No output written to prog_arch
collect2: ld returned 1 exit status
Very strange, isn’t it? The order in which object files and archive libraries appear on the command line matters! This is due to the following points: o The link-editor reads the command line from left to right o The link-editor extracts from the archive libraries only the object files containing the external definitions of references that are still unresolved.
Therefore, if archive libraries appear before your object files on the command line, no object file will be extracted since object modules containing external references have not yet been read.
You should therefore place the archive libraries after your object files, so that the link-editor has already seen the external references when it searches the libraries for their definitions.
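The two points above can be modelled with a few lines of C. This is only a simplified sketch, not how a real link-editor is written: the struct input type, the function link_unresolved() and the limit MAX_SYMS are hypothetical, and each input is assumed to define at most one symbol and to reference at most one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16   /* arbitrary limit for this sketch */

/* Hypothetical, simplified description of one input on the link line. */
struct input {
    const char *name;     /* e.g. "main.o" or "libnumber.a(calc.o)" */
    bool from_archive;    /* true: archive member, extracted only on demand */
    const char *defines;  /* the single symbol it defines, or NULL */
    const char *needs;    /* the single symbol it references, or NULL */
};

static bool contains(const char *set[], size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], sym) == 0)
            return true;
    return false;
}

/* Scans the inputs from left to right and returns the number of
   references left unresolved at the end of the command line. */
size_t link_unresolved(const struct input *in, size_t n)
{
    const char *defined[MAX_SYMS];
    const char *wanted[MAX_SYMS];    /* symbols referenced so far */
    size_t ndef = 0, nwant = 0, unresolved = 0;

    for (size_t i = 0; i < n; i++) {
        /* An archive member is useful only if it defines a symbol that
           has already been referenced and is not yet defined. */
        bool is_needed = in[i].defines != NULL
            && contains(wanted, nwant, in[i].defines)
            && !contains(defined, ndef, in[i].defines);

        if (in[i].from_archive && !is_needed)
            continue;                /* member is not extracted */

        if (in[i].defines != NULL && ndef < MAX_SYMS)
            defined[ndef++] = in[i].defines;
        if (in[i].needs != NULL && nwant < MAX_SYMS
            && !contains(wanted, nwant, in[i].needs))
            wanted[nwant++] = in[i].needs;
    }

    for (size_t i = 0; i < nwant; i++)
        if (!contains(defined, ndef, wanted[i]))
            unresolved++;
    return unresolved;
}
```

In this model, the sequence main.o followed by the archive member calc.o leaves no unresolved reference, whereas the reverse order leaves avg unresolved: when the archive is read first, nothing has yet asked for avg, so calc.o is skipped.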
XIII.18.4 Shared Libraries During the link-editing stage, object files extracted from archive libraries are copied into executable files. Unfortunately, this method has some disadvantages:
o If a static library is often used on the system, its code is loaded several times into memory. It is memory-consuming. Think of extensively used functions such as printf()…
o If you update static libraries (bug corrections), the programs using them must be relinked
o Several copies of the same code are stored on disks, which is a waste of disk space.
Another kind of library, called shared libraries or dynamically linked libraries, overcomes these issues. A shared library is a particular object file whose name is of the form libname.so, where name is the short name of the library. A shared library is different from an archive library: it is a single object file generated by the link-editor, whereas an archive library is a collection of object files produced by the ar tool. Consequently, if you wish to update a shared library, you have to recreate it. It is an object file defining external symbols that can be referenced by other object files. With shared libraries, instead of copying object code, the link-editor places into the executable files information that the loader will later exploit to bind shared libraries to the process address space. When shared libraries are used, the link-editor produces incomplete programs, called dynamic executables, that need further linking at execution time, known as dynamic linking. The dynamic linking task is assigned to the loader (also known as a dynamic linker or runtime linker). It reads the dynamic executable file, loads it into memory, binds shared libraries to the process address space, resolves the unresolved references and then gives control to the program.
XIII.18.4.1 Sharing Code of Libraries In memory, there is only one copy of shared libraries. For example, the libm.so shared library has one copy in memory shared amongst processes as shown in Figure XIII‑11 and Figure XIII‑12.
Figure XIII‑11 Processes Sharing the Same Library
XIII.18.4.2 System Libraries The UNIX and UNIX-like systems have several shared libraries such as libc.so and libm.so.
They are usually located in the /usr/lib directory. They save programmers a lot of time.
XIII.18.4.3 Position-Independent Code (PIC) Normally, compilers and assemblers produce code that refers to fixed virtual addresses. This means that symbols in object modules and executables have fixed addresses (virtual addresses). Thus, when the binary file is loaded into memory, variables, pointers and subprograms are located at fixed addresses in the virtual address space. The problem is that a shared library cannot be placed at a fixed address, because dynamically-linked libraries can be shared between multiple processes. Therefore, the same library may be mapped at a different virtual address in each process address space (see Figure XIII‑12). To allow shared libraries to be tied to several address spaces at different locations, shared libraries are built from Position-Independent Code (PIC) generated by the compiler. We will not explain how the different systems implement PIC.
XIII.18.4.4 Building Shared Libraries Now, let us build the libnumber.so shared library containing the calc.o object module: o Compile the source file calc.c with the –fPIC option to generate Position-Independent Code: $ gcc -std=c99 -pedantic -fPIC -c calc.c -I../include
o Tell the link-editor to create the libnumber.so shared library by passing the –shared option to gcc: $ gcc -std=c99 -pedantic -shared -o libnumber.so calc.o -lm
Next, you can test it. Instead of using the archive library libnumber.a previously created, link the shared library libnumber.so with your object module main.o to produce the executable prog_dyn. The way to link shared libraries depends on the operating system. The two following ways to link shared libraries, by using gcc, are generally accepted: o Specify the pathname of the libraries as follows: $ gcc -o prog_dyn main.o $HOME/project/lib/libnumber.so
o Or use the –l and –L options as follows: $ gcc -o prog_dyn main.o -L$HOME/project/lib -lnumber
Now, you can execute the prog_dyn program.
Figure XIII‑12 Mapping Shared Libraries into process address spaces
XIII.18.4.5 Shared Library Dependencies When you link shared libraries and object modules to produce a dynamic object file (i.e. dynamic executable or shared library file), the link-editor places shared library information in it for the loader. They are called dynamic dependencies. Therefore, the target object file depends on a list of shared libraries that will be loaded into memory and attached to it at a later stage when executed. Consequently, a program with dynamic dependencies needs further linking. To list the shared libraries on which a dynamic object
file depends, use the ldd command (list dynamic dependencies):
$ ldd prog_dyn
libnumber.so => /usr/local/lib/libnumber.so
libm.so.2 => /lib/libm.so.2
libc.so.1 => /lib/libc.so.1
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
If a shared library has itself been linked with other shared libraries, the ldd command also displays its dynamic dependencies. For example:
$ ldd $HOME/project/lib/libnumber.so
libm.so.2 => /lib/libm.so.2
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
libc.so.1 => /lib/libc.so.1
The string to the left of the symbol => is the name (or pathname) of the required library as recorded in the dynamic object, and the name to the right of => is the pathname found by the loader. If a library cannot be found, you would see the text “not found”:
$ ldd $HOME/project/lib/prog_dyn
libnumber.so => (file not found)
libc.so.1 => /lib/libc.so.1
The program prog_dyn above cannot be executed because the loader does not know where the library libnumber.so is located. If you attempt to execute such a program, the loader generates an error:
$ ./prog_dyn
ld.so.1: ./prog_dyn: fatal: libnumber.so: open failed: No such file or directory
How could the loader know where the shared libraries on which a program depends are located? The following section explains how the loader links libraries and resolves remaining external references when a dynamic program is executed. XIII.18.4.6 Dynamic programs and search path The link-editor builds a partially executable file only if shared libraries are used. It does not merge the code and data of shared libraries into a single object file as it would do with archive libraries. It includes only information for the dynamic linker (i.e. loader) that will be responsible for binding shared libraries to the dynamic executable when creating a process. Hence the name dynamic executable. The procedure for building and using shared libraries varies with the systems. Even if you provide the pathnames to the shared libraries for the link-editor, it may be insufficient depending on the options that you have specified to the link-editor: you may also have to
indicate them to the loader. At linking stage, the full library pathnames can be recorded in the executable to avoid specifying them again to the loader. Otherwise, the library search path environment variable must be set. It contains a list of directories (separated by colons) that will be searched for libraries when a dynamic program is executed. The name of the environment variable controlling the library search path is OS-dependent. On some systems (Linux and Solaris), the LD_LIBRARY_PATH variable is used. On other UNIX systems, it is called SHLIB_PATH (HP-UX) or LIBPATH (IBM AIX). Whatever its name, it works in the same way. For example, you can set the LD_LIBRARY_PATH variable in the Bourne shell family as follows: LD_LIBRARY_PATH=path1:path2:… ; export LD_LIBRARY_PATH
If you use the C shell family, use the following syntax: setenv LD_LIBRARY_PATH path1:path2:…
This allows you to place shared libraries anywhere within the system provided that you indicate their location to the loader. For example, a customer could install them in the /opt/application directory and another one could place them in /opt/software. No recompilation is needed! The loader searches the directories stored in the library search path environment variable for the shared libraries. Hence shared libraries provided to the link-editor (compiling environment) and those used by the loader (executing environment) may have different locations. To consolidate what has been said, let us work with a simple example. We will be building our executable in three ways: first, the full pathname of the shared library is inserted into the executable, next only library names are inserted into the executable, and finally we will tell the link-editor to store several paths for shared libraries (called rpath) in the executable.
o Storing the full pathname of shared libraries inside executables: Suppose that you build the dyn_prog executable as follows:
$ gcc -o dyn_prog main.o $HOME/lib/libnumber.so
$ ldd dyn_prog
/users/michael/lib/libnumber.so
libc.so.1 => /lib/libc.so.1
libm.so.2 => /lib/libm.so.2
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
In this case, you do not need to set the library search path environment variable. The
loader will automatically use the full pathname of the library (/users/michael/lib/libnumber.so) stored in the executable. This method does not let users choose the locations for libraries. They are imposed by library suppliers. o Storing library names into executables: You can link object files with shared libraries by specifying the option –L in addition to the option –l. The option –L allows adding a directory to the default list of directory path names used to seek the libraries specified by the –l option. In this case, the locations of the libraries are not put into the executables but only their names. For example: $ gcc -o dyn_prog main.o -L$HOME/lib -lnumber -lm $ ldd dyn_prog libnumber.so => (file not found) libm.so.2 => /lib/libm.so.2 libc.so.1 => /lib/libc.so.1
As you can see, the executable contains only the name of the library libnumber.so. If we execute the program, the loader will not be able to locate the library libnumber.so because the library search path environment variable LD_LIBRARY_PATH is unset. Recall that the library search path environment variable contains a list of directories separated by colons (:) that will be searched for libraries whose locations are not defined in the executable. Now, if we set the library search path environment variable, the loader will be able to locate the library libnumber.so as shown in the subsequent example:
$ LD_LIBRARY_PATH=/users/michael/lib:/usr/local/lib
$ export LD_LIBRARY_PATH
$ ldd dyn_prog
libnumber.so => /users/michael/lib/libnumber.so
libm.so.2 => /lib/libm.so.2
libc.so.1 => /lib/libc.so.1
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
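To make the format of the library search path concrete, the following sketch splits a colon-separated list into its directories, in the order in which they would be searched. The function split_search_path() and the constant DIR_MAX are hypothetical names chosen for this illustration. As a simplification, empty entries and entries longer than DIR_MAX - 1 characters are skipped; a real loader may treat them differently.

```c
#include <stddef.h>
#include <string.h>

#define DIR_MAX 256   /* arbitrary maximum length of one directory name */

/* Splits a colon-separated list such as "/opt/lib:/usr/local/lib" into
   dirs[0], dirs[1]... in order; returns the number of directories stored. */
size_t split_search_path(const char *list, char dirs[][DIR_MAX],
                         size_t max_dirs)
{
    size_t count = 0;

    while (*list != '\0' && count < max_dirs) {
        size_t len = strcspn(list, ":");   /* length up to next ':' or end */

        if (len > 0 && len < DIR_MAX) {
            memcpy(dirs[count], list, len);
            dirs[count][len] = '\0';
            count++;
        }
        list += len;
        if (*list == ':')
            list++;                        /* skip the separator */
    }
    return count;
}
```

A program could pass it the string returned by getenv("LD_LIBRARY_PATH"), after checking that getenv() did not return NULL (the variable may be unset).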
The directories listed in the library search path environment variable will be searched in sequential order for the dynamic dependencies. If not found, the loader will search directories in the default path (such as /usr/lib). The loader always searches the standard library directories, so you do not need to insert them in LD_LIBRARY_PATH. Of course, this variable is very useful, but is not recommended for security reasons. Another disadvantage of the library search path environment variable is that users have to set it to the right path. o Using rpath: You can also incorporate a list of directories (called rpath) into the executable file that
the loader will search for dynamic dependencies (shared libraries). The way to use rpath varies from system to system. The following example works on BSD, Oracle Solaris and Linux: $ gcc -o dyn_prog main.o -Wl,-R,/opt/lib,-R,/usr/local/lib -L$HOME/lib -lnumber $ ldd dyn_prog libnumber.so => (file not found) libc.so.1 => /lib/libc.so.1 $ cp libnumber.so /usr/local/lib $ ldd dyn_prog libnumber.so => /usr/local/lib/libnumber.so libc.so.1 => /lib/libc.so.1 libm.so.2 => /lib/libm.so.2 libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
The gcc option –Wl, introduces a list of arguments, separated by commas, that will be passed to the link-editor. Thus, –Wl,opt1,opt2,opt3 passes the arguments opt1 opt2 opt3 to the link-editor. In our example, the arguments -R /opt/lib –R /usr/local/lib are passed to the link-editor, which means that the directories /opt/lib and /usr/local/lib will be added to the rpath. This is used when the shared libraries are located in directories known at link time. You can also combine rpath and the library search path environment variable; if a library is not found in rpath, you can set the library search path environment variable.
XIII.18.4.7 SUID and SGID For security reasons, the library search path environment variable is, by default, ignored if the dynamic executable has the SUID or SGID bit set.
XIII.18.4.8 Version Control Libraries evolve over time for many reasons: adding new functions, correcting bugs, and improving algorithms. Therefore, shared libraries may need updating. When a new library release is compatible with the previous one, programs work with the new one without any change. It is called a minor release. However, newer libraries, called major releases, may be unusable with programs that worked with the previous versions. The different versions of the libraries must be kept in the system so that older programs can continue working. The system distinguishes between the different library versions by using the versioning mechanism (except for IBM AIX). The way it is used depends on the system. A versioned file name has the following format (Sun Solaris, BSD): libname.so.major
or (HP-UX): libname.sl.major
or (GNU/Linux): libname.so.major.minor
Or in GNU/Linux, a micro release number is used to designate a minor change that does not add new interfaces: libname.so.major.minor.micro
Where name is the short name of the library, major, minor and micro are respectively the major, minor and the micro numbers of the library. The shared library name with the libname.so format, used by the link-editor, is simply a link to a versioned file name of the form: libname.so.major.minor.micro.
XIII.18.4.9 Loading shared libraries If a dynamic executable requires a shared library that is already in memory, the loader just binds it to the process (i.e. it attaches it to the address space of the process). Otherwise, the loader first has to place the shared library into memory from hard disk [99]. This can be done in two ways: o A shared library can be loaded only when referenced. This is called lazy loading. o A shared library can be loaded as soon as the program is executed. This is called immediate or runtime loading. Either of them can be used according to the options used when linking executables. Generally, systems use the second method.
XIII.18.4.10 Shared libraries and operating systems The procedure for building and using shared libraries varies from system to system: the default search path for shared libraries, the environment variable names, the options to link shared libraries, and so on are OS-dependent. For example, on HP-UX, shared libraries must have the execute permission. Several options and environment variables can be used to alter the behavior of the loader and the link-editor.
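As an illustration of the versioned file name formats given in section XIII.18.4.8 (libname.so.major, libname.so.major.minor and libname.so.major.minor.micro), the following sketch extracts the version numbers from such a name. The function parse_so_version() is a hypothetical helper written for this example; it handles only the .so suffix, not the .sl suffix of HP-UX.

```c
#include <stdio.h>
#include <string.h>

/* Returns how many version numbers (0 to 3) follow ".so." in file_name.
   The numbers that are absent are set to 0. */
int parse_so_version(const char *file_name,
                     int *major, int *minor, int *micro)
{
    const char *p = strstr(file_name, ".so.");
    int n;

    *major = *minor = *micro = 0;
    if (p == NULL)
        return 0;              /* unversioned name such as libnumber.so */

    /* p + 4 points just after ".so." */
    n = sscanf(p + 4, "%d.%d.%d", major, minor, micro);
    return n < 0 ? 0 : n;      /* sscanf() returns EOF on empty input */
}
```

For example, the name libnumber.so.1.2.3 yields three numbers (1, 2 and 3), libc.so.1 yields only a major number, and libnumber.so yields none.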
XIII.18.5 Dynamic or static executable? To determine whether your program is dynamic or static, you can use the commands file or
ldd. For example: $ file prog_dyn prog_stat prog_dyn: ELF-32 bit MSB executable SPARC version1, dynamically linked, not stripped prog_stat: ELF-32 bit MSB executable SPARC version1, statically linked, not stripped
The ldd command generates an error if the file provided is not a dynamic object file (shared library or executable file): $ ldd prog_stat ldd: prog_stat: file is not a dynamic executable or shared object
Unsurprisingly, static executables are larger than dynamic executables:
$ ls -l prog_stat prog_dyn
-rwxr-xr-x 1 michael users 6456 Apr 22 17:09 prog_dyn
-rwxr-xr-x 1 michael users 362836 Apr 22 17:09 prog_stat
XIII.18.6 Shared or archive library? By default, the link-editor uses the shared library if both shared and archive libraries are available. For example, if you specify the option –lnumber, and both the archive library libnumber.a and the shared library libnumber.so are available, the link-editor will draw on libnumber.so. If you wish to use the archive library, you have to explicitly specify the libnumber.a library file name. Table XIII‑1 lists the differences between shared and archive libraries:
Table XIII‑1 Static and shared library comparison
CHAPTER XIV MAKEFILE XIV.1 Introduction Make is a utility commonly used to manage the compilation of programs in UNIX and UNIX-like systems. Compiling and maintaining programs can be quite complex because of dependencies between files. In this context, make can turn out to be very helpful: in addition to simplifying the compilation tasks, it recompiles only the source files that have been altered, which saves a lot of time. More generally, it manages dependencies between files and performs actions, if required, according to directives you define within a file commonly named a makefile. Several implementations of make are available; the best known are SYSTEM V make, BSD make, GNU make, and POSIX make. In this chapter, we describe the main features of POSIX make, which are available in most UNIX and UNIX-like systems. Moreover, features that do not conform to POSIX (described in the standard Open Group Base Specifications, Issue 7, also known as IEEE Std 1003.1-2013) are pointed out so that you can write portable makefiles. This chapter requires the reader to have a good knowledge of UNIX or UNIX-like systems and shells. Otherwise, it may appear quite indigestible.
XIV.2 Invocation
The location of the make executable depends on the system and the version. Its pathname could be /usr/ccs/bin/make, /usr/xpg4/bin/make, /usr/local/bin/make… Consult the man page of your system (man make) to learn which make implementations are available on it. You could also download and install GNU make. On Linux systems, the default make is GNU make, which can be invoked through the command make or gmake located under /usr/bin. First, programmers create a text file that describes the relationships between files and the actions to be performed. Then, the command make is executed with or without arguments:
make [options] [-f mkfile] [target_list]
Where:
o options are make options described later
o mkfile is a directive file telling make what to build and how to carry it out
o target_list is a list of items, called targets, that will be created or updated if necessary
If mkfile is not supplied, by default make will search the working directory for the file called makefile. If not found, it will search for the file called Makefile. That is why, traditionally, the file containing the directives that make interprets is called a makefile. For example, after writing the make instructions into a file named Makefile, you could invoke make as follows:
$ make -f Makefile
Or just:
$ make
XIV.3 Makefile
The goal of a makefile is to specify dependencies between files and give directives to the make utility in order to build some of them if they do not exist, or rebuild them if their dependencies have been updated. For example, suppose the file f1 depends on files f2 and f3. You can tell make to create f1 from f2 and f3 if it does not exist, or to recreate it if the file f2 or f3 has been altered. The file that make has to create or update is called a target: the file f1 is a target in our example. The files on which the target depends are called prerequisites or dependencies: the files f2 and f3 are prerequisites in our example. A makefile is composed of the following entries:
o Rules: they describe the relationships between files (i.e. targets and prerequisites) and provide a list of actions to carry out in order to generate targets. There are two kinds of rules: implicit rules and target rules.
o Macros: they are memory locations that store text that will be reused later in the makefile. Macros are also called variables.
o Comments: a comment starts with # and continues up to the newline character (end-of-line).
XIV.4 Rules A rule is a makefile entry consisting of a line that lists the relationships between targets and dependencies, as well as command lines that tell make how to create the targets if they do not exist or rebuild them if they are older than the dependencies. Make works with two types of rules: target rules and implicit rules (also called inference rules).
For now, to ease understanding, you can consider that a target is a file.
XIV.4.1 Target rules
A target rule is an explicit makefile entry (i.e. you write it on your own) made up of three parts:
o A list of targets separated by blanks and terminated by a colon. It specifies the targets to generate if they are out-of-date or missing.
o A list of dependencies, also known as prerequisites, separated by blanks on the same line as the target list and terminated by the newline character. It informs make that the list of targets depends on the list of prerequisites.
o Action lines separated by newlines. An action is a built-in shell command or an external command. An action line may contain several actions: we will call it a command line or an action line. Each command line must start with the tab character (<Tab> key).
For example,
$ cat Makefile
f1 : f2 f3
	cat f2 f3 > f1
In the example above, f1 is the target, f2 and f3 are dependencies and cat f2 f3 > f1 is the action line (command line). This simple makefile tells make to build the file f1 by executing the shell command cat f2 f3 > f1 if f1 does not exist or if f2 or f3 is more recent than f1. More generally, when make is executed, it needs to know what to build and how to do it. A target line tells what to produce and command lines tell how to do it. A target rule takes the following form:
target_list : [dependency_list] [; command1]
	command2
	…
Where:
o target_list is a list of files or fake targets (detailed in Section XIV.4.3) separated by blanks that make will attempt to bring up to date. At least one target is required.
o dependency_list is a list of dependencies, also called prerequisites, separated by blanks. It can be a list of files or fake targets (explained later). Dependencies are optional.
o The list of actions command1, command2… are commands separated by newlines that must be preceded by a tab, with the exception of the first command, which can be introduced with a semicolon when it appears in the target line. Command lines are optional.
You can omit dependency_list and the list of actions but at least one target must be supplied. Even though the first command can be placed in the target line, prefer writing one list of actions per line (starting with a tab). A target is said to be out-of-date (i.e. no longer valid) if one of the following cases occurs:
o The target is not an existing file or directory
o The target is older than one or more of its dependencies.
In other words, a target rule tells make to bring a target up to date by using the command lines if one or more of its dependencies are newer than it is or if there is no file (or directory) with the same name as the target. The idea is simple: make revises a target only if necessary, that is, only if it is out-of-date. To tell make to check a particular target, just type make followed by the name of the target you wish to build. For example, to update only the target called all, type make all. A dependency can in turn be a target in other rules, which leads to a dependency graph (dependency tree). A dependency graph is a diagram that shows the interdependencies between several items as shown in Figure XIV‑1, Figure XIV‑2, Figure XIV‑4 and Figure XIV‑7. The dependency graph gives you an overall view of the relationships between targets and prerequisites. Before make actually updates a target, it performs recursive scans on its dependencies. What does this mean? Let us give a small example. Suppose the target f1 depends on the prerequisite f2, which in turn is also a target depending on f3. Say f3 is an existing file. When make attempts to update f1, it takes a look at f1 and sees that it first has to examine the prerequisite f2. Since f2 is also a target, it analyses its prerequisite f3. Since f3 is an existing file with no prerequisite, there is no further scan: make can build (or update) the target f2 from the prerequisite f3 and then create (or update) the target f1 from the prerequisite f2. Thus, for a given target, make checks every dependency to see if it is outdated and brings it up to date if required before generating the target. Make reaches the end of its scan when it encounters dependencies with no prerequisite. This occurs in the following cases:
o If the prerequisite is an existing file and is not a target. Otherwise, if the prerequisite is not an existing file and is not listed as a target, make yields an error message notifying that it cannot build it.
o If the prerequisite is also a target (in another rule) with no dependency, the command lines are executed if it is not an existing file. Otherwise, if there is an existing file with the same name, make considers it up-to-date.
Targets in target_list depend on the list of prerequisites dependency_list. We will call current target the target that make is currently processing.
Make revises the current target after its dependencies have been updated so that it takes into account their modifications. An action target, often called a fake target or a phony target, is a target that is not an existing file and is not meant to be created but only to coordinate several actions. This implies make will always try to rebuild it by executing its command lines. Usually, the makefile contains several rules. If you execute make by itself, the first rule that does not start with a dot (.) or a percent sign (%) will be processed. If you would like make to examine a particular target, you have to pass it as an argument to the utility: make target.
XIV.4.2 Target files
If a target is a file, make updates it using the command lines only if it is missing or if one or more of its dependencies are newer. Suppose you have to maintain the file whole_file, which is the concatenation of the two files file1.list and file2.list. This implies that the file whole_file depends on the files file1.list and file2.list. You can use the following makefile that updates the target whole_file if it does not exist or if the dependencies file1.list and file2.list are newer:
$ cat Makefile
whole_file : file1.list file2.list
	cat file1.list file2.list > whole_file
When you run make with no argument the following steps are performed:
o Since no target is specified on the command line, make searches the makefile for the first rule that does not start with a period or the percent sign %. The target whole_file will be checked.
o It goes through the target line and determines that the target whole_file has two dependencies. Before checking whether it is out-of-date or not, it has to check all its dependencies:
▪ It analyzes the first dependency file1.list. Since it has no dependency it is up-to-date
▪ It checks the second dependency file2.list. Since it has no dependency it is up-to-date
o After examining all dependencies, finally, it looks at the target whole_file. At the first invocation of make, the file whole_file does not exist. Therefore, make builds it using the command line (do not forget the tab character preceding the command line).
The command lines that follow the target line tell make how to build the target if it is out-of-date. The first time you run make the target whole_file does not exist, so you will obtain the following processing:
$ cat file1.list
line 1
$ cat file2.list
line 2
$ make
cat file1.list file2.list > whole_file
$ cat whole_file
line 1
line 2
Next, if you rerun make, the following message is displayed: `whole_file’ is up to date.
$ make
`whole_file’ is up to date
If you alter the dependencies file1.list or file2.list, or if you remove whole_file, make will rebuild the target whole_file. Try this:
$ echo new line 1 > file1.list
$ make
cat file1.list file2.list > whole_file
$ cat whole_file
new line 1
line 2
As you can see, make compares the modification time of the dependency files with that of the target.
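The timestamp comparison described above can be imitated in plain shell. The following is only a sketch of the idea, not how make is implemented: needs_rebuild and check are hypothetical helper names, the -nt operator of test is assumed to be available (it is in bash, dash and ksh), and mktemp -d is assumed to exist. Explicit timestamps set with touch -t keep the result predictable.

```shell
#!/bin/sh
# Sketch of make's out-of-date test. needs_rebuild is a hypothetical helper,
# not part of make. Assumes the shell's test supports -nt.
needs_rebuild() {
    target=$1
    shift
    # Case 1: the target does not exist.
    if [ ! -e "$target" ]; then
        return 0
    fi
    # Case 2: at least one dependency is newer than the target.
    for dep in "$@"; do
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Report the state of whole_file with respect to its two dependencies.
check() {
    if needs_rebuild whole_file file1.list file2.list; then
        echo rebuild
    else
        echo up-to-date
    fi
}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Explicit timestamps (touch -t) keep the comparison deterministic.
touch -t 202001010000 file1.list file2.list
first=$(check)                      # whole_file is missing
touch -t 202001020000 whole_file
second=$(check)                     # whole_file is newer than both dependencies
touch -t 202001030000 file1.list
third=$(check)                      # file1.list is now newer than whole_file

echo "$first, $second, $third"
```

The three checks correspond to the three situations met in this section: the first run of make, the rerun with nothing changed, and the run after file1.list has been altered.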
Figure XIV‑1 Dependency graph showing relationship between files
Now, suppose that the dependencies are also listed as targets in other rules as shown in the following example:
$ cat Makefile
whole_file : file1.list file2.list
	cat file1.list file2.list > whole_file
file1.list : go
	echo line 1 > file1.list
file2.list : go
	echo line 2 > file2.list
The makefile and its corresponding dependency graph, depicted in Figure XIV‑1, show that the file whole_file depends on file1.list and file2.list, which in turn depend on the file go. That is, if the file go is altered, all the targets depending on it will be updated. When you run make with no argument, the following steps are performed:
o Since no target is specified on the command line, make searches the makefile for the first rule that does not start with a dot or %. The target whole_file will be checked.
o It examines the target line and finds out that the target whole_file has two dependencies. Before checking whether it is out-of-date, it has to check all its dependencies:
▪ It analyzes the first dependency file1.list. It recognizes that it has the dependency file go. Before checking file1.list it has to check the prerequisite file go:
− It looks at the dependency go. Since the prerequisite file go has no dependency, it is up-to-date
− After checking its dependencies, make checks the target file1.list. It updates the target file1.list if it does not exist or if the prerequisite file go is newer than it.
▪ It looks at the second dependency file file2.list. It sees that it has the dependency file go. Before checking file2.list it has to check its prerequisite file go:
− It looks at the dependency go. Since the prerequisite file go has no dependency, it is up-to-date
− After checking its dependencies, make checks the target file2.list. It updates the target file2.list if it does not exist or if the prerequisite file go is newer.
o After checking all the dependencies, it examines the target whole_file. It updates the target whole_file if it does not exist or if any of the prerequisite files file1.list and file2.list is newer.
Running make will produce:
$ touch go
$ make
echo line 1 > file1.list
echo line 2 > file2.list
cat file1.list file2.list > whole_file
$ make
‘whole_file’ is up to date
$ touch go
$ make
echo line 1 > file1.list
echo line 2 > file2.list
cat file1.list file2.list > whole_file
Keep in mind that command lines start with the <Tab> key. Do not copy and paste command lines using the mouse (the tab character may not be copied).
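One way to avoid losing the tab is to generate the makefile with printf, whose \t escape always produces a real tab character. The sketch below writes the two-line makefile of this section and then checks that its command line really begins with a tab. The use of mktemp -d and the variable names are choices made for this illustration.

```shell
#!/bin/sh
# Sketch: write the makefile with printf so that \t yields a real tab.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'whole_file : file1.list file2.list\n\tcat file1.list file2.list > whole_file\n' > Makefile

# Check that the command line (second line of the file) starts with a tab.
tab=$(printf '\t')
recipe=$(sed -n '2p' Makefile)
case $recipe in
    "$tab"*) starts_with_tab=yes ;;
    *)       starts_with_tab=no ;;
esac
echo "command line starts with a tab: $starts_with_tab"
```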
XIV.4.3 Action target
A target could be exploited to perform a series of actions only. In that case, it does not refer to a file, which implies that make will always attempt to build it using the command lines. We will refer to it as an action target (also known as a fake target or a phony target). In the following example, the action target clean is used to remove the target whole_file:
$ cat Makefile
whole_file : file1.list file2.list
	cat file1.list file2.list > whole_file
clean :
	rm whole_file
$ make clean
rm whole_file
Rules defining action targets should not appear first in your makefile because, as said earlier, if you do not specify a target when invoking make, the first rule in the makefile is processed by default. If the rule clean came first, the file whole_file would be removed each time you invoked make with no argument. Our makefile is made up of two rules. The first one updates the file whole_file if required and the second one deletes the file. The command make invoked with no argument will check the first rule. That is why, if you wish to execute the second rule, you have to explicitly supply its name on the command line as follows: make clean. You can also use action targets to coordinate several tasks in your makefile. For example, if you wish to process several targets sequentially, you could write a rule that looks like this:
all : target1 target2…
The command make all, or simply make if it is the first rule in the makefile, will recursively check target1, target2… in sequence. Thus, several target updates can be launched in sequential order as shown below:
$ cat Makefile
all : b c
b:
	echo line 1 > b
c:
	echo line 2 > c
$ make all
Since all is never created as a file, make always processes it and therefore always checks the targets b and c, in that order. Action targets are also used to perform actions such as removing files, creating directories and so on. The following makefile tells make to delete all the files having the .o extension:
$ cat Makefile
all: banner clean
banner:
	echo "Starting Makefile"
clean :
	rm -f *.o
$ make all
echo "Starting Makefile"
Starting Makefile
rm -f *.o
$ make clean
XIV.4.4 Dependencies
Dependencies, also known as prerequisites, may be listed as targets in order to tell make how to create them as well. It means you define explicit rules to produce them. The following makefile describes the relationship between three files, a.output, b.output and c.output, and how to build them:
$ cat Makefile
a.output : b.output c.output
	cat b.output c.output > a.output
b.output :
	echo line 1 > b.output
c.output :
	echo line 2 > c.output
The first rule states that if the file a.output does not exist or is older than any of its dependencies, it will be built using the command cat. The second and the third rules mean that the files b.output and c.output will be created using the command echo if they do not exist. If you execute make, the result will be:
$ make
echo line 1 > b.output
echo line 2 > c.output
cat b.output c.output > a.output
$ make
‘a.output’ is up to date
XIV.4.5 Introduction to macros and shell variables
A macro is a variable that stores a series of characters. It is defined outside rules as follows:
VAR = text
Where VAR is the identifier of the macro and text the value to be assigned. Blanks around the equal sign are allowed. Blanks can be part of text, which ends with the newline character. To get the value of the macro, use one of the following syntaxes:
$(VAR)
Or
${VAR}
For example:
$ cat Makefile
VAR = This is an example of macro
show :
	echo $(VAR)
$ make
echo This is an example of macro
This is an example of macro
Do not confuse a macro with a shell variable. In a target rule, a shell variable is defined in a command line in this way:
var=word
Where var is the identifier of the shell variable and word is a sequence of characters different from whitespace characters [100]. You can insert whitespace only if you place it within double or single quotes. Remember that blanks around the equal sign are not accepted. To get the value of a shell variable, use the following syntax:
$$var
For example:
$ cat Makefile
VAR = This is an example of macro
show :
	VAR="This is an example of shell variable"; echo $$VAR
	echo $(VAR)
$ make
VAR="This is an example of shell variable"; echo $VAR
This is an example of shell variable
echo This is an example of macro
This is an example of macro
If we pass the option -s to the command make, commands are not shown but only executed (this eases reading):
$ make -s
This is an example of shell variable
This is an example of macro
In shells, there is another kind of variable: environment variables. They are visible to the commands executed within the shell. They are defined in the same way as shell variables (which are local) except that they are exported with the keyword export. In the following example, within the makefile, we attempt to display the value of the variable VAR defined within the shell that executes the command make:
$ cat Makefile
show :
	echo VAR=$$VAR
$ VAR="shell variable"
$ make -s
VAR=
The example shows that the shell variable VAR was empty: the variable defined in the parent shell that executed make was not visible to make. Now, let us turn it into an environment variable and see what happens:
$ VAR="shell variable"
$ export VAR
$ make -s
VAR=shell variable
Environment variables (defined in the parent shell of make) are visible by make and can be used…
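The same visibility rule can be observed without make, since it is a property of processes in general. In the sketch below, a plain sh started with sh -c plays the role of the child process (an assumption made for illustration; make itself is not involved), and the variable names not_exported and exported are arbitrary.

```shell
#!/bin/sh
# Sketch: a child process (make, or here a plain sh) only sees exported variables.
unset VAR
VAR="shell variable"
not_exported=$(sh -c 'echo "VAR=$VAR"')   # the child does not see VAR
export VAR
exported=$(sh -c 'echo "VAR=$VAR"')       # the child now sees VAR
echo "$not_exported"
echo "$exported"
```

The first line printed is VAR= and the second is VAR=shell variable, which matches the two make runs shown above.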
XIV.4.6 Command lines
XIV.4.6.1 Execution environment
Commands on the same line are executed in the same shell process and commands on separate lines are executed in different shell processes. It implies that commands on the same line share the same execution environment. An execution environment consists of the following:
o Shell variables
o Working directory
o Umask
o Shell flags
If you want commands to run in the same process (i.e. same shell) you must use semicolons (;) between them. The following sections give examples.
XIV.4.6.1.1 Shell variables
Type in:
$ cat Makefile
example :
	V=VAR
	echo V=$$V
$ make
V=VAR
echo V=$V
V=
Explanation:
o In the first action line, the shell variable V is set to VAR. This line is executed by the shell.
o In the second action line, executed by another shell, the command echo displays the value of V. Since the two command lines are executed in different shells, the variable V is undefined in the execution environment of the second command line.
In a makefile, to take the value of a shell variable you must precede it with a double dollar $$ because a single dollar is meaningful to make: it expands a make macro. Thus, $V will be interpreted as a macro and expanded by make, while $$V will be interpreted as a shell variable and expanded by the shell. We will talk more about macros later; just remember that a macro is similar to a shell variable except that it is not defined in a target rule and is meaningful only to make. Now, assume you use semicolons instead of newlines to separate commands. You will obtain:
$ cat Makefile
example :
	V=VAR ; echo V=$$V
$ make
V=VAR ; echo V=$V
V=VAR
Using semicolons between commands causes make to spawn a single shell that will execute them. Therefore, the variable V is defined in the execution environment of the shell that runs all the commands separated by semicolons.
Remember that to expand shell variables in a makefile, use a double-dollar (i.e. $$).
XIV.4.6.1.2 Working directory
Type in:
$ cat Makefile
example :
	cd /tmp
	pwd
$ pwd
/users/kath
$ make
cd /tmp
pwd
/users/kath
Comment:
o In the first action line, the command cd /tmp changes the working directory to /tmp
o In the second action line, the command pwd displays the working directory: /users/kath
As you can see, in the second command line, we did not get the expected working directory. Now, if you use semicolons instead of newlines to separate commands, you will obtain the expected behavior (because the same shell is used):
$ cat Makefile
example :
	cd /tmp ; pwd
$ pwd
/users/kath
$ make
cd /tmp ; pwd
/tmp
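The behavior seen for shell variables and for the working directory can be imitated outside make. In the sketch below, each sh -c invocation stands for one action line, on the assumption that make hands every action line to a fresh shell, as described above. The variable names are arbitrary and /tmp is assumed to exist.

```shell
#!/bin/sh
# Sketch: each "sh -c" below stands for one action line of a rule.
unset V
start=$(pwd)

# Two action lines, two shells: the cd is lost when the first shell exits.
sh -c 'cd /tmp'
separate_dir=$(sh -c 'pwd')

# One action line with a semicolon: a single shell runs both commands.
same_dir=$(sh -c 'cd /tmp ; pwd')

# The same holds for shell variables.
sh -c 'V=VAR'
separate_var=$(sh -c 'echo "V=$V"')
same_var=$(sh -c 'V=VAR ; echo "V=$V"')

echo "separate shells: $separate_dir $separate_var"
echo "same shell:      $same_dir $same_var"
```

With separate shells, the working directory is still the starting directory and V is empty. With a single shell, the directory is /tmp and V holds VAR.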
XIV.4.6.1.3 Umask
Type in:
$ cat Makefile
example :
	umask 0000
	umask
$ make
umask 0000
umask
0022
Comment:
o In the first action line, the command umask 0000 changes the file mode creation mask to 0000
o In the second action line, the command umask displays the current file mode creation mask: 0022
As shown above, the umask is not shared between commands because they appear on different lines executed by two different shells. If you use semicolons instead of newlines to separate commands, the same shell executes the commands and you will obtain the expected behavior:
$ cat Makefile
example :
	umask 0000 ; umask
$ make
umask 0000 ; umask
0000
XIV.4.6.1.4 Shell Flags
A shell flag is an option that alters the behavior of the shell. Of course, it depends on the shell. On UNIX systems, $- expands to the flags currently set in your shell (Bourne shell, Korn shell and POSIX shell). In a makefile, use $$- to expand it. The Bourne shell, Korn shell and POSIX shell let you change the flags by using the command set options. In the following example, which works with the Bourne shell, the Korn shell, bash, and any POSIX-compliant shell, we set the flag x:
$ cat Makefile
example :
	echo Options=$$-
	set -x
	echo Options=$$-
$ make -s
Options=es
Options=es
As the example shows, the shell built-in command set in the second action line has no effect on the command of the line following it, unlike in the next example:
$ cat Makefile
example :
	echo Options=$$- ; set -x ; echo Options=$$-
$ make -s
Options=es
+ echo Options=xse
Options=esx
It worked as expected because commands separated by semicolons are executed by the same shell. When the x shell flag is set, the shell prints the commands, preceded by the + character, before executing them. In the shell, $- expands to the list of turned-on shell flags. In the makefile, to display the shell flags, we used $$-. The -s option of make turns on the silent mode: the shell commands are executed without being displayed.
XIV.4.7 Controlling the behavior of commands
XIV.4.7.1 Disabling echo
As shown in the previous examples, make displays the commands before executing them. If you wish to suppress the echo, you have three methods:
o Using the -s option: make -s
o Preceding commands with the at character (i.e. @) as shown in the following example:
$ cat Makefile
all : f1 f2
	@echo TARGET all done
f1 :
	@echo TARGET f1 done
f2 :
	@echo TARGET f2 done
o Using the special target .SILENT. For example:
$ cat Makefile
.SILENT :
all : f1 f2
	echo TARGET all done
f1 :
	echo TARGET f1 done
f2 :
	echo TARGET f2 done
After the colon, you can provide a list of targets for which make will not echo commands before executing them. For example, if you wish to prevent make from displaying the commands of the targets f1 and f2, insert the following line in your makefile: .SILENT: f1 f2.
XIV.4.7.2 Errors in commands
By default, when a command fails (non-zero exit status) make stops further processing and terminates. Sometimes, you would like the processing to continue even after a command has failed. For example, the failure of commands such as rm, which removes files, or mkdir, which creates directories, should not always stop the processing. You can tell make to ignore such errors. This can be accomplished using one of the three methods described below:
o Using the -i option: make -i. With this option, make will ignore errors in all commands.
o Preceding action lines with a hyphen (i.e. -). Only errors in command lines starting with - are ignored. For example:
$ cat Makefile
all : f1 f2
	@echo TARGET all done
f2 :
	@echo TARGET f2 done
f1 :
	@echo TARGET f1 done
clean :
	-rm f1 f2
o Using the special target .IGNORE. This allows make to ignore errors in all commands. For example:
$ cat Makefile
.IGNORE:
all : f1 f2
	echo TARGET all done
f2 :
	echo TARGET f2 done
f1 :
	echo TARGET f1 done
clean :
	rm f1 f2
After the colon, you can provide a list of targets for which make will ignore the command exit status. For example, if you wish to prevent make from checking the command exit status for the targets clean and f2, insert the following line in your makefile: .IGNORE: clean f2. If you apply one of the three methods and an error occurs, make displays a message indicating that the error has been ignored and shows the exit status of the failed command as well. Then, it continues executing the next commands of the current rule (and then the subsequent rules if any) as if the command had succeeded. Consider the following example:
$ cat Makefile
f : f1 f2
	cat f1 f2 > f
f1 :
	Echo File f1
	Echo File f1 > f1
	echo File f1 is empty
f2 :
	echo File f2 > f2
clean:
	rm -f f f1 f2
According to the makefile, the target f depends on the prerequisites f1 and f2. The dependency tree is shown in Figure XIV‑2.
Figure XIV‑2 Dependency graph showing target f depending on targets f1 and f2
If we run make, we get this (the files f, f1, and f2 are missing as shown by the ls command):
$ ls f f1 f2
f: No such file or directory
f1: No such file or directory
f2: No such file or directory
$ make -s
sh: line 1: Echo: not found
*** Error code 127
The following command caused the error:
Echo File f1
make: Fatal error: Command failed for target `f1’
$ ls f f1 f2
f: No such file or directory
f1: No such file or directory
f2: No such file or directory
No file has been updated. Make terminates as soon as a command fails. Our makefile contained two errors (Echo instead of echo) on the command lines associated with the target file f1. If you had executed the command make with the -i option, the errors in the command lines Echo File f1 and Echo File f1 > f1 would have been ignored. Make would have considered them successful and would have continued updating the target f2 and then the target f. Now, let us run make with the option -i:
$ make -si
sh: line 1: Echo: not found
*** Error code 127 (ignored)
The following command caused the error:
Echo File f1
sh: line 1: Echo: not found
*** Error code 127 (ignored)
The following command caused the error:
Echo File f1 > f1
File f1 is empty
$ ls f f1 f2
f f1 f2
All the targets have been checked even after command failures. Another option, -k, changes the behavior of make when an error occurs. As said earlier, when a command ends with a non-zero exit status, make terminates immediately. If you want make to stop building the current target (and the targets depending on it) but continue processing the subsequent targets, you can use the -k option. With our previous makefile, let us run the command make with the option -k:
$ make -s clean
$ make -ks
sh: line 1: Echo: not found
*** Error code 127
The following command caused the error:
Echo File f1
make: Warning: Target `f’ not remade because of errors
$ ls f f1 f2
f: No such file or directory
f1: No such file or directory
f2
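The difference between -i and -k can be reproduced with the following sketch. It assumes that a make command is installed, and it uses a simplified variant of the makefile above: no_such_cmd is a deliberately nonexistent command (a hypothetical name) that stands in for the misspelled Echo, and built is a helper written for this illustration. Only the existence of the files is checked because the wording of error messages differs between make implementations.

```shell
#!/bin/sh
# Sketch comparing make -k and make -i. no_such_cmd is a deliberately
# nonexistent command (hypothetical name) standing in for the misspelled Echo.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# printf turns \t into the tab required in front of each command line.
printf 'f : f1 f2\n\tcat f1 f2 > f\nf1 :\n\tno_such_cmd\n\techo File f1 > f1\nf2 :\n\techo File f2 > f2\n' > Makefile

# List which of the three targets exist as files.
built() {
    list=""
    for t in f f1 f2; do
        if [ -e "$t" ]; then
            list="$list $t"
        fi
    done
    echo "built:$list"
}

make -s -k > /dev/null 2>&1      # f1 fails, f is skipped, f2 is still built
k_status=$?
k_built=$(built)

rm -f f f1 f2
make -s -i > /dev/null 2>&1      # the failure is ignored, everything is built
i_status=$?
i_built=$(built)

echo "-k: $k_built (exit status $k_status)"
echo "-i: $i_built (exit status $i_status)"
```

With -k only f2 is built and make reports a failure through its exit status, whereas with -i the three files are built and the exit status is zero.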
With the -k option, make immediately stops processing the current target, skips it and starts processing the next target. With the -k option, make did not ignore the error in the command Echo File f1. It stopped processing the subsequent command lines and skipped the target f1 but it continued the processing with the target f2. The target f, depending on f1, was not updated because of the update failure of the target f1. As a rule of thumb, errors in commands that affect target updates, and thereby generate inconsistency, should not be ignored.
XIV.4.7.3 Prefixing commands with +
Prefixing command lines with + ensures that they are executed even if the options -t, -q or -n are used at make invocation. This rule is not followed by all implementations but a POSIX-compliant make always follows it. Assume you had the following makefile:
$ cat Makefile
a :
	echo command executed
$ make -n
echo command executed
If you supply the options -t, -q or -n, commands will not be executed. However, if you insert a plus sign at the beginning of the command line, you will obtain the following result:
$ cat Makefile
a :
	+echo command executed
$ make -n
echo command executed
command executed
Take note that apart from POSIX-compliant make:
o Not all make implementations implement the feature
o Some make implementations will not execute command lines prefixed with + if the options -n, -t or -q are used (SYSTEM V behavior)
XIV.4.7.4 Multiple prefixes in command lines
You can employ more than one prefix at the beginning of the command lines as shown below:
$ cat Makefile
a :
	+-@echo command echo executed
It means:
o Even if the -t, -q or -n options are used, the command will be executed (+)
o Errors on that action line will be ignored (-)
o Make will not display the command before executing it (@)
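The effect of the + prefix under the -n option can be checked with the sketch below. It assumes that the installed make honors the + prefix (a POSIX-compliant make does; the note above lists the exceptions). The file names with_plus and without_plus are arbitrary markers chosen for this illustration.

```shell
#!/bin/sh
# Sketch: under make -n, only the command line prefixed with + is executed.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# printf turns \t into the tab required in front of each command line.
printf 'a :\n\t+echo plus > with_plus\n\techo plain > without_plus\n' > Makefile

make -n > /dev/null

if [ -e with_plus ]; then plus_ran=yes; else plus_ran=no; fi
if [ -e without_plus ]; then plain_ran=yes; else plain_ran=no; fi

echo "+ line executed under -n: $plus_ran"
echo "plain line executed under -n: $plain_ran"
```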
XIV.4.8 Interrupting make
Run the following example and attempt to hit <Ctrl-C>:
$ cat Makefile
f1 : f2
	cp f2 f1
f2 :
	echo line 1 > f2
	sleep 50
$ make
echo line 1 > f2
sleep 50
The makefile contains two command lines for the target f2:
o The first command line creates the file f2
o The second command line causes make to sleep fifty seconds
If you hit <Ctrl-C>, you interrupt the command make, which causes the removal of the file f2 created by the first command line of the target f2. In order to ensure the consistency of the target currently being built, make removes it if it has been updated, unless you place the special target .PRECIOUS in the makefile. The files whose names appear after the colon following the target .PRECIOUS will not be removed when make is interrupted.
XIV.4.9 Defining your shell
The SHELL macro references the shell that will run the commands in rules. A command line consists of external commands (such as date, ls, and ps), shell built-in commands (such as cd) and even shell control flow structures (such as if…then…else). External commands are executable files that you can launch in two ways:
o You provide the command name with no slash. In that case, you have to properly set the variable PATH in your shell environment or in the makefile. For example, if you simply type date, the variable PATH will be consulted to search for the command date
o You give the right path name for the command. For example, if you employ the path name /bin/date, the command date will be executed.
Shell built-in commands are defined in the shell itself. Therefore, the built-ins you can draw on depend on the shell spawned by make: it is referenced by the macro SHELL, which has a predefined value depending on the implementation of make. The SHELL macro should not be confused with the environment variable SHELL. Usually, macros not defined in the makefile take their value from the shell environment. There is one exception: if you do not define the SHELL macro in the makefile, the predefined value is used. It means the macro SHELL takes precedence over the environment variable SHELL.
It implies that if you wish to invoke a particular shell, you have to set it explicitly in your makefiles or on the command line at make invocation. For example, if you wish to use the Korn shell, you have to set the macro SHELL in your makefile as follows: SHELL=/bin/ksh or at make invocation as follows: make SHELL=/bin/ksh. In the following example, in a Linux operating system, we display the predefined value of the SHELL macro and the value of the SHELL environment variable: $ cat Makefile show_val: @echo MACRO SHELL=${SHELL} @echo VARIABLE SHELL=$$SHELL $ make MACRO SHELL=/bin/sh VARIABLE SHELL=/bin/bash
To display the SHELL macro, we surrounded its name with braces and preceded it by a $: ${SHELL}. To display the SHELL variable of the shell, we preceded it by $$: $$SHELL. In the following example, we change the SHELL macro to /bin/bash:
$ cat Makefile
SHELL=/bin/bash
show_val:
	@echo MACRO SHELL=${SHELL}
	@echo VARIABLE SHELL=$$SHELL
$ make
MACRO SHELL=/bin/bash
VARIABLE SHELL=/bin/bash
The following example is equivalent to the previous one, except that the SHELL macro is altered on the command line:
$ cat Makefile
show_val:
	@echo MACRO SHELL=${SHELL}
	@echo VARIABLE SHELL=$$SHELL
$ make SHELL=/bin/bash
MACRO SHELL=/bin/bash
VARIABLE SHELL=/bin/bash
XIV.4.10 Using shell compound commands
In command lines of a makefile, you can also exploit shell compound commands. However, you have to pay attention to the newlines that separate command lines. As said earlier, each command line is executed within a separate shell. A compound command is composed of multiple pieces that, in a makefile, must all belong to the same command line: separate them with semicolons and, if you wish to ease reading by inserting newlines, escape each newline with the backslash character \. The following form can be used with the Bourne shell, the Korn shell, bash and the POSIX shell:
	if command ; \
	then command1 ; \
	command2 ; \
	… ; \
	fi
Here is an example:
$ cat Makefile
HOSTS=/etc/hosts
exec_cmd :
	if [ -f ${HOSTS} ] ; \
	then echo ${HOSTS} found ; \
	fi
$ make -s
/etc/hosts found
Here is another example that lists the .c files present in the working directory:
$ cat Makefile
exec_cmd :
	for i in *.c; do \
	echo File $$i; \
	done
XIV.5 Dependency graph
In a makefile, the following entries are interpreted as a dependency graph (dependency tree):
target : dependency1 dependency2 dependency3…
…
dependency1 : dependency11 dependency12…
…
dependencyN : dependencyN1 dependencyN2…
The dependency graph of such a rule is depicted in Figure XIV‑3
Figure XIV‑3 Recursive make processing from the top target up to the leaves
It sets out:
o target depends on dependency1, dependency2…
o dependency1 depends on dependency11, dependency12… When all of its prerequisites have been processed, dependency1 is checked in turn.
o And so on
o After all the prerequisites dependency1, dependency2… have been checked, make checks the top target.
Figure XIV‑4 Dependency tree showing relationship between targets and prerequisites
Make recursively checks the dependencies before looking into target and then, if any of them has been updated, make updates target. Make reaches the end of the scan when it encounters a leaf of the dependency tree. A leaf is a node of the tree that has no branches (i.e. it has no dependencies of its own). Assume you have the following makefile:
$ cat Makefile
all : a b
b : b1 b2
	cat b1 b2 > b
a :
	touch a
b1 :
	touch b1
b2 :
	touch b2
The dependency tree associated with the makefile is shown in Figure XIV‑4. When you launch the command make, it performs the following steps:
o Make analyzes the first rule all:a b. Since the target all has several dependencies, it checks them first:
▪ The first dependency a is analyzed. Since it has no dependency, make rebuilds it if it does not exist
▪ The dependency b is considered. Since it has two dependencies, make takes a look at them before checking b:
− The first dependency b1 is analyzed. Since it has no dependency, make rebuilds it if it does not exist
− The second dependency b2 is examined. Since it has no dependency, make rebuilds it if it does not exist
− After going through all the dependencies of the target b, make checks it and updates it if it is out-of-date. That is, make updates the target b if a prerequisite file has been updated or if the file b is missing
o After checking the dependencies a and b, make looks into the target all. Since no command line is defined, make does nothing else.
All the targets on which the target all depends directly or indirectly have been examined. This is known as a recursive scan.
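The scan described above can be replayed in a scratch directory. The following is a minimal sketch, assuming a POSIX shell and a make command available in PATH; the temporary directory and the variable names first_run and second_run are illustrative and not part of the original example.

```shell
#!/bin/sh
# Sketch only: scratch directory and variable names are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Write the makefile of this section; printf emits the leading tab
# that every command line requires.
{
  printf 'all : a b\n'
  printf 'b : b1 b2\n\tcat b1 b2 > b\n'
  printf 'a :\n\ttouch a\n'
  printf 'b1 :\n\ttouch b1\n'
  printf 'b2 :\n\ttouch b2\n'
} > Makefile

# First run: no file exists yet, so the leaves are built before b.
first_run=$(make)
echo "$first_run"

# Second run: every file exists and is up to date, so no command is run.
second_run=$(make)
echo "$second_run"
```

On the first run the commands should appear leaves first: touch a, touch b1, touch b2, then cat b1 b2 > b. On the second run nothing is rebuilt; the message make prints in that case depends on the implementation.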
XIV.6 Macros
XIV.6.1 User defined macros
In a makefile, it is possible to store a text in a memory location called a macro (or variable) that can be used later. To define a macro, use the following syntax:
macro=string
Where:
o macro is the name of the macro, composed of a sequence of alphabetic letters, digits, underscores (_) and periods (.)
o string is a text containing any character except # (which introduces a comment) and the newline character. The text string may be empty.
o blanks around the equal sign (=) are permitted
Macros can be used anywhere in the makefile but must be defined outside rules. To retrieve the value stored in a macro (i.e. macro expansion), you have two ways:
$(macro) or ${macro}
Any appearance of $(macro) or ${macro} in the makefile is replaced by the content of the variable macro: the macro is expanded. Undefined macros expand to the null string. Macros are expanded only when used in rules. That is:
o In target lines, macros are expanded when analyzed
o In command lines, macros are expanded when executed
For example:
$ cat Makefile
# working directory stored in macro DIR
DIR=/tmp/maketest
$(DIR)/a.output : $(DIR)/b.output $(DIR)/c.output
	cat $(DIR)/b.output $(DIR)/c.output > $(DIR)/a.output
$(DIR)/b.output :
	echo line 1 > $(DIR)/b.output
$(DIR)/c.output :
	echo line 2 > $(DIR)/c.output
Later, if you wish to use another directory, you just need to assign a new value to the DIR macro. You may think it works as in any programming language, but it does not. Since make allows you to define macros anywhere in a makefile (but outside rules), you may think a macro can hold different values depending on its position in the makefile. Type in:
$ cat Makefile
all : a b
V=First value
a :
	echo target a V=$(V)
V=Last value
b :
	echo target b V=$(V)
$ make -s
target a V=Last value
target b V=Last value
Retain this: a macro has the same value in the whole makefile. A makefile is not a script: the way it works has nothing to do with interpreted programming languages such as shells, awk, perl... In a makefile, only the last assignment takes effect. The reason is that make reads the whole makefile before processing it. Thus, after the makefile is entirely loaded, only the last assignment of each macro is kept. You can define macros from other macros as shown in the following example:
$ cat Makefile
Y=$(X) search.c
X=main.c
TEST :
	echo Y=$(Y) and X=$(X)
$ make -s
Y=main.c search.c and X=main.c
It sounds strange but, as the example above shows, you can use a macro before defining it. There are two reasons for that:
o The makefile is entirely loaded before rules are processed. That is, after reading the makefile, make has all the definitions of the variables in memory. In our previous example, after loading the makefile, make has in memory Y=$(X) search.c and X=main.c
o The macros are not expanded while make is reading the makefile. Macro expansion occurs only when target lines are processed and command lines executed. For example, the macro Y defined as $(X) search.c expands to main.c search.c when make processes the rule for the target TEST.
Since only the last macro assignment is retained, a string cannot be appended to a macro as follows:
X=main.c
X=${X} search.c
Only the assignment X=${X} search.c will remain! Make will then detect a loop and abort the processing. The following macro assignment allows you to append characters to a macro:
macro += string
Unlike the first macro assignment form, this one requires blanks on both sides of +=. In some implementations of make, such as GNU make, you can tell make to perform an immediate expansion of an assignment while reading the makefile. For that, GNU make uses the syntax: macro := value. For example:
$ cat Makefile
Y:=$(X) search.c
X=main.c
TEST :
	echo Y=$(Y) and X=$(X)
$ make -s
Y= search.c and X=main.c
If we had set the variable X before the variable Y, we would have obtained the following output:
$ cat Makefile
X=main.c
Y:=$(X) search.c
TEST :
	echo Y=$(Y) and X=$(X)
$ make -s
Y=main.c search.c and X=main.c
Since the dollar sign introduces make macro expansion, you have to precede it with another dollar sign ($$) if you wish to use $ as a literal character. That is, make evaluates $$ to $. Another form of macro substitution is worth considering:
$(macro:word1=word2)
This syntax causes each word1 appearing at the end of a word stored in the variable macro to be replaced by word2. The words stored in macro are separated by blanks. For example:
$ cat mk
SRC=main.c search.c
example :
	@echo ${SRC:.c=.o}
$ make -f mk
main.o search.o
The example shows that if the variable SRC stores the two words main.c and search.c then $(SRC:.c=.o) expands to main.o search.o. Macros provide the following benefits: o They ease reading o Modification is done once in macros o As some parts of a makefile may depend on the systems and the version of make, nonportable items can be put in macros.
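The three mechanisms of this section (appending with +=, the literal dollar sign $$, and suffix substitution) can be combined in one makefile. The following is a minimal sketch, assuming a POSIX shell and a make that accepts += (GNU make does); the macro names SRC and OBJ, the target show, and the shell variable out are illustrative.

```shell
#!/bin/sh
# Sketch only: scratch directory, macro names and target name are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1

{
  printf 'SRC = main.c\n'
  # Append a word to SRC (blanks on both sides of +=).
  printf 'SRC += search.c\n'
  # Replace the trailing .c of each word with .o.
  printf 'OBJ = $(SRC:.c=.o)\n'
  printf 'show :\n'
  printf '\t@echo SRC=$(SRC)\n'
  printf '\t@echo OBJ=$(OBJ)\n'
  # $$f reaches the shell as $f: the loop variable belongs to the shell, not to make.
  printf '\t@for f in $(SRC); do echo source $$f; done\n'
} > Makefile

out=$(make)
echo "$out"
```

With GNU make, the first two lines printed should be SRC=main.c search.c and OBJ=main.o search.o, followed by one "source" line per file name.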
XIV.6.2 Environment variables and macros
In a makefile, you can resort to environment variables in command lines. The following example uses the environment variable HOME:
$ cat env.mk
example :
	@echo $(HOME)
$ make -f env.mk
/users/jan
Now, if you define a macro called HOME in your makefile, it hides the environment variable as shown in the following example:
$ cat env.mk
HOME=macro defined in makefile
example :
	@echo $(HOME)
$ make -f env.mk
macro defined in makefile
If you pass the option -e to make, the shell environment variables supplant the macro definitions in your makefile:
$ make -f env.mk -e
/users/michael
XIV.6.3 Passing macros
Macro definitions passed to the command make on the shell command line override those made in the makefile. Here is the syntax:
make [-f makefile] macro=string
For example:
$ cat macro.mk
CC=cc
example :
	@echo CC holds $(CC)
$ make -f macro.mk CC=gcc
CC holds gcc
XIV.6.4 MAKEFLAGS
Options, with the exception of -f and -p, and macro assignments (except for SYSTEM V make) passed to the command make are added to the macro MAKEFLAGS. The macro MAKEFLAGS does not behave like other macros: unlike them, it is accessible to the commands executed by make. The following shell script, which we will use in our makefile, displays three variables, MAKEFLAGS, MYVAR and MYMAC:
$ cat disp.sh
#!/bin/sh
echo MAKEFLAGS=[$MAKEFLAGS]
echo MYVAR=[$MYVAR]
echo MYMAC=[$MYMAC]
In shells, variable expansions can be performed without using braces (i.e. {}). With make, macro expansions require parentheses or braces
Now, consider the following makefile:
$ cat Makefile
MYMAC=my Macro
all:
	./disp.sh
The shell script disp.sh displays the content of the variables MAKEFLAGS, MYVAR and MYMAC. In the following example, we run GNU make:
$ make -s MYVAR="This is an example"
MAKEFLAGS=[s -- MYVAR=This\ is\ an\ example]
MYVAR=[This is an example]
MYMAC=[]
In the following example, we run a SYSTEM V make on the Oracle Solaris operating system:
$ make -s MYVAR="This is an example"
MAKEFLAGS=[-s]
MYVAR=[This is an example]
MYMAC=[]
In the following example, we run a POSIX make on the Oracle Solaris operating system:
$ make -s MYVAR="This is an example"
MAKEFLAGS=[-s MYVAR=This\ is\ an\ example]
MYVAR=[This is an example]
MYMAC=[]
The examples show five things:
o The macro MAKEFLAGS is visible to the commands executed by make
o The arguments passed to make are stored in the variable MAKEFLAGS
o Macros defined on the shell command line at the invocation of make are exported. That is, commands run by make can use them.
o Macros defined in a makefile are not visible to the commands executed by make
o The contents of the macro MAKEFLAGS depend on the implementation.
Normally, users do not need to resort to the MAKEFLAGS macro. It is used internally by make to pass options and macros to sub-makes (invocations of a new instance of make with another makefile). If you alter it manually, your makefile is no longer portable.
The macro MAKEFLAGS is often used to pass macros such as CFLAGS to sub-makes.
The contents of MAKEFLAGS depend on make implementations.
XIV.6.5 Predefined macros
In addition to the user-defined macros in the makefile, a number of predefined macros are also available. To display them, call the command make -p as in the following example:
$ make -p | grep =
Try the following example:
$ cat Makefile
example :
	@echo CC holds $(CC)
$ make -s
CC holds cc
XIV.6.6 Precedence of macro assignments
The value of a macro comes from one of the following sources, listed in decreasing order of precedence, if the make option -e is not passed:
o Macro definition passed to the command make on the shell command line
o Macro definition given by the MAKEFLAGS environment variable (not to be confused with the macro MAKEFLAGS that holds the arguments passed to the command make)
o Macro definition within the makefile
o Shell environment variable
o Make predefined macro
If the option -e is passed, the order of precedence becomes:
o Macro definition passed to the command make on the shell command line
o Macro definition given by the MAKEFLAGS environment variable (not to be confused with the macro MAKEFLAGS)
o Shell environment variable
o Macro definition within the makefile
o Make predefined macro
This means that, after a makefile has been read, the definition of a macro specified in the makefile overrides that of a predefined macro. The definition of a macro within a makefile also overrides the definition of a shell environment variable unless the -e option is specified. However, the definition of a macro passed to make on the shell command line or provided by the environment variable MAKEFLAGS takes precedence over the macro definition within a makefile. The following example illustrates these rules:
$ cat Makefile
VAR=IN_MAKEFILE
showvar :
	@echo VAR=$(VAR)
$ export MAKEFLAGS="VAR=IN_MAKEFLAGS"
$ export VAR="IN_ENV_VAR"
$ make VAR=IN_CMD_LINE
VAR=IN_CMD_LINE
$ make -e VAR=IN_CMD_LINE
VAR=IN_CMD_LINE
$ make
VAR=IN_MAKEFLAGS
$ make -e
VAR=IN_MAKEFLAGS
$ unset MAKEFLAGS
$ make
VAR=IN_MAKEFILE
$ make -e
VAR=IN_ENV_VAR
XIV.6.7 Internal macros
Internal macros are built-in macros that make sets automatically during rule processing. Their value depends on the rule that make is examining. They are used in command lines to build targets. Table XIV‑1 shows a non-exhaustive list of the internal macros that are specified by POSIX and traditionally defined.
Table XIV‑1 Dynamic macros
In some implementations of make (such as GNU make), the macros $* and $< can also be used in explicit rules (see the next section), but you should use them only in implicit rules if you wish to write portable makefiles. For example:
$ cat Makefile
.SILENT :
f1.date: f1.txt f2.txt
	echo '$$@=$@'
	echo '$$?=$?'
$ touch f1.txt f2.txt
$ make
$@=f1.date
$?=f1.txt f2.txt
$ touch f1.txt
$ touch f2.txt
$ make
$@=f1.date
$?=f1.txt f2.txt
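As a complement, here is a variant in which the rule actually creates its target, so that $? shrinks to the prerequisites that changed since the last build. It is a minimal sketch, assuming a POSIX shell, a POSIX touch (option -t) and GNU make; the fixed timestamps and the variable names first and second are illustrative, chosen only to make the comparison of modification times deterministic.

```shell
#!/bin/sh
# Sketch only: scratch directory, timestamps and variable names are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1

{
  printf 'f1.date : f1.txt f2.txt\n'
  printf '\t@echo "target=$@ newer=$?"\n'
  # Unlike the example above, this command line creates the target.
  printf '\t@date > $@\n'
} > Makefile

# Both prerequisites are dated 1 January 2020.
touch -t 202001010000 f1.txt f2.txt

# The target does not exist yet: every prerequisite is listed in $?.
first=$(make)
echo "$first"

# Date the target 2 January and f1.txt 3 January: only f1.txt is now newer.
touch -t 202001020000 f1.date
touch -t 202001030000 f1.txt
second=$(make)
echo "$second"
```

The first run should report both prerequisites, the second only f1.txt, since f2.txt is older than the target.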
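Moreover, every internal macro c is associated with the two special macros $(cD) and $(cF) (where c is an internal macro character such as ?, <, @ or *), which hold respectively the directory part and the file name part of $c. Consider the following rule:
f1.date: f1.txt
	cp f1.txt f1.date
	date >> f1.date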
The rule states that if the file f1.date is missing or is older than the file f1.txt then the target f1.date is rebuilt as follows:
o f1.date is created as a copy of f1.txt
o The output of the command date is appended to the file f1.date
Now suppose there are several files to be drawn up in this way. Instead of writing similar rules several times, you can define a single implicit rule that describes what to build (target line) and how to do it (command lines). This can be achieved by defining a makefile entry having the following syntax:
.src_suffix.target_suffix:
	command1
	command2
	…
This rule means that: o A target file having the suffix .target_suffix depends on the file (implicit dependency) having the suffix .src_suffix and the same base name o command1, command2… are commands separated by newlines and starting with tabs, which will update the targets having the suffix .target_suffix. Every command line starting with tab will be run in a separate process. To inform make we are going to work with new suffixes in our own implicit rules, the special target .SUFFIXES is required:
.SUFFIXES: .suf1 .suf2 …
In the following example, every target file with the suffix .date is derived from the dependency file having the suffix .txt and the same base name.
$ cat Makefile
.SUFFIXES: .txt .date
.txt.date:
	cp $< $@
	date >> $@
	echo Implicit rule: $@ done using $<
f1.date : go
f2.date :
	echo 'Implicit rule not used here'
$ touch f1.txt f2.txt f3.txt go
$ make -s f1.date
Implicit rule: f1.date done using f1.txt
$ make -s f2.date
Implicit rule not used here
$ make -s f3.date
Implicit rule: f3.date done using f3.txt
Make builds the target files f1.date and f3.date using the implicit rule because:
o The target rule that builds f1.date does not define command lines for it
o There is no explicit rule to build f3.date
Make does not use the implicit rule to build f2.date because the actions yielding it are specified in an explicit rule. The target f3.date has no dependency other than the implicit dependency f3.txt. Consequently, we do not need to define a rule for it explicitly.
XIV.7.2 Pattern matching rules
A pattern-matching rule is an alternative to suffix rules that also describes a way to generate targets automatically from dependencies, using a pattern matching mechanism. It is not available in all implementations and not specified by POSIX. It is implemented by GNU make. It takes the following form:
target_prefix%target_suffix: src_prefix%src_suffix
	command1
	command2
	…
It means:
o A target file with the prefix target_prefix and the suffix target_suffix depends on the prerequisite file having the prefix src_prefix, the suffix src_suffix and the same base name (i.e. the name with no suffix or prefix) denoted by %
o command1, command2… are command lines separated by newlines and starting with tabs. They build and update the matching targets.
In the following example, every target file with the suffix .date stems from the dependency file having the .txt suffix:
$ cat Makefile
%date: %txt
	cp $< $@
	date >> $@
	echo $@ done using $<
f1.date : go
f2.date :
	echo 'Implicit rule not used here'
$ touch f1.txt f2.txt f3.txt go
$ make -s f1.date
f1.date done using f1.txt
$ make -s f2.date
Implicit rule not used here
$ make -s f3.date
f3.date done using f3.txt
The implicit rule is used to build f1.date and f3.date because:
o The target rule that builds f1.date does not define actions for it
o There is no explicit rule to build f3.date
Make does not use the implicit rule to build f2.date because the actions building it are defined in an explicit rule. The target f3.date has no dependency other than the implicit dependency f3.txt. Therefore, we do not need to define a rule for it explicitly.
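Pattern rules are most often written with the dot included in the pattern, and the part matched by % (the stem) is then available in the command lines as $*. The following is a minimal sketch, assuming GNU make (pattern rules are not specified by POSIX) and a POSIX shell; the file name notes.txt and the variable name out are illustrative.

```shell
#!/bin/sh
# Sketch only: scratch directory and file names are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1

{
  # %% is how printf emits a literal % character.
  printf '%%.date : %%.txt\n'
  printf '\tcp $< $@\n'
  printf '\tdate >> $@\n'
  # $* holds the stem, that is the text matched by %.
  printf '\t@echo stem is $*\n'
} > Makefile

touch notes.txt
out=$(make -s notes.date)
echo "$out"
```

Here the target notes.date matches %.date with the stem notes, so the prerequisite is notes.txt and the last command line should print "stem is notes".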
XIV.8 Controlling make behavior
XIV.8.1 Special targets
Make defines a number of special targets: they start with a dot. They alter the behavior of make. Table XIV‑2 lists special targets, specified in the POSIX standard, accepted by most make implementations.

Name        Meaning
.DEFAULT    Defines the list of actions to be performed for targets that have no rule defining how to build them.
.IGNORE     Causes make to ignore command errors. If followed by a list of prerequisites that are also targets, only errors resulting from the commands associated with the listed targets are ignored.
.PRECIOUS   By default, when make is interrupted (for example, when hitting Ctrl-C) while revising targets, it removes them. This special target tells make to keep them. If a list of targets is specified, only these files are preserved.
.SILENT     Prevents make from displaying commands before running them.
.SUFFIXES   Defines a list of user-defined suffixes for which implicit rules are used. It has to be used with suffix rules.

Table XIV‑2 Special targets
Another special target, .POSIX, is specified for POSIX-compliant make. It asks make to process the makefile in conformance with the POSIX standard, which helps you write POSIX-compliant makefiles. We recommend using it if your makefiles are to be run on multiple platforms. These special targets are explained in the next sections.
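Two of the special targets of Table XIV‑2 can be observed together. The following is a minimal sketch, assuming a POSIX shell and a make command in PATH; the variable names out and status are illustrative. The command false always fails, so without .IGNORE make would stop after it.

```shell
#!/bin/sh
# Sketch only: scratch directory and variable names are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1

{
  # No prerequisites: both special targets apply to every target.
  printf '.SILENT :\n'
  printf '.IGNORE :\n'
  printf 'all :\n'
  printf '\tfalse\n'
  printf '\techo make kept going\n'
} > Makefile

# Diagnostics about the ignored error go to standard error; discard them.
out=$(make 2>/dev/null)
status=$?
echo "output: $out"
echo "exit status: $status"
```

Because of .SILENT the command lines themselves are not displayed, and because of .IGNORE the failure of false does not prevent the second command line from running.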
XIV.8.2 Make options The options of Table XIV‑3, specified in the POSIX standard, are acknowledged by most make implementations.
Table XIV‑3 Make options
XIV.9 Recursive make
If you have to run several make instances whose makefiles are located in different directories, you can run make recursively by using the shell command cd combined with the macro $(MAKE). Let us suppose you have two sub-directories A and B in the working directory, each containing a makefile as described below:
o In the working directory, you have the following makefile:
$ cat Makefile
all: main build_A build_B
main :
	touch main
build_A :
	cd A && echo Enter directory A && $(MAKE)
build_B :
	cd B && echo Enter directory B && $(MAKE)
o In directory A, you have the following makefile:
$ cat Makefile
A1 :
	touch A1
o In directory B, you have the following makefile:
$ cat Makefile
B1 :
	touch B1
Running make will generate this:
$ make
touch main
cd A && echo Enter directory A && make
Enter directory A
touch A1
cd B && echo Enter directory B && make
Enter directory B
touch B1
The symbol && is a shell operator that executes a command only if the previous one has succeeded. The command cmd1 && cmd2 &&… means that the shell executes the command cmd1 and then cmd2 only if cmd1 has a zero exit status, and so on. You could also execute command lines such as cmd1;cmd2… which means cmd1 is executed, then cmd2 is executed regardless of the exit status of cmd1.
When you run make, the following steps are performed:
o The action target all is analyzed. It has three dependencies: main, build_A and build_B:
▪ Firstly, make deals with the dependency main. Since there is a rule defining how to build it, make updates it if it is out-of-date
▪ Secondly, it analyzes the dependency build_A, which is also listed as a target. Since it has no dependency and does not exist, make runs its command line, which consists of entering the directory A and executing make:
− cd A && echo Enter directory A && $(MAKE). This command line tells make to enter the sub-directory A if it exists, print the text Enter directory A and then run a new instance of make
− The target file A1 is created if missing.
▪ The same process is performed for the dependency build_B
o After updating main, build_A and build_B, make terminates its processing.
The macro MAKE holds the path name you used to invoke make. That is to say, if you invoke /usr/local/bin/make on the command line then $(MAKE) expands to /usr/local/bin/make. Be aware that in some implementations of make (SYSTEM V make and GNU make), even with the option -n, command lines containing $(MAKE) are executed as if + had been placed in front of them. Remember that POSIX make does not execute command lines if you run it with the options -n, -t or -q.
XIV.9.1 Inheritance of options
There are two ways of passing options to make:
o At make invocation. For example, make -i, make -s…
o Using the macro MAKEFLAGS. If make finds options set in the macro MAKEFLAGS, it uses them.
Options and macros passed to the command make are stored in MAKEFLAGS. The macro MAKEFLAGS does not behave like other macros: it is accessible to all sub-makes while other macros are only visible to the make interpreting the makefile in which they are defined. Options used at make invocation, with the exception of the -f and -p options, are added to the macro MAKEFLAGS, which is also available to sub-makes. For example, let us suppose you have a sub-directory A in the working directory. We create the following makefiles:
o In the working directory, we have the following makefile:
$ cat Makefile
all: main build_A
main :
	echo In the top make MAKEFLAGS=$(MAKEFLAGS)
build_A :
	cd A && $(MAKE)
o In directory A, we create the following makefile:
$ cat Makefile
A :
	echo In the sub-make MAKEFLAGS=$(MAKEFLAGS)
If you run make with the -s option, the result would be:
$ make -s
In the top make MAKEFLAGS=-s
In the sub-make MAKEFLAGS=-s
As already mentioned, the format of the contents of MAKEFLAGS depends on the implementation. For example, if you run GNU make, the result will be:
$ make -s
In the top make MAKEFLAGS=s
In the sub-make MAKEFLAGS=s
The variable MAKEFLAGS allows you to propagate make options from the top make to sub-makes.
XIV.9.2 Inheritance of macros
Macros defined in makefiles are not exported. That is to say, they are not visible to the commands executed by make, including sub-makes. The macro MAKEFLAGS, which is exported, is used to make options and macros accessible to sub-makes. Say you have the following makefile:
$ cat Makefile
all: main build_A
main :
	echo In the top make A=$(A) and B=$(B)
build_A :
	cd A && $(MAKE)
And in sub-directory A, you have the following makefile:
$ cat Makefile
A :
	echo In the sub-make A=$(A) and B=$(B)
If you define macros on the command line when invoking make, they are placed in the variable MAKEFLAGS and then exported as shown below:
$ make -s A=VAR_A B=VAR_B
In the top make A=VAR_A and B=VAR_B
In the sub-make A=VAR_A and B=VAR_B
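The difference between a macro defined in the top makefile and a macro passed on the command line can be checked side by side. The following is a minimal sketch, assuming a POSIX shell and GNU make; the macro name GREETING, the sub-directory name sub and the shell variable names are hypothetical and chosen only for this illustration.

```shell
#!/bin/sh
# Sketch only: directory, macro and variable names are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir sub

# Top makefile: defines GREETING, then starts a sub-make in sub.
{
  printf 'GREETING = set in the top makefile\n'
  printf 'all :\n'
  printf '\t@echo "top: [$(GREETING)]"\n'
  printf '\t@cd sub && $(MAKE)\n'
} > Makefile

# Sub-makefile: only displays what it received.
printf 'all :\n\t@echo "sub: [$(GREETING)]"\n' > sub/Makefile

# Make sure no environment variable of that name interferes.
unset GREETING

# Macro defined in the makefile only: the sub-make does not see it.
from_makefile=$(make -s)
echo "$from_makefile"

# Macro passed on the command line: it travels through MAKEFLAGS.
from_cmdline=$(make -s GREETING=from_command_line)
echo "$from_cmdline"
```

In the first run the sub-make should display empty brackets, whereas in the second run both instances display from_command_line, the command-line definition also overriding the one in the top makefile.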
XIV.10 Using multiple rules for one target
Several rules can be applied to the same target provided that at most one of them holds command lines, in which case the dependencies are merged into one dependency list. For example:
$ cat Makefile
a.new: a.txt
a.new: data.txt
a.new:
	echo $? Newer than $@ > a.new
is equivalent to:
$ cat Makefile
a.new: a.txt data.txt
	echo $? Newer than $@ > a.new
The list of command lines for targets is not required if implicit rules are defined. If you provide more than one rule with command lines for the same target, it yields an error.
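The merging of dependency lists can be observed by displaying $? while the target does not exist yet. The following is a minimal sketch, assuming a POSIX shell and GNU make; the variable name merged is illustrative, and the order in which the merged prerequisites are listed is not guaranteed here.

```shell
#!/bin/sh
# Sketch only: scratch directory and variable names are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1

{
  # Two rules that only add prerequisites...
  printf 'a.new : a.txt\n'
  printf 'a.new : data.txt\n'
  # ...and a single rule that holds the command lines.
  printf 'a.new :\n'
  printf '\t@echo prerequisites: $?\n'
  printf '\t@touch $@\n'
} > Makefile

touch a.txt data.txt

# a.new is missing, so every merged prerequisite shows up in $?.
merged=$(make)
echo "$merged"
```

Both a.txt and data.txt should appear on the line printed, although each was declared in a separate rule.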
XIV.11 Multiple targets in the same rule
Even though it is not frequently used in practice, more than one target can appear in a target rule. It is equivalent to writing several target entries. You can use it in two cases:
o Case 1. The same command lines apply to several targets:
$ cat Makefile
a.new b.new: data.txt
	cat $(@:.new=.txt) data.txt >> $@
	echo Date of update: `date`>> $@
a.new: a.txt
b.new: b.txt
Explanation:
▪ The first rule tells make that the targets a.new and b.new depend on the prerequisite file data.txt. The command lines that follow are executed to update them if needed. This rule is the same as the following:
a.new: data.txt
	cat $(@:.new=.txt) data.txt >> $@
	echo Date of update: `date`>> $@
b.new: data.txt
	cat $(@:.new=.txt) data.txt >> $@
	echo Date of update: `date`>> $@
In the first target rule, the internal macro $@ expands to a.new and $(@:.new=.txt) expands to a.txt. Likewise, in the second target rule, $@ expands to b.new and $(@:.new=.txt) expands to b.txt
▪ The second rule states that a.new depends on the prerequisite file a.txt
▪ The third rule states that b.new depends on the prerequisite file b.txt
▪ As explained previously, when multiple target rules are defined for the same target, the dependencies are merged into a single dependency list. Thus, the previous Makefile is equivalent to:
a.new: a.txt data.txt
	cat $(@:.new=.txt) data.txt >> $@
	echo Date of update: `date`>> $@
b.new: b.txt data.txt
	cat $(@:.new=.txt) data.txt >> $@
	echo Date of update: `date`>> $@
You have to be careful when using such a target rule. As noted earlier, if you run make with no argument, it looks for the first target rule. In our example, it is a.new. It does not check all the targets in the target list but only the first one. If you wish to update b.new, you have to type make b.new explicitly. To test the makefile written at the beginning of the section, you have to create the files a.txt, b.txt and data.txt. For example, you could create them as follows:
$ echo This is an example > data.txt
$ echo File a > a.txt
$ echo File b > b.txt
If you execute make, you will obtain something like this:
$ make
cat a.txt data.txt > a.new
echo update: `date '+%X'`>> a.new
$ cat a.new
File a
This is an example
Hour of update: 08:00:00
$ make b.new
cat b.txt data.txt > b.new
echo update: `date`>> b.new
$ cat b.new
File b
This is an example
Hour of update: 08:00:10
o Case 2. Implicit rules build the targets. Consider the following makefile:
$ cat Makefile
.SUFFIXES: .new .txt
.txt.new:
	cp $< $@
	echo Date of the last update: `date` >> $@
a.new b.new: go
$ touch go
$ make
cp a.txt a.new
echo Date of the last update: `date` >> a.new
Explanation:
▪ The first line defines the list of suffixes that trigger the use of the user-defined implicit rules.
▪ The second line defines an implicit rule. It states that a target file with the extension .new depends on the implicit prerequisite file that has the same base name and the .txt suffix
▪ The subsequent two lines are command lines that make runs to update the target file
▪ The last line is a target rule stating that the files a.new and b.new depend on the file go (in addition to the implicit prerequisites)
The previous makefile is equivalent to the following:
$ cat Makefile
.SUFFIXES: .new .txt
.txt.new:
	cp $< $@
	echo Date of the last update: `date` >> $@
a.new: go
b.new: go
XIV.12 Continuation line
If you need to break a long line into multiple lines, insert a backslash (i.e. \) before hitting the Enter key. For example:
$ cat Makefile
V=a \
c
a.new:
	echo file \
	a.new updated
It is the same as:
$ cat Makefile
V=a c
a.new:
	echo file a.new updated
The backslash must be immediately followed by a newline (generated by the Enter key).
XIV.13 Compiling C programs with make
The make utility can help you compile your programs. You provide a makefile with rules that define the relationships between object files, executables and source files, and the commands to build them. A makefile ensures that only the object files and executables depending on altered source files are recompiled. It also allows you to update and maintain archive and dynamic libraries. In addition, it can perform clean-up, automatic installation, tests… In our examples, we will invoke GNU gcc as compiler and link-editor.
Figure XIV‑5 Compilation steps of C source files
XIV.14 Dependency graph
When you compile C source files, which have the .c suffix, the compiler generates object files having the same base name with the .o suffix. The link-editor then combines all the object files and libraries to create an executable. This implies that executables depend on object files and libraries, which in turn depend on source files. In addition, object files depend on header files, which have the .h suffix. Figure XIV‑5 summarizes the compilation process. The corresponding dependency tree is depicted in Figure XIV‑6.
Figure XIV‑6 Tree showing dependencies between the executable and the source files
XIV.14.1 Target rules
Figure XIV‑7 Dependency tree of our project
Let us suppose we have to write a C program called psuser that displays information about users. We create three modules: main.c, getinfo.c and display.c. (To test the makefile we are going to write, put dummy functions in the source files: what you put in them does not matter.) In the source files getinfo.c and display.c, we define the functions called in main.c, which contains the main() function. In the file getinfo.h, we declare the prototypes of the functions defined in the source file getinfo.c, and in the file display.h we declare the prototypes of the functions defined in the source file display.c. The source files include our header files:
o display.c:
#include "display.h"
o getinfo.c:
#include "getinfo.h"
o main.c:
#include "display.h"
#include "getinfo.h"
The corresponding dependency tree is shown in Figure XIV‑7. The dependency graph helps us order the construction of the program:
o The executable psuser, derived from the object files display.o, main.o and getinfo.o, is built by the link-editor. It can be generated as follows:
$ gcc display.o main.o getinfo.o -o psuser
o The object files are created by the compiler:
▪ The C compiler generates the object file display.o from the source file display.c and the header file display.h. It can be generated as follows:
$ gcc -c display.c
▪ The C compiler generates the object file main.o from the source file main.c and the header files display.h and getinfo.h:
$ gcc -c main.c
▪ The C compiler generates the object file getinfo.o from the source file getinfo.c and the header file getinfo.h. You can produce it as follows:
$ gcc -c getinfo.c
The following makefile maintains and creates the executable psuser along with object files:
$ cat Makefile
psuser : main.o display.o getinfo.o
	gcc main.o display.o getinfo.o -o psuser
main.o : main.c display.h getinfo.h
	gcc -c main.c
display.o : display.c display.h
	gcc -c display.c
getinfo.o : getinfo.c getinfo.h
	gcc -c getinfo.c
If you execute make, it will produce:
$ make
gcc -c main.c
gcc -c display.c
gcc -c getinfo.c
gcc main.o display.o getinfo.o -o psuser
If you modify the source file display.c (we simulate it by altering the modification time using the command touch), make will produce the following output:
$ touch display.c
$ make
gcc -c display.c
gcc main.o display.o getinfo.o -o psuser
$
The example shows that only the object files depending on the altered files are recompiled. The linking step is then performed because the executable depends on all object files. If you modify the header file display.h, the object files main.o and display.o will be rebuilt as shown below:
$ touch display.h
$ make
gcc -c main.c
gcc -c display.c
gcc main.o display.o getinfo.o -o psuser
$
The basic makefile that we have written works well, but it is not convenient to modify. In the following sections, we will show how to improve it by using additional features: macros and implicit rules.
XIV.14.2 Macros
XIV.14.2.1 User-defined macros
You have noticed that our previous makefile contains redundant data. For example, we could improve it by replacing the following entry:
psuser : main.o display.o getinfo.o
	gcc main.o display.o getinfo.o -o psuser
by:
OBJECTS=main.o display.o getinfo.o
psuser : $(OBJECTS)
	gcc $(OBJECTS) -o psuser
Therefore, our makefile can be rewritten as follows:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
psuser : $(OBJECTS)
	gcc $(OBJECTS) -o psuser
main.o : main.c display.h getinfo.h
	gcc -c main.c
display.o : display.c display.h
	gcc -c display.c
getinfo.o : getinfo.c getinfo.h
	gcc -c getinfo.c
Assume the GNU compiler gcc is not available on the system but another one is (for example cc). You would then have to replace gcc with cc on every line that contains it. So, it is wiser to define a macro for storing the compiler name. Traditionally, the compiler name is stored in the CC macro and the link-editor in the LD macro. In the following example, the macros CC and LD are set to gcc:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
psuser : $(OBJECTS)
	$(LD) $(OBJECTS) -o psuser
main.o : main.c display.h getinfo.h
	$(CC) -c main.c
display.o : display.c display.h
	$(CC) -c display.c
getinfo.o : getinfo.c getinfo.h
	$(CC) -c getinfo.c
Programmers sometimes need to pass special options to the compiler or the link-editor. For example, we could use the -O option that tells gcc to optimize the object code. Traditionally, compiler options are set in the macro CFLAGS and link-editor options are stored in LDFLAGS. Our makefile becomes:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
CFLAGS=-O -std=c99 -pedantic -Wall
LDFLAGS=
psuser : $(OBJECTS)
	$(LD) $(LDFLAGS) $(OBJECTS) -o psuser
main.o : main.c display.h getinfo.h
	$(CC) $(CFLAGS) -c main.c
display.o : display.c display.h
	$(CC) $(CFLAGS) -c display.c
getinfo.o : getinfo.c getinfo.h
	$(CC) $(CFLAGS) -c getinfo.c
XIV.14.2.2 Special macros
As make has special macros whose values are dynamic, we could use $@, which holds the name of the target being processed, and $(@:.o=.c), which expands to the name of the current target with its .o suffix replaced by the .c suffix. Our makefile would take the following form:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
LDFLAGS=
CFLAGS=-O -std=c99 -pedantic -Wall
psuser : $(OBJECTS)
	$(LD) $(OBJECTS) $(LDFLAGS) -o $@
main.o : main.c display.h getinfo.h
	$(CC) $(CFLAGS) -c $(@:.o=.c)
display.o : display.c display.h
	$(CC) $(CFLAGS) -c $(@:.o=.c)
getinfo.o : getinfo.c getinfo.h
	$(CC) $(CFLAGS) -c $(@:.o=.c)
XIV.14.2.3 Predefined macros
A number of predefined macros can be used in your makefiles, but it is safer to define all your macros explicitly. For example, the macro CC is already set when you run make. If you wish to display all the predefined macros, just type the command make -p.
XIV.14.3 Implicit rules
If you look at our makefile, you will see that the same command lines occur several times. The reason is that all object files are built in the same way. Consequently, you could write implicit rules or use predefined implicit rules that tell make how object files are derived from source files and how executables are made from object files and libraries. Even though all versions of make come with predefined rules, it is safer to define your own rules and macros. The only case where you can use predefined rules is for building archive libraries.
XIV.14.3.1 User-defined implicit rules
.c.o:
	$(CC) $(CFLAGS) -c $
example.c Revision 1.1 done
o Example 2. Get the latest version of the file example.c in order to alter it:
$ co -l example.c
RCS/example.c,v --> example.c
Revision 1.1 (locked)
done
o Example 3. Check out version 1.1 of the file example.c for reading:
$ co -r1.1 example.c
RCS/example.c,v --> example.c
Revision 1.1
done
o Example 4. Check out version 1.1 of the file example.c for altering:
$ co -l1.1 example.c
RCS/example.c,v --> example.c
Revision 1.1 (locked)
done
XV.6.2.5 Check in
When you are satisfied with your working version and you would like to register it, invoke the ci command (check in):
ci [-r[RID]] filename
If RID is not specified, RCS automatically increments the last number (level or sequence) of the current RID. Otherwise, it assigns RID to the new revision.
o Example 1. The following example checks in the latest revision:
$ ci example.c
RCS/example.c,v > Text added
>> .
done
o Example 2. The following example checks in the revision 2.1:
$ ci -r2.1 example.c
RCS/example.c,v > Text added
>> .
done
Keep in mind that you can check in a file only if you have retrieved a modifiable version with the command co -l. Otherwise, the following error message is produced:
$ co example.c
RCS/example.c,v --> example.c
Revision 2.1
done
$ ci example.c
ci: RCS/example.c,v: no lock set by Michael
The message indicates that you have to fetch a modifiable version (co -l option) if you wish to check in a new revision.
XV.6.2.6 Listing history
The command rlog shows information on files managed by RCS:
rlog [-L] [-R] [-h] filename
Where:
o filename is an RCS administrative file or a file under RCS control.
o -L: displays RCS information only on the files retrieved for modifying (locked files).
o -R: displays only RCS administrative file names.
o -h: displays RCS information about filename: RCS file name, real file name, current RID, and so on.
The following example displays all files checked out for altering:
$ rlog -L -R RCS/*
RCS/example.c,v
$
XV.6.2.7 Comparing revision
The command rcsdiff compares two revisions:
rcsdiff -rRID1 -rRID2 filename
The following example compares the revisions 1.2 and 2.2 of the file example.c:
$ rcsdiff -r1.2 -r2.2 example.c
XV.6.2.8 Cleaning
To remove the read-only retrieved versions from the working directory:
rcsclean
Only the read-only versions under RCS control are deleted from the working directory.
XV.6.2.9 Removing revision
The following syntax will remove the revision identified by RID:
rcs -oRID filename
The following example removes the revision 1.1.2.2 of the file example.c:
$ rcs -o1.1.2.2 example.c
RCS file: RCS/example.c,v
deleting revision 1.1.2.2
done
XV.6.2.10 RCS Keywords
RCS defines a number of keywords, described in Table XV‑9, that you can include in your files. They are expanded when versions are checked out.

Keyword      Meaning
$Author$     Expands to the login name of the user who registered the revision
$Date$       Expands to the registration date of the revision
$Header$     Expands to the path name of the RCS file, the revision ID, the registration date of the revision and the author
$Revision$   Expands to the revision ID
$Source$     Expands to the path name of the RCS file

Table XV‑9 RCS keywords
For example:
$ cat kwd
Revision=$Revision$
Date=$Date$
$ ci kwd
kwd,v > Example of keywords
>> .
initial revision: 1.1
done
$ co kwd
kwd,v --> kwd
revision 1.1
done
$ cat kwd
Revision=$Revision: 1.1 $
Date=$Date: 2004/08/13 07:22:29 $
[1] Such as EMC² VMWARE or Oracle VirtualBox.
[2] The main() function is the entry point.
[3] Preprocessor directives are not C statements; that is why they do not end with a semicolon.
[4] In the C99 standard, the main() function must have the int return type.
[5] Remember that throughout the book, the term shell refers to bash, ksh or a POSIX shell. Under the C shell, the command echo $status is equivalent to echo $? under bash, ksh, and POSIX shell. On UNIX and UNIX-like systems, the C shell is not usually used; users normally work with bash, ksh or a POSIX shell.
[6] A C90, or earlier, compiler can give any return value since the behavior is unspecified in the C standards preceding C99.
[7] A mathematician would say the int type is a subset of the Z set.
[8] Remember what we said earlier. A block is a group of statements enclosed between curly braces.
[9] Sometimes called a bare machine or bare metal.
[10] For this reason, in computing, a byte is often considered a synonym for an octet, but this is not true. An octet is a synonym for a group of 8 bits.
[11] There exists another endian representation: mixed-endian (also known as middle-endian). It is not often used. We do not talk about it to ease the discussion.
[12] A constant is a fixed value known before the startup of the program; it cannot be changed.
[13] Right aligned means the text is lined up against the right side. Left aligned means the text is lined up against the left side.
[14] American Standard Code for Information Interchange. The ASCII character set defines 128 characters represented by seven bits.
[15] The code of a character printed depends on the coded character set used by your computer by default. We will talk again about it…
[16] A process is an instance of a running program. See our book The UNIX & Linux Operating Systems: The Tutorial for further details.
[17] The long long type appears in C99. In C90, only char, short, int, and long integers were specified.
[18] Located, on UNIX systems, in the directory /usr/include.
[19] The C language does not enforce the way to represent floating-point numbers.
[20] The subscript 2 indicates we are working in base 2. More generally, Xn means the number X is expressed in base n.
[21] The type qualifier was introduced in C99.
[22] A declaration with an initialization is called a definition.
[23] Remember that the address of an array is the address of its first element.
[24] The value of a pointer (address) may fit in an integer type or not. It may fit in an int, long, long long or even something else. So, do not conclude a pointer is always represented by an int only because this is the case on your computer.
[25] For more information, see our book "The UNIX & Linux Operating Systems: The Tutorial".
[26] Dereferencing a pointer means accessing the object it points to. The * operator is the dereference operator and the & operator is the address-of operator, sometimes called the reference operator.
[27] Or the calloc() or realloc() function.
[28] An array is converted to a pointer to its first element when passed to a function.
[29] The sizeof operator has precedence over the multiplication operator *.
[30] It means each byte can be accessed individually.
[31] Pointers to functions will be discussed later. A function pointer is different from an object pointer.
[32] Real types = integer types + real floating types. Floating types = real floating types + complex types. Arithmetic types = real types + complex types = integer types + real floating types + complex types.
[33] Real types = integer types + real floating types.
[34] Real types = integer types + real floating types. Arithmetic types = integer types + real floating types + complex types.
[35] Scalar types = arithmetic types + pointer types.
[36] Scalar types = arithmetic types + pointer types.
[37] The operand must be a modifiable lvalue. We will talk about lvalues in later chapters. For now, consider that an lvalue is an object: a modifiable lvalue is then an object that can be altered. Variables and pointers not declared with the qualifier const are lvalues.
[38] Real types = integer types + real floating types.
[39] The operand must be a modifiable lvalue.
[40] The operand must be a modifiable lvalue.
[41] It means the value is determined before the startup of the program (during the compilation of the program).
[42] Arithmetic operations, relational operations, equality operations, bitwise operations and ternary operations.
[43] char, signed char, unsigned char, short, signed short and unsigned short.
[44] This depends on the way an integer number is represented. On our computer, an int is represented by 32 bits.
[45] For example, the type of the resulting value of the relational operation a > b is int. Both the operands a and b are subject to the usual arithmetic conversions but the resulting value of the relational operation (that is 0 or 1) is of type int.
[46] An expression E is not evaluated in the expression sizeof E unless E is a VLA.
[47] An lvalue having static storage duration. We will talk about storage duration in Chapter VII, Section VII.7.
[48] A tag is not a type name. For structures, struct tag is a type specifier. For enumerations, enum tag is a type specifier.
[49] On UNIX systems and UNIX-based systems (Linux, BSD systems), C standard header files are located in the directory /usr/include. On Microsoft Windows and other operating systems, it depends on the compiler software.
[50] A function body is a block.
[51] The C standard does not use the word global but file scope instead.
[52] As of C99, undeclared functions cannot be called. Until C95, functions could be called without being declared at all.
[53] The gcc compiler generates no warnings if the conditions are not met (the feature is ignored). Microsoft Visual Studio does not accept this feature.
[54] In some systems, the program name may not be available. In such a case, argv[0] holds the null string ("\0").
[55] C89/C90 and C94/C95 accept such a program even though it is not recommended. This is not tolerated as of C99.
[56] On UNIX and UNIX-like systems, they are usually located in the /usr/include and /usr/include/sys directories.
[57] Outside functions, there can be only declarations. Statements are not allowed outside functions.
[58] This is required as of C99. Pre-ANSI C and the standards C90 (also known as C89, ISO C or ANSI C) and C94 (also called C95) recommend it but do not require it.
[59] Remember that the scope of an identifier determines the places where the identifier is visible.
[60] Storage-class specifiers: static, extern, auto, and register. The specifiers auto and register cannot be used for a function. For a function, only static or extern can be used.
[61] Such identifiers have external linkage: the same identifier refers to the same object in the entire program.
[62] Such identifiers have internal linkage: the same identifier refers to a unique object in the file in which it is defined.
[63] Parameters of a function in a declaration that is not a definition have function prototype scope.
[64] A translation unit is produced by the C preprocessor. For us, throughout the book, translation units and source files are equivalent.
[65] A simple declaration of an object introduces an identifier with its type without allocating storage for it. A definition is a declaration that allocates storage.
[66] Identifiers with file scope, external identifiers or global identifiers have the same meaning.
[67] However, if a declaration with the extern keyword has an initializer, it is a definition that creates the object.
[68] A declaration with extern and an initializer is a real external definition, as if the storage-class specifier were omitted.
[69] An object with external linkage is an object with file scope declared without the storage-class specifier static.
[70] A tentative definition is a declaration of a global object with no storage-class specifier or with the storage-class specifier static, and with no initializer.
[71] The compiler generates object files from translation units spawned by the preprocessor from source files. In translation units, there are no directives such as #include, #define…
[72] An external declaration is just a declaration of an entity that is not in the body of a function (file scope). An external definition is an external declaration that is also a definition.
[73] In C, functions cannot be declared within another function. Therefore, an identifier of a function has file scope.
[74] If the definition of a structure (or union) is not visible, you cannot declare an object of that type. As already explained, an object can be created only if its size is known. If the type is incomplete, the compiler cannot guess its size and then the object cannot be created.
[75] Pointers to incomplete types are allowed because the size of a pointer is known by the compiler. A pointer does not represent an object but it points to an object. A pointer is an object on its own holding the address of the referenced object.
[76] UTF means Unicode Transformation Format. It maps a code point to a bit sequence (encoding).
[77] If you work with Microsoft DOS or PowerShell, the code page (character encoding) can be changed, if required, with the command chcp in order to interpret the characters output by the programs.
[78] The environment in which the program is written.
[79] Runtime environment: the system running the executable.
[80] JIS encoding, used by the Japanese language, is not a Unicode encoding. Nowadays, Unicode is preferred.
[81] A basic character always fits in one byte (char) whatever the locale and implementation used.
[82] Not all UCNs can be used to name identifiers; C99 lists in annex D the ranges of code points that can be used in identifiers.
[83] As of C99, the keyword restrict is used but this does not change the behavior of the function. It just indicates overlapping pointers should not be used with the function.
[84] The function strcoll() is slower than strcmp() and strncmp().
[85] A computer has many devices: keyboard, monitors, hard drives, network cards, tape drives…
[86] Macros should not have arguments with side effects.
[87] Whitespace characters: space (' '), vertical tab ('\v'), horizontal tab ('\t'), form feed ('\f'), newline ('\n'), and carriage return ('\r').
[88] Whitespace characters: space (' '), vertical tab ('\v'), horizontal tab ('\t'), form feed ('\f'), newline ('\n'), and carriage return ('\r').
[89] Blanks are combinations of spaces, newlines, and tabs.
[90] However, the extra arguments are evaluated like any other arguments. For example, in the call printf("%d\n", x, i++), the argument i++ will be evaluated but ignored by the function.
[91] For example, using SEEK_END in a call to fseek() with a binary file has an undefined behavior if the file has a trailing null character or has state-dependent encoding that does not end in the initial shift state.
[92] Remember that several streams may be associated with the same file.
[93] You can view it as a device composed of a keyboard and a monitor used to transmit data to the computer and display data produced by the computer.
[94] Each file has its own encoding; to interpret its contents properly, you have to set the right locale using the appropriate encoding.
[95] For small arrays, of course, you can still work with subscripts of type int.
[96] Whitespaces are space (' '), horizontal tab ('\t'), vertical tab ('\v'), newline ('\n'), form feed ('\f') and carriage return ('\r').
[97] Provided the same object of type jmp_buf is visible within the portions of the program in which setjmp() and longjmp() are called.
[98] Such as Microsoft Visual Studio (on Microsoft Windows only), Oracle Solaris Studio (on Oracle Solaris only), Anjuta DevStudio (Linux), Eclipse (Microsoft Windows, Linux, MacOS X), NetBeans (Microsoft Windows, Linux, MacOS X, Solaris…), MonoDevelop (Microsoft Windows, Linux, MacOS X)…
[99] See our book The UNIX & Linux Operating Systems: The Tutorial, chapter 9 Memory Management.
[100] For more information, refer to our book "UNIX & Linux Shell Scripting: The Tutorial".
[101] The text displayed by the command what ends when one of the following characters is encountered: >, newline, \ and "