Undergraduate Topics in Computer Science

Series Editor
Ian Mackie, University of Sussex, Brighton, UK

Advisory Editors
Samson Abramsky, Department of Computer Science, University of Oxford, Oxford, UK
Chris Hankin, Department of Computing, Imperial College London, London, UK
Mike Hinchey, Lero – The Irish Software Research Centre, University of Limerick, Limerick, Ireland
Dexter C. Kozen, Department of Computer Science, Cornell University, Ithaca, NY, USA
Andrew Pitts, Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Hanne Riis Nielson, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
Steven S. Skiena, Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
Iain Stewart, Department of Computer Science, Durham University, Durham, UK
Joseph Migga Kizza, College of Engineering and Computer Science, The University of Tennessee-Chattanooga, Chattanooga, TN, USA
‘Undergraduate Topics in Computer Science’ (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems, many of which include fully worked solutions. The UTiCS concept relies on high-quality, concise books in softback format, and generally a maximum of 275–300 pages. For undergraduate textbooks that are likely to be longer, more expository, Springer continues to offer the highly regarded Texts in Computer Science series, to which we refer potential authors.
Karsten Lehn · Merijam Gotzes · Frank Klawonn
Introduction to Computer Graphics Using OpenGL and Java Third Edition
Karsten Lehn Faculty of Information Technology Fachhochschule Dortmund, University of Applied Sciences and Arts Dortmund, Germany
Merijam Gotzes Hamm-Lippstadt University of Applied Sciences Hamm, Germany
Frank Klawonn Data Analysis and Pattern Recognition Laboratory Ostfalia University of Applied Sciences Braunschweig, Germany
ISSN 1863-7310 ISSN 2197-1781 (electronic) Undergraduate Topics in Computer Science ISBN 978-3-031-28134-1 ISBN 978-3-031-28135-8 (eBook) https://doi.org/10.1007/978-3-031-28135-8 Translation from the German language edition ‘Grundkurs Computergrafik mit Java’ by Frank Klawonn, Karsten Lehn, Merijam Gotzes © Springer Fachmedien Wiesbaden 2022. Published by Springer Vieweg Wiesbaden. All Rights Reserved. 1st & 2nd editions: © Springer-Verlag London Limited 2008, 2012 3rd edition: © Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface to the Third Edition
Since the publication of the first edition of this book, computer graphics has evolved considerably. It is now used in an ever wider range of application fields, on mobile devices such as smartphones as well as on classic desktop computers and notebooks. Computer graphics has always been of great importance in the computer games and film industries, and these areas of application have grown significantly in recent years. In many other areas, such as medicine, medical technology or education, computer-generated graphics have also become established. This is particularly evident in the emerging fields of augmented reality (AR) and virtual reality (VR), which are based on computer graphics.

Hand in hand with the wider dissemination and use of computer graphics, the performance of modern graphics processors has increased significantly and steadily. In parallel, the software interfaces to the graphics processors and the available software tools for the creation of computer graphics have been developed considerably further. An essential development is the tendency to have more and more computing tasks performed by graphics processors in order to relieve the central processing unit for other tasks. This requires methods for programming the graphics pipeline of the graphics processors flexibly and efficiently. Because of this need, programs that can be executed directly on the graphics card, the so-called shaders, have gained immense importance. The use of shaders has increased accordingly, and the shaders themselves have become more and more powerful, so that in the meantime almost any computation, even independent of graphics processing, can be performed by graphics processors using the compute shader.

Up to the second edition, this book contained examples for the Java 2D and Java 3D graphics systems. Although it is still possible to learn many principles of computer graphics using Java 2D and Java 3D, it seems timely, given the above background, to also introduce the basics of shader programming. Since Java 2D and Java 3D do not allow this, this edition contains only examples for the graphics programming interface Open Graphics Library (OpenGL).
The OpenGL is a graphics programming interface that has become very widespread over the last decades due to its open concept and platform independence. It is now supported by the drivers of all major graphics processors and graphics cards for the major operating systems. The web variant WebGL (Web Graphics Library) is supported by all major web browsers, enabling shader-based graphics programming for web applications. Despite the very widespread use of the OpenGL, it serves in this book only as an example of the concepts of modern graphics programming that can also be found in other programming interfaces, such as Direct3D, Metal or Vulkan.

Vulkan is a graphics programming interface which, like the OpenGL, was specified by the Khronos Group and was regarded as the successor to the OpenGL. One objective in the development of Vulkan was to reduce the complexity of the drivers that have to be supplied with the graphics hardware by its manufacturer. Likewise, more detailed control over the processing in the graphics pipeline was to be made possible. This led to Vulkan being a programming interface with a lower level of abstraction than the OpenGL, which results in a higher programming effort for the graphics application and a higher effort for initial training. In the meantime, it has become apparent that the OpenGL will coexist with Vulkan for even longer than originally thought. The Khronos Group continues to develop the OpenGL further, and leading GPU manufacturers have announced that they will continue to support the OpenGL. In 2011, the Khronos Group developed the first specification of WebGL, which enables hardware-supported and shader-based graphics programming for web applications. Currently, WebGL 2.0 is supported by all major web browsers. Its successor, WebGPU, is currently under development. In addition, the graphics programming interface OpenGL has been successfully used in teaching at many colleges and universities for years. Because of this widespread use of the OpenGL and its application in industry and education, the OpenGL was chosen for this book for a basic introduction to modern graphics programming using shaders.

Java is one of the most popular and well-liked object-oriented high-level programming languages today. It is very widespread in teaching at colleges and universities, as its features make the introduction to programming very easy. Furthermore, it is easy to switch from Java to other programming languages. This is confirmed again and again by feedback from students after a practical semester or from young professionals. Although the OpenGL specification is independent of a programming language, many implementations and development tools exist for the programming languages C and C++. For this reason, many solutions to graphics programming problems, especially in web forums, can be found for these programming languages.

The aim of this edition is to combine the advantages of the easy-to-learn Java programming language with modern graphics programming using the OpenGL for a simple introduction to graphics programming. The aim is to make it possible to enter this field even with minimal knowledge of Java. Thus, this book can be used for teaching computer graphics early in a degree programme. For
this purpose, the Java binding Java OpenGL (JOGL) was chosen, which is very close in its use to OpenGL bindings to the programming language C. This makes it easy to port solutions implemented in C to Java. The limitation to a single Java binding (JOGL) was deliberate in order to support the learner in concentrating on the core concepts of graphics programming. All in all, the combination of Java and JOGL appears to be an ideal basis for beginners in graphics programming, with the potential to transfer the acquired knowledge to other software development environments or to other graphics programming interfaces with a reasonable amount of effort. With the Java Platform Standard Edition (Java SE), the two-dimensional programming interface Java 2D is still available, which was used for examples in this book until the third edition. Since the concepts of OpenGL differ fundamentally from the Java 2D concepts, all examples in this book are OpenGL examples with the Java binding JOGL. This simplifies learning for the reader and eliminates the need for rethinking between Java 2D, Java 3D or OpenGL. With this decision, the explicit separation in this book between two-dimensional and three-dimensional representations was also removed, resulting in a corresponding reorganisation of the chapter contents. Nevertheless, the first chapters still contain programming examples that can be understood with a two-dimensional imagination. It is only in later chapters that the expansion to more complex three-dimensional scenes takes place. Due to the use of the OpenGL, the contents for this edition had to be fundamentally revised and expanded. In particular, adjustments were made to account for the terminology used in the OpenGL specification and community. This concerns, for example, the representation of the viewing pipeline in the chapter on geometry processing. Likewise, the chapter on rasterisation was significantly expanded. Since the OpenGL is very complex, a new chapter was created with an introduction to the OpenGL, in which very basic OpenGL examples are explained in detail, in order to reduce the undoubtedly high barrier to entry in this type of programming. In particular, there is a detailed explanation of the OpenGL graphics pipelines, the understanding of which is essential for shader programming. Furthermore, a fully revised and expanded chapter explains the basic drawing techniques of the OpenGL, which can be applied to other graphics systems. In the other chapters, the aforementioned adaptations and extensions have been made and simple OpenGL examples for the Java binding JOGL have been added. The proven way of imparting knowledge through the direct connection between theoretical content and practical examples has not changed in this edition. Emphasis has been placed on developing and presenting examples that are as simple and comprehensible as possible in order to enable the reader to create his or her own (complex) graphics applications from this construction kit. Furthermore, this edition follows the minimalist presentation of the course content of the previous editions in order not to overwhelm beginners. On the other hand, care has been taken to achieve as complete an overview as possible of modern graphics programming in order to enable the professional use of the acquired knowledge. Since both
goals are contradictory and OpenGL concepts are complex, this was not possible without increasing the number of pages in this book.

The theoretical and fundamental contents of the book are still comprehensible even without a complete understanding of the programming examples. With such a reading method, the sections in which "OpenGL" or "JOGL" occurs can be skipped. It is nevertheless advisable to read parts of the introductory chapter on the OpenGL and the chapter on the basic geometric objects and drawing techniques of the OpenGL, since these basics are also used in other graphics systems. In these two chapters, the abstraction of the theoretical contents, and thus a strict separation from an application with the OpenGL, was dispensed with in favour of a simple introduction to OpenGL programming and a reduction in the number of pages.

All these changes and adaptations have been made against the background of many years of experience with lectures, exercises and practical courses in computer graphics with Java 2D, Java 3D and the OpenGL under Java and thus include optimisations based on feedback from many students.

Additional material for this edition is also available for download; see https://link.springer.com. A web reference to supplementary material (supplementary material online) can be found in the footer of the first page of a chapter.

The aim of this edition is to provide an easy-to-understand introduction to modern graphics programming with shaders using the very successful Java programming language by continuing the proven didactic concept of the previous editions. We hope that we have succeeded in this and that the book will continue the success of the previous editions.

Dortmund, Germany
Lippstadt, Germany
Wolfenbüttel, Germany
July 2022

Karsten Lehn
Merijam Gotzes
Frank Klawonn
Acknowledgement
Through years of work with our students in lectures, exercises and practical courses, we were able to collect suggestions for improvement and use them profitably for the revision of this textbook. Therefore, we would like to thank all students who have directly or indirectly contributed to this book. Our very special thanks go to Mr. Zellmann, Mr. Stark, Ms. Percan and Mr. Pörschmann, whose constructive feedback contributed decisively to the improvement of the manuscript. Furthermore, we would like to thank Springer International Publishing, who has made a fundamental revision and the participation of two new authors possible with this edition. On a very personal note, we would like to thank our families for their support and understanding during the preparation of the manuscript and the supplementary materials for the book.

Dortmund, Germany
Lippstadt, Germany
Wolfenbüttel, Germany
July 2022

Karsten Lehn
Merijam Gotzes
Frank Klawonn
Contents

1 Introduction . . . 1
   1.1 Application Fields . . . 2
   1.2 From the Real Scene to the Computer Generated Image . . . 4
   1.3 Rendering and Rendering Pipeline . . . 6
   1.4 Objectives of This Book and Recommended Reading Order for the Sections . . . 9
   1.5 Structure of This Book . . . 11
   1.6 Exercises . . . 12
   References . . . 13

2 The Open Graphics Library (OpenGL) . . . 15
   2.1 Graphics Programming Interfaces . . . 15
   2.2 General About the OpenGL . . . 17
   2.3 The OpenGL and Java . . . 19
   2.4 Profiles . . . 19
   2.5 OpenGL Graphics Pipelines . . . 21
      2.5.1 Vertex Processing . . . 21
      2.5.2 Vertex Post-Processing . . . 23
      2.5.3 Primitive Assembly . . . 23
      2.5.4 Rasterisation . . . 24
      2.5.5 Fragment Processing . . . 26
      2.5.6 Per-Fragment Operations . . . 26
      2.5.7 Framebuffer . . . 29
   2.6 Shaders . . . 29
   2.7 OpenGL Programming with JOGL . . . 33
   2.8 Example of a JOGL Program Without Shaders . . . 36
   2.9 Programming Shaders . . . 40
      2.9.1 Data Flow in the Programmable Pipeline . . . 40
      2.9.2 OpenGL and GLSL Versions . . . 43
      2.9.3 OpenGL Extensions . . . 43
      2.9.4 Functions of the GLSL . . . 44
      2.9.5 Building a GLSL Shader Program . . . 46
   2.10 Example of a JOGL Program Using GLSL Shaders . . . 48
   2.11 Efficiency of Different Drawing Methods . . . 57
   2.12 Exercises . . . 59
   References . . . 61

3 Basic Geometric Objects . . . 63
   3.1 Surface Modelling . . . 63
   3.2 Basic Geometric Objects in the OpenGL . . . 67
      3.2.1 Points . . . 67
      3.2.2 Lines . . . 71
      3.2.3 Triangles . . . 73
      3.2.4 Polygon Orientation and Filling . . . 75
      3.2.5 Polygons . . . 77
      3.2.6 Quadrilaterals . . . 79
   3.3 OpenGL Drawing Commands . . . 81
      3.3.1 Indexed Draw . . . 82
      3.3.2 Triangle Strips . . . 85
      3.3.3 Primitive Restart . . . 87
      3.3.4 Base Vertex and Instanced Rendering . . . 89
      3.3.5 Indirect Draw . . . 89
      3.3.6 More Drawing Commands and Example Project . . . 92
   3.4 Exercises . . . 92
   References . . . 94

4 Modelling Three-Dimensional Objects . . . 95
   4.1 From the Real World to the Model . . . 95
   4.2 Three-Dimensional Objects and Their Surfaces . . . 96
   4.3 Modelling Techniques . . . 99
   4.4 Modelling the Surface of a Cube in the OpenGL . . . 104
   4.5 Surfaces as Functions in Two Variables . . . 111
      4.5.1 Representation of Landscapes . . . 114
   4.6 Parametric Curves and Freeform Surfaces . . . 115
      4.6.1 Parametric Curves . . . 116
      4.6.2 Efficient Computation of Polynomials . . . 121
      4.6.3 Freeform Surfaces . . . 122
   4.7 Normal Vectors for Surfaces . . . 124
   4.8 Exercises . . . 126
   References . . . 128

5 Geometry Processing . . . 129
   5.1 Geometric Transformations in 2D . . . 129
      5.1.1 Homogeneous Coordinates . . . 133
      5.1.2 Applications of Transformations . . . 136
      5.1.3 Animation and Movements Using Transformations . . . 138
      5.1.4 Interpolators for Continuous Changes . . . 140
   5.2 Geometrical Transformations in 3D . . . 142
      5.2.1 Translations . . . 143
      5.2.2 Scalings . . . 144
      5.2.3 Rotations Around x-, y- and z-Axis . . . 144
      5.2.4 Calculation of a Transformation Matrix with a Linear System of Equations . . . 145
   5.3 Switch Between Two Coordinate Systems . . . 146
   5.4 Scene Graphs . . . 149
      5.4.1 Modelling . . . 149
      5.4.2 Animation and Movement . . . 152
      5.4.3 Matrix Stacks and Their Application in the OpenGL . . . 154
   5.5 Arbitrary Rotations in 3D: Euler Angles, Gimbal Lock, and Quaternions . . . 155
      5.5.1 Rotation Around Any Axis . . . 156
   5.6 Eulerian Angles and Gimbal Lock . . . 157
      5.6.1 Quaternions . . . 161
   5.7 Clipping Volume . . . 165
   5.8 Orthogonal and Perspective Projections . . . 168
   5.9 Perspective Projection and Clipping Volume in the OpenGL . . . 174
   5.10 Viewing Pipeline: Coordinate System Change of the Graphical Pipeline . . . 180
   5.11 Transformations of the Normal Vectors . . . 186
   5.12 Transformations of the Viewing Pipeline in the OpenGL . . . 187
   5.13 Exercises . . . 189
   References . . . 192

6 Greyscale and Colour Representation . . . 193
   6.1 Greyscale Representation and Intensities . . . 193
   6.2 Colour Models and Colour Spaces . . . 196
   6.3 Colours in the OpenGL . . . 204
   6.4 Colour Interpolation . . . 205
   6.5 Exercises . . . 208
   References . . . 210

7 Rasterisation . . . 211
   7.1 Vector Graphics and Raster Graphics . . . 211
   7.2 Rasterisation in the Graphics Pipeline and Fragments . . . 214
   7.3 Rasterisation of Lines . . . 216
      7.3.1 Lines and Raster Graphics . . . 216
      7.3.2 Midpoint Algorithm for Lines According to Bresenham . . . 218
      7.3.3 Structural Algorithm for Lines According to Brons . . . 225
      7.3.4 Midpoint Algorithm for Circles . . . 228
      7.3.5 Drawing Arbitrary Curves . . . 232
   7.4 Parameters for Drawing Lines . . . 233
      7.4.1 Fragment Density and Line Style . . . 233
      7.4.2 Line Styles in the OpenGL . . . 236
      7.4.3 Drawing Thick Lines . . . 238
      7.4.4 Line Thickness in the OpenGL . . . 240
   7.5 Rasterisation and Filling of Areas . . . 241
      7.5.1 Odd Parity Rule . . . 242
      7.5.2 Scan Line Technique . . . 243
      7.5.3 Polygon Rasterisation Algorithm According to Pineda . . . 245
      7.5.4 Interpolation of Associated Data . . . 250
      7.5.5 Rasterising and Filling Polygons in the OpenGL . . . 253
   7.6 Aliasing Effect and Antialiasing . . . 254
      7.6.1 Examples of the Aliasing Effect . . . 256
      7.6.2 Antialiasing . . . 259
      7.6.3 Pre-Filtering . . . 261
      7.6.4 Pre-Filtering in the OpenGL . . . 264
      7.6.5 Post-Filtering . . . 269
      7.6.6 Post-Filtering Algorithms . . . 272
      7.6.7 Sample Arrangements for Post-Filtering . . . 276
      7.6.8 Post-Filtering in the OpenGL . . . 278
   7.7 Exercises . . . 283
   References . . . 284

8 Visibility Considerations . . . 287
   8.1 Line Clipping in 2D . . . 287
      8.1.1 Cohen–Sutherland Clipping Algorithm . . . 289
      8.1.2 Cyrus–Beck Clipping Algorithm . . . 291
   8.2 Image-Space and Object-Space Methods . . . 294
      8.2.1 Backface Culling . . . 295
      8.2.2 Partitioning Methods . . . 296
      8.2.3 The Depth Buffer Algorithm . . . 297
      8.2.4 Scan-Line Algorithms . . . 300
      8.2.5 Priority Algorithms . . . 301
   8.3 Exercises . . . 304
   References . . . 304

9 Lighting Models . . . 305
   9.1 Light Sources of Local Illumination . . . 306
   9.2 Reflections by Phong . . . 309
   9.3 The Lighting Model According to Phong in the OpenGL . . . 320
   9.4 Shading . . . 334
   9.5 Shading in the OpenGL . . . 343
   9.6 Shadows . . . 344
   9.7 Opacity and Transparency . . . 346
   9.8 Radiosity Model . . . 348
   9.9 Raycasting and Raytracing . . . 352
   9.10 Exercises . . . 356
   References . . . 357

10 Textures . . . 359
   10.1 Texturing Process . . . 359
      10.1.1 Mipmap and Level of Detail: Variety in Miniature . . . 367
      10.1.2 Applications of Textures: Approximation of Light, Reflection, Shadow, Opacity and Geometry . . . 369
   10.2 Textures in the OpenGL . . . 379
   10.3 Exercises . . . 387
   References . . . 388

11 Special Effects and Virtual Reality . . . 389
   11.1 Factors for Good Virtual Reality Applications . . . 389
   11.2 Fog . . . 391
   11.3 Fog in the OpenGL . . . 393
   11.4 Particle Systems . . . 398
   11.5 A Particle System in the OpenGL . . . 400
   11.6 Dynamic Surfaces . . . 405
   11.7 Interaction and Object Selection . . . 407
   11.8 Object Selection in the OpenGL . . . 413
   11.9 Collision Detection . . . 419
   11.10 Collision Detection in the OpenGL . . . 421
   11.11 Auralisation of Acoustic Scenes . . . 424
      11.11.1 Acoustic Scenes . . . 424
      11.11.2 Localisability . . . 426
      11.11.3 Simulation . . . 427
      11.11.4 Reproduction Systems . . . 429
      11.11.5 Ambisonics . . . 430
      11.11.6 Interfaces and Standards . . . 431
   11.12 Spatial Vision and Stereoscopy . . . 432
      11.12.1 Perceptual Aspects of Spatial Vision . . . 432
      11.12.2 Stereoscopy Output Techniques . . . 435
      11.12.3 Stereoscopic Projections . . . 440
      11.12.4 Stereoscopy in the OpenGL . . . 443
   11.13 Exercises . . . 454
   References . . . 455

Appendix A: Web References . . . 459
Index . . . 461
1 Introduction
Computer graphics is the process of creating images using a computer. This process is often referred to as graphical data processing. In this book, an image is understood in an abstract sense. An image can not only represent a realistic scene from the everyday world, but can also be a graphic such as a histogram or pie chart, or the graphical user interface of a piece of software.

In the following section, some application fields of computer graphics are presented as examples to give an impression of the range of tasks in this discipline. This is followed by explanations of the main steps in computer graphics and an overview of how a rendering pipeline works using the graphics pipeline of the Open Graphics Library (OpenGL).

Computer graphics belongs to the field of visual computing. Visual computing, also known as image informatics, deals with both image analysis (acquisition, processing and analysis of image data) and image synthesis (production of images from data). Visual computing is an amalgamation of converging fields such as image processing, computer graphics, computer vision, human–machine interaction and machine learning. Computer graphics is an essential part of image synthesis, just as image processing is an essential part of image analysis. Therefore, in basic introductions to visual computing, the two disciplines of computer graphics and image processing are taught together. This book also integrates solutions to image processing problems, such as the reduction of aliasing in rasterisation or the representation of object surfaces using textures. Further links to computer graphics exist with neighbouring disciplines such as computer-aided design/manufacturing (CAD/CAM), information visualisation, scientific visualisation, and augmented reality (AR) and virtual reality (VR) (see Chap. 11).
1.1 Application Fields

Although graphical user interfaces (GUIs) are fundamentally an application area of computer graphics, the basics of computer graphics now play a rather subordinate role in this field. On the one hand, there are standardised tools for the design and programming of graphical user interfaces that abstract the technical processes; on the other hand, the focus is primarily on usability and user experience and thus lies in the area of human–computer interaction.

In advertising and certain art forms, many images are now produced entirely by computer. Artists rework or even deliberately distort photographs using computer graphics techniques. Besides the generation and display of these rather abstract graphics, the main application of computer graphics is the display of realistic, though not necessarily real, images and image sequences. One such application is animated films, which are created entirely using computer graphics methods. The realistic rendering of water, hair or fur is a challenge that sophisticated rendering systems can now meet in a reasonable amount of time.

For the very large amounts of data that are recorded in many areas of industry, business and science, not only are suitable methods for automatic analysis and evaluation required, but also techniques for their graphical presentation and visualisation. The field of interactive analysis using visualisations is called visual analytics, which belongs to the field of information visualisation. These visualisations go far beyond simple representations of function graphs, histograms or pie charts, which are already accomplished by spreadsheet programs today. Such simple types of visualisation are assigned to data visualisation. This involves the pure visualisation of raw data without taking into account an additional relationship between the data. Data becomes information only through a relationship. Information visualisations include two- or three-dimensional visualisations of high-dimensional data, problem-adapted representations of the data [3,7,12,13] or special animations that show temporal progressions of, for example, currents or weather phenomena. The interactive visualisation of this information and the subsequent analysis by the user generates knowledge. This last step is assigned to visual analytics. Artificial intelligence techniques can be applied in this context as well.

Classic applications come from the fields of computer-aided design (CAD) and computer-aided manufacturing (CAM), which involve the design and construction of objects such as vehicles or housings. The objects to be represented are designed on the computer, just as in computer games or architectural design programs for visualising real or planned buildings and landscapes. In these cases, the objects to be designed, for example an extension to a house, are combined with already existing objects in the computer. In the case of driving or flight simulators, real cities or landscapes have to be modelled with the computer. Not only does the ability to model and visualise objects play an important role in computer graphics, but so does the generation of realistic representations from measured data. To generate such data, scanners are used that measure the depth information of the surfaces of objects. Several calibrated cameras can also be used for this purpose,
from which three-dimensional information can be reconstructed. A very important field of application for these technologies is medical informatics [6]. In this area, for example, three-dimensional visualisations of organs or bone structures are generated, which are based on data that was partly obtained by various imaging procedures, such as ultrasound, computer tomography and magnetic resonance imaging. The coupling of real data and images with computer graphics techniques is expected to increase in the future. Computer games allow navigating through scenes and viewing a scene from different angles. Furthermore, 360-degree videos allow for an all-around view of a scene. Such videos can be recorded using special camera arrangements, such as omnidirectional cameras. These are spherical objects equipped with several cameras that take pictures in different directions. In most films in the entertainment industry, the viewer can only watch a scene from a fixed angle. The basic techniques for self-selection of the viewer position are already in place [4]. However, this requires scenes to be recorded simultaneously from multiple perspectives and intelligent playback devices. In this case, the viewer must not be bound to the positions of the cameras, but can move to any viewpoint. The corresponding representation of the scene is calculated from the information provided by the individual cameras. For this purpose, techniques of computer graphics, which serve the synthesis of images, have to be combined with procedures from image processing, which deal with the analysis of the recorded images [5]. Other important fields of application of computer graphics are virtual reality (VR) and augmented reality (AR). In virtual reality, the aim is to create an immersion for the user through the appropriate stimulation of as many senses as possible, which leads to the perception of really being in the artificially generated world. This desired psychological state is called presence. This results in a wide range of possible applications, of which only a few are mentioned as examples here. Virtual realities can be used in architectural and interior simulations, for telepresence or learning applications in a wide variety of fields. Psychotherapeutic applications to control phobia (confrontation therapy) or applications to create empathy by a user putting himself in the situation of another person are also possible and promising applications. Through the use of augmented reality technologies, real perception is enriched by additional information. Here, too, there are many possible applications, of which only a few are mentioned. Before a purchase, products can be selected that fit perfectly into their intended environment, for example, the new chest of drawers in the living room. Likewise, clothes can be adapted to individual styles in augmented reality. Learning applications, for example a virtual city or museum guide, are just as possible as assembly aids in factories or additional information for the maintenance of complex installations or devices, for example elevators or vehicles. Conferences can be supplemented by virtual avatars of participants in remote locations. In medicine, for the planning of an operation or during an operation, images can be projected precisely onto the body to be operated on as supplementary information, which was or is generated before or during the operation by imaging techniques. 
As these examples show, computer graphics applications can be distinguished according to whether interaction with the application is required or not. For the production of advertising images or classic animated films, for example, no interaction
is intended, so elaborate computer graphics processes can be used for the most realistic representation possible. In the film industry, entire computer farms are used to calculate realistic computer graphics for a feature-length animated film, sometimes over a period of months. However, many computer graphics applications are interactive, such as applications from the fields of virtual or augmented reality. This results in requirements for the response time to generate a computer graphic. For flight or vehicle simulators, which are used for the training of pilots or train drivers, these response times must be strictly adhered to in order to be able to represent scenarios that create a realistic behaviour of the vehicle. Therefore, in computer graphics it is important to have methods that generate realistic representations as efficiently as possible and do not necessarily calculate all the details of a scene. For professional flight simulators or virtual reality installations, high-performance computer clusters or entire computer farms are potentially available. However, this is not the case for computer games or augmented reality applications for portable devices, even though in these cases the available computing power is constantly growing. For this reason, simple, efficient models and methods of computer graphics, which have been known for a long time, are in use today to create interactive computer graphics applications.
1.2 From the Real Scene to the Computer Generated Image

From the example applications presented in Sect. 1.1, the different tasks to be solved in computer graphics can be derived. Figure 1.1 shows the rough steps needed to get from a real or virtual scene to a perspective image.

Fig. 1.1 Main steps of computer graphics to get from a real scene to a computer generated perspective image (panels (a) to (g))

First, the objects of the scene must be replicated as computer models. This is usually done with a modelling tool. This replication is generally only an approximation of the objects in the scene. Depending on the effort to be expended and the modelling types available, the objects can be approximated more or less well. This is illustrated in Fig. 1.1a, where a house was modelled by two simple geometric shapes. The deciduous trees consist of a cylinder and a sphere. The coniferous trees are composed of a cylinder and three intersecting cones.

One of the first steps in graphics processing is to determine the viewer location, also known as the camera position or eye point. Figure 1.1b shows the modelled scene as seen by a viewer standing at some distance in front of the house. The modelled objects usually cover a much larger area than what is visible to the viewer. For example, several building complexes with surrounding gardens could be modelled, through which the viewer can move. If the viewer is standing in a specific room of the building, or is at a specific location in the scene looking in a specific direction, then most of the modelled objects are not in the viewer's field of view and can be disregarded in the rendering of the scene. The example scene in Fig. 1.1 consists of a house and four trees, which should not be visible at the same time. This is represented in Fig. 1.1c by the three-dimensional area marked by lines. If a perspective projection is assumed, then this area has the shape of a frustum (truncated
pyramid) with a rectangular base. The (invisible) top of the pyramid is the viewer’s location. The smaller boundary surface at the front (the clipped top of the pyramid) is called the near clipping plane (near plane). The area between the viewer and the near clipping plane is not visible to the viewer in a computer generated scene. The larger boundary surface at the rear (the base surface of the pyramid) is called far clipping plane (far plane). The area behind the far plane is also not visible to the viewer. The other four boundary surfaces limit the visible area upwards, downwards and to the sides. Figure 1.1d shows a view of the scene from above, revealing the positions of the objects inside and outside the frustum. The process of determining which objects are within the frustum and which are not is called (three-dimensional) clipping. This is of great interest in computer graphics in order to avoid unnecessary computing time for parts of the scene that are not visible. However, an object that lies in this perceptual area is not necessarily visible to the viewer, as it may be obscured by other objects in the perceptual area. The process of determining which parts of objects are hidden is called hidden face culling. Clipping and culling are processes to determine the visibility of the objects in a scene. It is relatively easy to remove objects from the scene that are completely outside of the frustum, such as the deciduous tree behind the far clipping plane and the conifer to the right of the frustum (see Fig. 1.1c–d). Clipping objects that are only partially in the clipping area, such as the conifer on the left in the illustration, is more difficult. This may require splitting geometrical objects or polygons and closing the resulting shape. Figure 1.1e–f shows the scene after clipping. Only the objects and object parts that are perceptible to the viewer and thus lie within the clipping area need to be processed further. The objects that lie within this area must then be projected onto a two-dimensional surface, resulting in an image that can be displayed on the monitor or printer as shown in Fig. 1.1g. During or after this projection, visibility considerations must be made to determine which objects or which parts of objects are visible or obscured by other objects. Furthermore, the illumination and light effects play an essential role in this process for the representation of visible objects. In addition, further processing steps, such as rasterisation, are required to produce an efficiently rendered image of pixels for displaying on an output device, such as a smartphone screen or a projector.
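To make the notion of the frustum and of projecting a point onto a two-dimensional surface more tangible, the following minimal Java sketch checks a point given in camera coordinates against the near and far clipping planes and projects it onto the near plane using similar triangles. It is an illustration only, not code from this book or from the OpenGL API: the class and method names are assumptions, the side planes of the frustum are ignored, and the matrix formulation of these steps is introduced in Chap. 5.

/**
 * Minimal sketch (illustrative assumption): a pinhole-style perspective
 * projection of a point given in camera coordinates. The camera looks along
 * the negative z-axis; near and far are the distances of the clipping planes.
 */
public class SimpleProjection {

    /** Returns true if the point lies between the near and far clipping planes. */
    public static boolean insideNearFar(double z, double near, double far) {
        // In camera coordinates the visible range is -far <= z <= -near.
        return z <= -near && z >= -far;
    }

    /** Projects (x, y, z) onto the near plane using similar triangles. */
    public static double[] projectToNearPlane(double x, double y, double z, double near) {
        double scale = near / -z;          // -z is the distance in front of the camera
        return new double[] { x * scale, y * scale };
    }

    public static void main(String[] args) {
        double near = 1.0, far = 100.0;
        double[] p = { 2.0, 1.0, -5.0 };   // a point five units in front of the camera
        if (insideNearFar(p[2], near, far)) {
            double[] q = projectToNearPlane(p[0], p[1], p[2], near);
            System.out.printf("projected to (%.2f, %.2f) on the near plane%n", q[0], q[1]);
        }
    }
}

The test against the two side pairs of frustum planes works analogously; in practice, all of these steps are expressed as matrix operations and carried out in the graphics pipeline rather than in application code.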
1.3 Rendering and Rendering Pipeline

The entire process of generating a two-dimensional image from a three-dimensional scene is called rendering. The chaining together of the individual processing units roughly described in Fig. 1.1 from part (b) onwards is called the computer graphics pipeline, graphics pipeline or rendering pipeline. The processing of data through a pipeline takes place with the help of successively arranged stages in which individual (complex) commands are executed. This type of processing has the advantage that new data can already be processed in the first stage
of the pipeline as soon as the old data has been passed on to the second stage. There is no need to wait for the complete computation of the data before new data can be processed, resulting in a strong parallelisation of data processing. Data in graphical applications is particularly well suited for pipeline processing due to its nature and the processing steps to be applied to it. In this case, parallelisation and thus a high processing speed can easily be achieved. In addition, special graphics hardware can accelerate the execution of certain processing steps.

Figure 1.2 shows an abstract representation of the graphics pipeline of the Open Graphics Library (OpenGL) on the right. Since the stages shown in the figure are also used in this or similar form in other graphical systems, this pipeline serves to explain the basic structure of such systems. More details on OpenGL and the OpenGL graphics pipeline can be found in Chap. 2.

Fig. 1.2 Abstracted rendering pipeline of the Open Graphics Library (OpenGL). The basic structure is similar to the graphics pipeline of other graphics systems

On the left side in the figure is the graphics application, which is usually controlled by the main processor, the central processing unit (CPU), of the computer. This application sends the three-dimensional scene (3D scene) consisting of modelled objects to the graphics processing unit (GPU). Graphics processors are integrated
into modern computer systems in different ways. A graphics processor can be located in the same housing as the CPU, in a separate microchip on the motherboard or on a separate graphics card plugged into the computer. Graphics cards or computers can also be equipped with several graphics processors. In this book, the word graphics processor or the abbreviation GPU is used to represent all these variants.

In most cases, the surfaces of the individual (3D) objects in a 3D scene are modelled using polygons (for example, triangles) represented by vertices. A vertex (plural: vertices) is a corner of a polygon. In addition, in computer graphics it is a data structure that stores, for example, the position coordinates in 3D space and colour values (or other data) at the respective corner of the polygon (see Sect. 2.5.1). A set of vertices (vertex data) is used to transfer the 3D scene from the application to the GPU. Furthermore, additional information must be passed on how the surfaces of the objects are to be drawn from this data. In Chap. 3, the basic geometric objects available in OpenGL and their application for drawing objects or object parts are explained. Chapter 4 contains basics for modelling three-dimensional objects.

Furthermore, the graphics application transmits the position of the viewer, also called camera position or viewer location, (mostly) in the form of matrices for geometric transformations to the GPU. With this information, vertex processing takes place on the GPU, which consists of the essential step of geometry processing. This stage calculates the changed position coordinates of the vertices due to geometric transformations, for example, due to the change of the position of objects in the scene or a changed camera position. As part of these transformations, a central projection (also called perspective projection) or a parallel projection can be applied to obtain the spatial impression of the 3D scene. This maps the scene onto a two-dimensional image. Details on geometry processing can be found in Chap. 5. Furthermore, in vertex processing, an appropriate tessellation (see Sect. 4.2) can be used to refine or coarsen the geometry into suitable polygons (mostly triangles). After vertex processing (more precisely, in vertex post-processing; see Sect. 2.5.2), the scene is reduced to the area visible to the viewer, which is called clipping (see Sect. 8.1).

Until after vertex processing, the 3D scene exists as a vector graphic due to the representation of the objects by vertices (see Sect. 7.1). Rasterisation (also called scan conversion) converts this representation into a raster graphic.1 Rasterisation algorithms are presented in Chap. 7. The chapter also contains explanations of the usually undesirable aliasing effect and ways to reduce it. The conversion into raster graphics usually takes place for all polygons in the visible frustum of the scene. Since at this point in the graphics pipeline it has not yet been taken into account which objects are transparent or hidden by other objects, the individual image points of the rasterised polygons are called fragments. The rasterisation stage thus provides a set of fragments.
1 The primitive assembly step executed in the OpenGL graphics pipeline before rasterisation is described in more detail in Sect. 2.5.4.
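As a small illustration of the vertex data mentioned above, the following Java sketch (an illustrative assumption, not one of the book's JOGL examples) stores a single triangle as an interleaved array in which each vertex carries a position and a colour. Arrays of this kind are what a graphics application hands over to the GPU for vertex processing; how they are actually transferred and drawn with JOGL is shown in Chaps. 2 and 3.

/**
 * Minimal sketch (illustrative assumption): vertex data for a single triangle,
 * stored as one interleaved float array. Each vertex consists of a position
 * (x, y, z) followed by a colour (r, g, b).
 */
public class TriangleVertexData {

    // 3 vertices * (3 position floats + 3 colour floats) = 18 values
    static final float[] VERTICES = {
        //  x      y      z      r     g     b
         0.0f,  0.5f,  0.0f,  1.0f, 0.0f, 0.0f,   // top vertex, red
        -0.5f, -0.5f,  0.0f,  0.0f, 1.0f, 0.0f,   // bottom left vertex, green
         0.5f, -0.5f,  0.0f,  0.0f, 0.0f, 1.0f    // bottom right vertex, blue
    };

    static final int FLOATS_PER_VERTEX = 6;

    public static void main(String[] args) {
        int vertexCount = VERTICES.length / FLOATS_PER_VERTEX;
        System.out.println("Triangle with " + vertexCount + " vertices, "
                + VERTICES.length * Float.BYTES + " bytes of vertex data");
    }
}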
Within the fragment processing, the illumination calculation usually takes place. Taking into account the lighting data, such as the type and position of the light sources and material properties of the objects, a colour value is determined for each of the fragments by special algorithms (see Chap. 9). To speed up the graphics processing or to achieve certain effects, textures can be applied to the objects. Often, the objects are covered by two-dimensional images (two-dimensional textures) to achieve a realistic representation of the scene. However, three-dimensional textures and geometry-changing textures are also possible (see Chap. 10). The calculation of illumination during fragment processing for each fragment is a typical procedure in modern graphics pipelines. In principle, it is possible to perform the illumination calculation before rasterisation for the vertices instead of for the fragments. This approach can be found, for example, in the OpenGL fixed-function pipeline (see also Sect. 2.5).

Within the fragment processing, the combination of the fragments into pixels also takes place through visibility considerations (see Sect. 8.1). This calculation determines the final colour values of the pixels to be displayed on the screen, especially taking into account the mutual occlusion of the objects and transparent objects. The pixels with final colour values are written into the frame buffer. In the OpenGL graphics pipeline, the polygon meshes after vertex processing or completely rendered images (as pixel data) can be returned to the graphics application. This enables further (iterative) calculations and thus the realisation of complex graphics applications.

The abstract OpenGL graphics pipeline shown in Fig. 1.2 is representative of the structure of a typical graphics pipeline, which may deviate from that of a concrete system. In addition, it is common in modern systems to execute as many parts as possible in a flexible and programmable way. In a programmable pipeline, programs, so-called shaders, are loaded onto the GPU and executed there. Explanations on shaders are in Sects. 2.6, 2.9 and 2.10.
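The way fragments are combined into pixels can be illustrated with a small conceptual Java sketch of a depth buffer: for each pixel, the fragment closest to the viewer wins and its colour is written into the framebuffer. This is an illustrative assumption of the principle only, not GPU code and not an example from the book; transparency is ignored here, and the depth buffer algorithm is treated properly in Chap. 8.

import java.util.Arrays;

/**
 * Conceptual sketch (illustrative assumption): resolving fragments into pixels
 * with a depth buffer. For every pixel, the fragment with the smallest depth
 * value wins and its colour is written into the framebuffer.
 */
public class DepthResolveSketch {

    static class Fragment {
        final int x, y;          // pixel position of the fragment
        final float depth;       // smaller value = closer to the viewer
        final int rgb;           // colour as 0xRRGGBB
        Fragment(int x, int y, float depth, int rgb) {
            this.x = x; this.y = y; this.depth = depth; this.rgb = rgb;
        }
    }

    public static void main(String[] args) {
        final int width = 4, height = 4;
        float[] depthBuffer = new float[width * height];
        int[] frameBuffer = new int[width * height];
        Arrays.fill(depthBuffer, Float.POSITIVE_INFINITY);  // everything "infinitely far away"

        Fragment[] fragments = {
            new Fragment(1, 1, 0.80f, 0xFF0000),  // red fragment at pixel (1,1), farther away
            new Fragment(1, 1, 0.30f, 0x00FF00),  // green fragment at the same pixel, closer
            new Fragment(2, 3, 0.50f, 0x0000FF)   // blue fragment at pixel (2,3)
        };

        for (Fragment f : fragments) {
            int index = f.y * width + f.x;
            if (f.depth < depthBuffer[index]) {   // depth test: keep only the nearest fragment
                depthBuffer[index] = f.depth;
                frameBuffer[index] = f.rgb;       // the surviving fragment becomes the pixel colour
            }
        }
        // The closer (green) fragment wins at pixel (1,1).
        System.out.printf("pixel (1,1) = #%06X%n", frameBuffer[1 * width + 1]);
    }
}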
1.4 Objectives of This Book and Recommended Reading Order for the Sections

This book aims to teach the basics of computer graphics, supported by practical examples from a modern graphics programming environment. The programming of the rendering pipeline with the help of shaders will play a decisive role. To achieve easy access to graphics programming, the graphics programming interface Open Graphics Library (OpenGL) with the Java binding Java OpenGL (JOGL) to the Java programming language was selected (see Sect. 2.1). OpenGL is a graphics programming interface that evolved greatly over the past decades due to its platform independence, the availability of increasingly powerful and cost-effective graphics hardware and its widespread use. In order to adapt this programming interface to the needs of the users, ever larger parts of the rendering pipeline became flexibly programmable directly on the GPU. On the other hand, there
was and still is the need to ensure that old graphics applications remain compatible with new versions when extending OpenGL. Reconciling these two opposing design goals is not always easy and usually leads to compromises. In addition, there are a large number of extensions to the OpenGL command and feature set, some of which are specific to certain manufacturers of graphics hardware. Against this background, a very powerful and extensive graphics programming interface has emerged with OpenGL, which is widely used and supported by drivers of all common graphics processors for the major operating systems. Due to this complexity, familiarisation with OpenGL is not always easy for a novice in graphics programming and it requires a certain amount of time. The quick achievement of seemingly simple objectives may fail to materialise as the solution is far more complex than anticipated. This is quite normal and should be met with a healthy amount of perseverance.

Different readers have different prior knowledge, different interests and are different types of learners. This book is written for a reader2 with no prior knowledge of computer graphics or OpenGL, who first wants to understand the minimal theoretical basics of OpenGL before starting practical graphics programming and deepening the knowledge of computer graphics. This type of reader is advised to read this book in the order of the chapters and sections, skipping over sections from Chap. 2 that seem too theoretical if necessary. Later, when more practical computer graphics experience is available, this chapter can serve as a reference section.

The practically oriented reader who wants to learn by programming examples is advised to read this chapter (Chap. 1). For the first steps with the OpenGL fixed-function pipeline, Sect. 2.7 can be read and the example in Sect. 2.8 can be used. The corresponding source code is available in the supplementary material to the online version of Chap. 2. Building on the understanding of the fixed-function pipeline, it is useful to read Sect. 2.9 and the explanations of the example in Sect. 2.10 for an introduction to the programmable pipeline. The source code of this example is available in the supplementary material to the online version of Chap. 2. Afterwards, selected sections of the subsequent chapters can be read and the referenced examples used according to interest. The OpenGL basics are available in Chap. 2 if required.

If a reader is only interested in the more theoretical basics of computer graphics and less in actual graphics programming, he should read the book in the order of the chapters and sections. In this case, the detailed explanations of the OpenGL examples can be skipped. A reader with prior knowledge of computer graphics will read this book in the order of the chapters and sections, skipping the sections that cover known prior knowledge. If OpenGL is already known, Chap. 2 can be skipped entirely and used as a reference if needed.

Chapter 2 was deliberately designed to be a minimal introduction to OpenGL. However, care has been taken to cover all the necessary elements of the graphics
2 To improve readability, the masculine form is used in this book. It always refers equally to persons of all genders.
pipeline in order to give the reader a comprehensive understanding without overwhelming him. For comprehensive and more extensive descriptions of this programming interface, please refer to the OpenGL specifications [9,10], the GLSL specification [1] and relevant books, such as the OpenGL SuperBible [11], the OpenGL Programming Guide [2] and the OpenGL Shading Language book [8].
1.5 Structure of This Book

The structure of this book is based on the abstract OpenGL graphics pipeline shown in Fig. 1.2. Therefore, the contents of almost all chapters of this book are already referred to in Sect. 1.3. This section contains a complete overview of the contents in the order of the following chapters. Section 1.4 gives recommendations for a reading order of the chapters and sections depending on the type of reader.

Chapter 2 contains detailed explanations of the OpenGL programming interface and the individual processing steps of the two OpenGL graphics pipelines, which are necessary to develop your own OpenGL applications with the Java programming language. Simple examples are used to introduce modern graphics programming with shaders and the shader programming language OpenGL Shading Language (GLSL).

In Chap. 3, the basic geometric objects used in OpenGL for modelling surfaces of three-dimensional objects are explained. In addition, this chapter contains a presentation of the different OpenGL methods for drawing these primitive basic shapes. The underlying concepts used in OpenGL are also used in other graphics systems. Chapter 4 contains explanations of various modelling approaches for three-dimensional objects. The focus here is on modelling the surfaces of three-dimensional bodies.

In Chap. 5, the processing of the geometric data is explained, which mainly takes place during vertex processing. This includes in particular the geometric transformations in the viewing pipeline. In this step, among other operations, the position coordinates and normal vectors stored in the vertices are transformed. Models for the representation of colours are used at various steps in the graphics pipeline. Chapter 6 contains the basics of the most important colour models as well as the basics of greyscale representation.

An important step in the graphics pipeline is the conversion of the vertex data, which is in a vector graphics representation, into a raster graphics representation (images of pixels). Procedures for this rasterisation are explained in Chap. 7. This chapter also covers the mostly undesirable aliasing effects caused by rasterisation and measures to reduce these effects. A description of procedures for efficiently determining which parts of the scene are visible (visibility considerations) can be found in Chap. 8. This includes methods for clipping and culling.

During fragment processing, lighting models are usually used to create realistic lighting situations in a scene. The standard lighting model of computer graphics
is explained together with models for the so-called global illumination in Chap. 9. Effects such as shading, shadows and reflections are also discussed there. Chapter 10 contains descriptions of methods for texturing the surfaces of three-dimensional objects. This allows objects to be covered with images (textures) or special effects to be applied that change their appearance.

In the concluding Chap. 11, selected advanced topics are covered that go beyond the basics of computer graphics and lead to the popular fields of virtual reality (VR) and augmented reality (AR). Important for the realisation of interactive computer graphics applications are methods for object selection and the detection and handling of object collisions. Furthermore, methods for the realisation of fog effects or particle systems are presented, with which realistic effects can be created. The immersion of virtual reality applications can be increased by auralising three-dimensional acoustic scenes. Therefore, a rather detailed section from this area has been included. Stereoscopic viewing of three-dimensional scenes, colloquially called “seeing in 3D”, is an important factor in achieving immersion in a virtual world and thus facilitating the grasp of complex relationships.
1.6 Exercises

Exercise 1.1 Background of computer graphics: Please research the answers to the following questions:
(a) What is the purpose of computer graphics?
(b) On which computer was the first computer graphic created?
(c) Which film is considered the first fully computer-animated film?
(d) What is meant by the “uncanny valley” in the context of computer graphics? Explain this term.
Exercise 1.2 Main steps in computer graphics
(a) Name the main (abstract) steps in creating a computer graphic.
(b) What are the main processing steps in a graphics pipeline and which computations take place there?

Exercise 1.3 Requirements in computer graphics: Please research the answers to the following questions:
(a) Explain the difference between non-interactive and interactive computer graphics.
(b) From a computer science point of view, which requirements for computer graphics software systems are particularly important in order to be able to create computer graphics that are as realistic as possible? Differentiate your answers
according to systems for generating non-interactive and interactive computer graphics.
(c) Explain the difference between real-time and non-real-time computer graphics.
(d) In computer graphics, what is meant by “hard real time” and “soft real time”?
References

1. J. Kessenich, D. Baldwin and R. Rost. The OpenGL Shading Language, Version 4.60.6. 12 Dec 2018. Specification. Retrieved 2 May 2019. The Khronos Group Inc, 2018. URL: https://www.khronos.org/registry/OpenGL/specs/gl/GLSLangSpec.4.60.pdf.
2. J. Kessenich, G. Sellers and D. Shreiner. OpenGL Programming Guide. 9th edition. Boston [u. a.]: Addison-Wesley, 2017.
3. F. Klawonn, V. Chekhtman and E. Janz. “Visual Inspection of Fuzzy Clustering Results”. In: Advances in Soft Computing. Ed. by J.M. Benítez, O. Cordón, F. Hoffmann and R. Roy. London: Springer, 2003, pp. 65–76.
4. M. Magnor. “3D-TV: Computergraphik zwischen virtueller und realer Welt”. In: Informatik Spektrum 27 (2004), pp. 497–503.
5. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 4th edition. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Vieweg, 2019.
6. D. P. Pretschner. “Medizinische Informatik - Virtuelle Medizin auf dem Vormarsch”. In: Carolo-Wilhelmina Forschungsmagazin der Technischen Universität Braunschweig 1 (2001). Jahrgang XXXVI, pp. 14–22.
7. F. Rehm, F. Klawonn and R. Kruse. “POLARMAP - Efficient Visualization of High Dimensional Data”. In: Information Visualization. Ed. by E. Banissi, R.A. Burkhard, A. Ursyn, J.J. Zhang, M. Bannatyne, C. Maple, A.J. Cowell, G.Y. Tian and M. Hou. London: IEEE, 2006, pp. 731–740.
8. R. J. Rost and B. Licea-Kane. OpenGL Shading Language. 3rd edition. Upper Saddle River, NJ [u. a.]: Addison-Wesley, 2010.
9. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Compatibility Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.
10. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.
11. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-Wesley, 2016.
12. T. Soukup and I. Davidson. Visual Data Mining. New York: Wiley, 2002.
13. K. Tschumitschew, F. Klawonn, F. Höppner and V. Kolodyazhniy. “Landscape Multidimensional Scaling”. In: Advances in Intelligent Data Analysis VII. Ed. by R. Berthold, J. Shawe-Taylor and N. Lavrač. Berlin: Springer, 2007, pp. 263–273.
2
The Open Graphics Library (OpenGL)
The Open Graphics Library (OpenGL) is a graphics programming interface that has become very widespread in recent decades due to its open concept and platform independence. Drivers of common graphics processors and graphics cards for the major operating systems support the OpenGL. After a brief overview of existing programming interfaces for graphics applications, this chapter explains the basics of the OpenGL in detail. The functionality is presented in such detail as to enable an understanding and classification of the basic concepts of computer graphics contained in the following chapters. At the same time, this chapter can serve as a concise reference book about the OpenGL. The presentation is limited to the OpenGL variant for desktop operating systems. Furthermore, the programming of OpenGL applications with the Java programming language using the OpenGL binding Java OpenGL (JOGL) with and without shaders is introduced.
2.1 Graphics Programming Interfaces

An application accesses the GPU through special application programming interfaces (APIs). As shown in Fig. 2.1, these programming interfaces can be distinguished according to their degree of abstraction from the GPU hardware. The highest level of abstraction is provided by user interface frameworks, which provide relatively simple (mostly) pre-built windows, dialogues, buttons and other user interface elements. Examples of such interfaces are Windows Presentation Foundation (WPF), Java Swing or JavaFX. At the abstraction level below is scene graph middleware, which manages individual objects in a scene graph. A scene graph is a special data structure to store a
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-28135-8_2.
Fig. 2.1 Degree of abstraction of graphics programming interfaces
scene in such a way that it can be efficiently manipulated and rendered (see Sect. 5.4). Furthermore, it can be used to define and control the properties, the structure of the graphic objects and their animations. Examples of this are OpenSceneGraph, which is implemented in C++, and the Java-based Java 3D. The lowest level of abstraction in the figure is found in the graphics programming interfaces, through which the drawing of objects and the functioning of the graphics pipeline can be very precisely controlled and programmed. An example of this is Direct3D by Microsoft. Other examples are the programming interfaces Open Graphics Library (OpenGL) and Vulkan, both specified by the industry consortium Khronos Group. Here, Direct3D and Vulkan are at a lower level of abstraction than OpenGL. A graphics programming interface provides uniform access to the GPU, so that the functionality of each GPU of each individual manufacturer does not have to be considered separately when programming graphics applications. The advantage of a low degree of abstraction is the possibility of very precise control of rendering by the GPU and thus a very efficient implementation and execution of drawing processes. In contrast to the higher abstraction levels, the programming effort is significantly higher, which can potentially lead to more errors. Programming interfaces with a high level of abstraction can be used much more easily without having to worry about the specifics of graphics programming or a specific GPU. On the other hand, the control and flexibility of rendering are limited. For some use cases, the execution speed in particular will be slower, as the libraries used have to cover a large number of use cases in order to be flexible and thus cannot be optimised for every application. The use of the programming interface OpenGL was chosen for this book to teach the basics of computer graphics. At this degree of abstraction, the principles of the processing steps in the individual stages of graphics pipelines can be taught clearly and with a direct reference to practical examples. In contrast to the programming interfaces Direct3D and Vulkan, the abstraction is high enough to achieve initial programming results quickly. In addition, OpenGL (like Direct3D and Vulkan) offers the up-to-date possibility of flexible programming of the graphics pipeline on the
GPU using shaders. OpenSceneGraph also supports shader programming, but this middleware is only available for integration into C++ programs. Java 3D was developed for use in Java programs, but offers no support for programming shaders. JavaFX provides an interface to a window system at a high level of abstraction and allows working with a scene graph for two- and three-dimensional graphics. Support for shaders was planned, but is not yet available in the latest version at the time of writing this book.

In the following sections of this chapter, the OpenGL programming interface is introduced to the extent necessary to explain the basic mechanisms of computer graphics. Comprehensive descriptions are available in the OpenGL specifications [9,10], the GLSL specification [4] or the comprehensive books [5,12].
2.2 General About the OpenGL

The programming interface Open Graphics Library (OpenGL) was specified by the industry consortium Khronos Group and is maintained as an open standard. Many companies belong to the Khronos Group, which has made OpenGL very widespread. Current specifications, additional materials and information can be found at www.opengl.org. OpenGL was designed by the Khronos Group as a platform-independent interface specification for the hardware-supported rendering of high-quality three-dimensional graphics. The hardware support can be provided by a Graphics Processing Unit (GPU) that is integrated into the central processing unit (CPU), is located as a separate processor on the motherboard of the computer or resides on a separate graphics card. Support by several GPUs or several graphics cards is also possible. Although OpenGL is designed as an interface for accessing graphics hardware, the interface can in principle be implemented by software that is largely executed on the CPU. This, of course, eliminates the execution speed advantage of hardware support.

OpenGL provides a uniform interface so that graphics applications can use different hardware support (for example, by different graphics cards on different devices) without having to take this into account significantly in the programming.1 OpenGL specifies only an interface. This interface is implemented by the manufacturer of the graphics hardware and is supplied by the graphics driver together with the hardware. This allows the connection of a specific hardware to the general programming interface, which gives the manufacturer freedom for optimisations in graphics processing depending on the features and performance of the GPU. This concerns hardware parameters such as the size of the graphics memory or the
1 Since drivers for graphics processors only support OpenGL up to a certain version or a limited number of OpenGL extensions, a proportion of hardware-dependent programming code may still be necessary. Depending on the specific application, a large part of the graphics application will still be programmable independently of GPUs using OpenGL.
number of processor cores that allow parallel processing. Furthermore, algorithms can be implemented that optimise, for example, the timing and the amount of data transfer between the CPU and the GPU. Since the OpenGL specifications only define the behaviour at the interface level and certain variations in the implementation are permitted, a graphics application with the same input parameters on different graphics processors will not produce bitwise identical outputs in all cases. However, the results will be the same, except for details. In order to use this abstraction for learning graphics programming using OpenGL, this book only describes the conceptual behaviour of the GPU from the perspective of a graphics application using OpenGL. The actual behaviour below the OpenGL interface level may differ from this conceptual behaviour and is of little relevance to understanding the fundamentals of graphics programming. Therefore, this book often describes data “being passed to the GPU” or processing “on the GPU”, although this actually occurs at different times on different GPUs due to the aforementioned optimisations of the graphics driver. The graphics driver also determines the supported OpenGL version, so this driver should be updated regularly.

OpenGL is also specified independently of a computer platform. Nevertheless, OpenGL drivers are available for the common operating systems Windows, macOS and Linux. OpenGL for Embedded Systems (OpenGL ES) is a subset of (desktop) OpenGL and provides specific support for mobile phones and other embedded systems, such as game consoles and automotive multimedia devices. In particular, these specifications take into account the need to reduce the power consumption of mobile and embedded systems. The Android mobile operating system, for example, offers OpenGL ES support. Furthermore, modern browsers support WebGL (Web Graphics Library), which is a special specification for the hardware-supported display of three-dimensional graphics for web applications. WebGL is shader-based and has very similar functionalities to the OpenGL ES. Separate specifications exist for OpenGL ES and WebGL with their own version numbering (see www.khronos.org). The basic mechanisms and available shaders (in OpenGL ES from version 2.0) are—depending on the version—identical to the desktop variant of OpenGL. Since starting to develop applications for mobile phones or other embedded systems requires further training and the mechanisms of OpenGL form a basis for understanding the other variants, only OpenGL applications for desktop operating systems are considered in this book.

OpenGL is specified independently of any programming language. On the reference pages www.opengl.org, the commands are explained in a syntax that is similar to the syntax of the C programming language. Many graphics applications with OpenGL support are implemented in the C or C++ programming language, for which an OpenGL binding exists. Since OpenGL is independent of specific operating systems and computer platforms (see above), such bindings usually do not include support for specific window systems or input devices (mouse, keyboard or game controllers). For this, further external libraries are needed. A short overview of the necessary building blocks for development under C/C++ can be found in [6, p. 46f]. Detailed instructions for setting up the development environment for OpenGL graphics programming under C/C++ are available on the relevant websites and
forums. There are also bindings for a wide range of other programming and scripting languages, such as C#, Delphi, Haskell, Java, Lua, Perl, Python, Ruby and Visual Basic.
2.3 The OpenGL and Java

Two commonly used OpenGL language bindings to the Java programming language are Java OpenGL (JOGL) and Lightweight Java Game Library (LWJGL). With the very widespread programming language Java and the associated tools, such as the Java Development Kit (JDK), an environment exists that makes it easy for beginners to get started with programming. Furthermore, the programming syntax of the OpenGL language binding JOGL is close to the syntax of the OpenGL documentation. In addition to the possibility of displaying computer graphics generated with hardware support via the JOGL window system (GLWindow), the Abstract Window Toolkit (AWT) or Swing window systems included in current Java versions can also be used. This makes it possible to combine the teaching of the basics of computer graphics with an easy introduction to practical graphics programming.

Knowledge of Java and OpenGL forms a solid foundation for specialisations such as the use of OpenGL for Embedded Systems (OpenGL ES) for the realisation of computer graphics on mobile phones or the Web Graphics Library (WebGL) for the generation of GPU-supported computer graphics in browsers. These basics can also be used for familiarisation with Vulkan as a further development of a graphics programming interface at a lower level of abstraction. Furthermore, this basis allows an easy transition to other programming languages, such as C++ or C#, to other window systems or to future developments in other areas. For these reasons, programming examples in Java with the OpenGL binding JOGL are used in this book to motivate and support the core objective—teaching the basics of computer graphics—through practical examples.
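To give a first impression of what such a JOGL program looks like before the detailed discussion in the following sections, the following minimal sketch opens a NEWT window (GLWindow) and clears it once per frame. The class and interface names are those of the JOGL/NEWT API; the window size, title and frame rate are arbitrary example choices, and error handling is omitted.

import com.jogamp.newt.opengl.GLWindow;
import com.jogamp.opengl.GL;
import com.jogamp.opengl.GLAutoDrawable;
import com.jogamp.opengl.GLCapabilities;
import com.jogamp.opengl.GLEventListener;
import com.jogamp.opengl.GLProfile;
import com.jogamp.opengl.util.FPSAnimator;

public class MinimalJoglExample implements GLEventListener {

    public static void main(String[] args) {
        GLCapabilities caps = new GLCapabilities(GLProfile.getDefault());
        GLWindow window = GLWindow.create(caps);           // NEWT window with an OpenGL context
        window.setSize(640, 480);
        window.setTitle("Minimal JOGL example");
        window.addGLEventListener(new MinimalJoglExample());
        window.setVisible(true);
        new FPSAnimator(window, 60).start();               // trigger display() about 60 times per second
    }

    @Override
    public void init(GLAutoDrawable drawable) {
        drawable.getGL().glClearColor(0.2f, 0.2f, 0.2f, 1.0f); // background colour
    }

    @Override
    public void display(GLAutoDrawable drawable) {
        GL gl = drawable.getGL();
        gl.glClear(GL.GL_COLOR_BUFFER_BIT);                // drawing commands would follow here
    }

    @Override
    public void reshape(GLAutoDrawable drawable, int x, int y, int width, int height) { }

    @Override
    public void dispose(GLAutoDrawable drawable) { }
}

The four methods of the GLEventListener interface correspond to the typical life cycle of an OpenGL application: one-time initialisation, drawing, reaction to window resizing and cleanup.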
2.4 Profiles

In 2004, with the OpenGL specification 2.0, the so-called programmable pipeline (see Fig. 2.2, right) was introduced, which allows for more flexible programming of the graphics pipeline. Before this time, only parameters of fixed-function blocks in the so-called fixed-function pipeline (see Fig. 2.2, left) could be modified. By a blockwise transmission of data into buffers on the GPU, much more efficient applications can be implemented using the programmable pipeline (see Sect. 2.10). The flexible programming in this pipeline is mainly achieved by programming shaders (cf. [6, pp. 44–45]).

Since more complex and efficient graphics applications can be implemented using the programmable pipeline and the language scope of this pipeline is significantly smaller compared to the fixed-function pipeline, an attempt was made in 2008 and 2009 with the OpenGL specifications 3.0 and 3.1 to remove the old fixed-function pipeline commands from the OpenGL specifications. However, this met with heavy criticism from industrial OpenGL users, so that with version 3.2 the so-called compatibility profile was introduced. This profile, with its approximately 700 commands, covers the language range of the fixed-function pipeline and the programmable pipeline. In the core profile, a reduced language scope with approximately 250 commands of the programmable pipeline is available (cf. [6, p. 46]). The relationship between the two profiles and the availability of the two pipelines is summarised in Table 2.1. The core profile is specified in [10] and the compatibility profile in [9].

Fig. 2.2 Fixed-function pipeline (left) and programmable pipeline (right)

Table 2.1 Differences between the compatibility profile and the core profile in OpenGL

                          Compatibility profile   Core profile
Number of commands        Approximately 700       Approximately 250
Fixed-function pipeline   Available               Not available
Programmable pipeline     Available               Available
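In JOGL, the desired profile is chosen when the OpenGL context is requested. The following fragment sketches both variants; the profile identifiers (GL4 for a 4.x core profile, GL4bc for the 4.x compatibility profile) are constants of the JOGL class GLProfile, and whether a profile is actually available depends on the installed graphics driver.

import com.jogamp.opengl.GLCapabilities;
import com.jogamp.opengl.GLProfile;

// Core profile: programmable pipeline only (approximately 250 commands).
GLProfile coreProfile = GLProfile.get(GLProfile.GL4);
GLCapabilities coreCapabilities = new GLCapabilities(coreProfile);

// Compatibility profile: fixed-function and programmable pipeline (approximately 700 commands).
GLProfile compatibilityProfile = GLProfile.get(GLProfile.GL4bc);
GLCapabilities compatibilityCapabilities = new GLCapabilities(compatibilityProfile);

// The capabilities object is then passed on when the drawable is created,
// for example GLWindow.create(coreCapabilities).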
2.5 OpenGL Graphics Pipelines

Figure 2.2 shows the basic processing stages of the graphics pipelines in the OpenGL. Both pipelines in the figure are detailed representations of the abstract OpenGL graphics pipeline introduced in Sect. 1.1. To simplify matters, the presentation of special data paths and feedback possibilities to the graphics application was omitted. As already explained, the essential further development of the fixed-function pipeline to the programmable pipeline was the replacement of fixed-function blocks with programmable function blocks, each of which is colour-coded in Fig. 2.2. The vertex shader of the programmable pipeline can realise the function blocks geometric transformations as well as lighting and colouring of the fixed-function pipeline. The function block texturing, colour sum and fog in the fixed-function pipeline can be implemented in the programmable pipeline by the fragment shader. Shader-supported programming naturally results in additional freedom in graphics processing. For example, in modern applications, the lighting is realised by the fragment shader instead of the vertex shader. In addition, a programmable tessellation stage and a geometry shader are optionally available in the programmable pipeline (see Sect. 2.6).

The non-coloured fixed-function blocks are identical in both graphics pipelines and can be configured and manipulated through a variety of parameter settings. Through the compatibility profile (see Sect. 2.4), both graphics pipelines are available in modern graphics processors. An obvious approach is to simulate the functions of the fixed-function pipeline by (shader) programs using the programmable pipeline. This possibility is implemented by the graphics driver and is not visible to users of the interface to the fixed-function pipeline. In the following sections, the function blocks of both graphics pipelines are explained in more detail.
2.5.1 Vertex Processing

The vertex processing stage receives the scene to be drawn from the graphics application. In the fixed-function pipeline, the scene is supplied to the geometric transformations stage and in the programmable pipeline to the vertex shader. The objects of the scene are described by a set of vertices. A vertex (plural: vertices) is a corner of a geometric primitive (or primitive for short), for example, an endpoint of a line segment or a corner of a triangle. Furthermore, additional information is stored in these vertices in order to be able to draw the respective geometric primitive. Typically, the vertices contain position coordinates in three-dimensional space (3D coordinates), colour values, normal vectors or texture coordinates. The colour values can exist as RGBA colour values, whose fourth component is the so-called alpha value, by which the transparency of an object is described. Strictly speaking, the alpha value represents the opacity of an object. A low alpha value means high transparency and a high value means low transparency (see Sect. 6.3 for details). The normal vectors are displacement vectors oriented perpendicular to the surface to be drawn. They are used to calculate and display the illumination of the objects and the scene (see Chap. 9). The texture coordinates are used to modify objects with
textures. Often objects are covered by two-dimensional images (see Chap. 10). In the fixed-function pipeline, fog coordinates can be used to achieve fog effects in the scene. In the programmable pipeline, in principle, any content can be stored in the vertices for processing in the correspondingly programmed vertex shader. The available data in addition to the three-dimensional position coordinates are called associated data.

The three-dimensional position coordinates of the vertices are available in so-called object coordinates (also called model coordinates) before the execution of the geometric transformations function block. Through the application of the so-called model matrix, the position coordinates of the individual vertices are first transformed into world coordinates and then, through the application of the so-called view matrix, into camera coordinates or eye coordinates. The latter transformation step is also called view transformation, which takes into account the position of the camera (or the position of the observer) through which the scene is viewed. The two transformations can be combined into a single transformation step using the model view matrix, which results from a matrix multiplication of the model matrix and the view matrix (see Chap. 5). The model view matrix is usually passed from the graphics application to the GPU. In the next step of geometry processing, the projection matrix is applied to the position coordinates of the vertices, transforming the vertices into clip coordinates (projection transformation). This often results in a perspective distortion of the scene, so that the representation gets a realistic three-dimensional impression. Depending on the structure of the projection matrix, for example, perspective projection or parallel projection can be used. These two types of projection are frequently used. The described use of the model view matrix and the projection matrix is part of the viewing pipeline, which is described in more detail in Sect. 5.10. Besides the transformation of the position coordinates, the transformation of the normal vectors of the vertices is also necessary (see Sect. 5.11). In the programmable pipeline, the transformations using the model view matrix and the projection matrix are implemented in the vertex shader. Since this must be individually programmed, variations of the calculation are possible according to the needs of the application. The second part of the viewing pipeline is carried out in the vertex post-processing stage (see Sect. 2.5.2).

In the function block lighting and colouring of the fixed-function pipeline, the computation of colour values for the vertices takes place. This block allows light sources and material parameters for the surfaces of individual object faces to be specified, which are taken into account in the lighting calculation. In particular, the normal vectors of the individual vertices can be taken into account. The use of the Blinn–Phong illumination model (see [1]), which is often used in computer graphics, is provided for. This model builds on Phong’s illumination model [7] and modifies it to increase efficiency. Details are given in Sect. 9.3. Due to the high performance of modern graphics processors, the illumination values in current computer graphics applications are no longer calculated for vertices but for fragments. Normally, more fragments than vertices are processed in a scene. In this case, the corresponding
calculations are performed in the programmable pipeline in the fragment shader (see Sect. 2.5.5).
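To make the chain of transformations concrete, the following fragment sketches how a JOGL application can build the projection and model view matrices on the CPU with the utility class PMVMatrix and pass them as uniforms to a vertex shader. The existence of a linked shader program (shaderProgramId) and the uniform names pMatrix and mvMatrix are assumptions for this illustration, and the camera and model parameters are arbitrary example values.

import com.jogamp.opengl.GL2ES2;
import com.jogamp.opengl.fixedfunc.GLMatrixFunc;
import com.jogamp.opengl.util.PMVMatrix;

// Inside display(), with gl of type GL2ES2 and a linked shader program (assumed):
PMVMatrix pmvMatrix = new PMVMatrix();

// Projection transformation (here a perspective projection), cf. Sect. 5.10.
pmvMatrix.glMatrixMode(GLMatrixFunc.GL_PROJECTION);
pmvMatrix.glLoadIdentity();
pmvMatrix.gluPerspective(45f, 640f / 480f, 0.1f, 100f); // field of view, aspect ratio, near, far

// View transformation (camera position) followed by an example model transformation.
pmvMatrix.glMatrixMode(GLMatrixFunc.GL_MODELVIEW);
pmvMatrix.glLoadIdentity();
pmvMatrix.gluLookAt(0f, 2f, 5f, 0f, 0f, 0f, 0f, 1f, 0f); // eye position, centre, up vector
pmvMatrix.glTranslatef(0f, 0f, -1f);                     // place the object in the scene

// Pass both matrices to the vertex shader (the uniform names are assumptions).
gl.glUseProgram(shaderProgramId);
gl.glUniformMatrix4fv(gl.glGetUniformLocation(shaderProgramId, "pMatrix"),
        1, false, pmvMatrix.glGetPMatrixf());
gl.glUniformMatrix4fv(gl.glGetUniformLocation(shaderProgramId, "mvMatrix"),
        1, false, pmvMatrix.glGetMvMatrixf());

In the vertex shader, these two matrices are then multiplied with the object coordinates of each vertex to obtain clip coordinates, as described above.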
2.5.2 Vertex Post-Processing

The function block vertex post-processing allows numerous post-processing steps of the vertex data after the previous steps. The main steps are explained below. The optional transform feedback makes it possible to save the result of the geometry processing or processing by the preceding shader in buffer objects and either return it to the application or use it multiple times for further processing by the graphics pipeline.

Using flat shading, all vertices of a graphic primitive can be assigned the same vertex data. Here, the vertex data of a previously defined vertex, the provoking vertex, is used within the primitive. This allows, for example, a triangle to be coloured with only one colour, which is called flat shading (or constant shading) (see Sect. 9.4). Clipping removes invisible objects or parts of objects from a scene by checking whether they are in the visible area—the view volume. In this process, non-visible vertices are removed or new visible vertices are created to represent visible object parts. Efficient clipping methods are described in Sect. 8.1.

Furthermore, perspective division converts the position coordinates of the vertices from clip coordinates into a normalised representation, the normalised device coordinates. Based on this normalised representation, the scene can be displayed very easily and efficiently on any two-dimensional output device. Subsequently, the viewport transformation transforms the normalised device coordinates into window coordinates (also device coordinates) so that the scene is ready for output to a two-dimensional window, the viewport. These two transformations of the position coordinates of the vertices represent the second part of the viewing pipeline, which is explained in detail in Sect. 5.10. Explanations of the viewport are available in Sect. 5.1.2.
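The effect of the viewport transformation just described can be illustrated by a small helper method: normalised device coordinates in the range [-1, 1] are mapped to window coordinates of a viewport with origin (x0, y0) and the given width and height. The method is purely illustrative and not part of JOGL or OpenGL.

// Maps normalised device coordinates (xNdc, yNdc) to window coordinates,
// following x_win = x0 + (x_ndc + 1) * width / 2 (and analogously for y).
static float[] viewportTransform(float xNdc, float yNdc,
                                 int x0, int y0, int width, int height) {
    float xWin = x0 + (xNdc + 1.0f) * width * 0.5f;
    float yWin = y0 + (yNdc + 1.0f) * height * 0.5f;
    return new float[] { xWin, yWin };
}

// Example: the centre (0, 0) of the normalised device coordinates is mapped to the
// centre of a 640 x 480 viewport: viewportTransform(0f, 0f, 0, 0, 640, 480) yields (320, 240).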
2.5.3 Primitive Assembly

An OpenGL drawing command invoked by the graphics application makes it possible to specify for a number of vertices (a vertex stream) which geometric shape to draw. The available basic shapes are referred to in computer graphics as geometric primitives, or primitives for short. In the fixed-function pipeline, points, line segments, polygons, triangles and quadrilaterals (quads) are available. In the programmable pipeline, only points, line segments and triangles can be drawn. See Sect. 3.2 for the available basic objects in the OpenGL. To increase efficiency, drawing commands exist that allow a large number of these primitives to be drawn by reusing shared vertices. For example, complete object surfaces can be rendered by long sequences of triangles (GL_TRIANGLE_STRIP)
or triangle fans (GL_TRIANGLE_FAN) with only one call to a drawing command. Various possibilities for drawing primitives in the OpenGL can be found in Sect. 3.3. The essential task of the function block primitive assembly is the decomposition of these vertex streams into individual primitives, also called base primitives. The three base primitives in the OpenGL are points, line segments and polygons. In the programmable pipeline, the polygon base primitive is a triangle. Through this function block, a line sequence consisting of seven vertices, for example, is converted into six individual base line segments. If a tessellation evaluation shader or a geometry shader (see Sect. 2.6) is used, part of the primitive assembly takes place before these shaders are executed.

The decomposition of a vertex stream into individual base primitives determines the order in which the vertices in the vertex stream are specified (enumerated) relative to the other vertices of a primitive. This allows the front and back of the primitives to be identified and defined. According to a common convention in computer graphics, the front side of a primitive is the viewed side where the vertices are named (enumerated) counter-clockwise in the vertex stream. For the back sides, this orientation, which is also called winding order, is exactly the opposite (see Sect. 3.2.4). With this information, it is possible to cull (suppress) front or back sides of the polygons by frontface culling or backface culling before the (computationally intensive) rasterisation stage. For example, for an object viewed from the outside, only the outer sides of the triangles that make up the surface of the object need to be drawn. These will usually be the front sides of the triangles. If the viewer is inside an object, for example, in a cube, then it is usually sufficient to draw the back sides of the polygons. If the objects in the scene are well modelled, i.e., all the front sides actually face outwards, then this mechanism contributes to a significant increase in drawing efficiency without reducing the quality of the rendering result. Details on visibility considerations can be found in Sect. 8.2.1.
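As an illustration of how an application activates this mechanism, the following JOGL fragment declares the counter-clockwise winding order as the front side and suppresses back-facing polygons. The calls are standard OpenGL commands of the GL interface; where exactly they are placed (typically in the init method) and the variable vertexCount are assumptions of this sketch.

GL gl = drawable.getGL();
gl.glFrontFace(GL.GL_CCW);       // vertices enumerated counter-clockwise form the front side
gl.glEnable(GL.GL_CULL_FACE);    // activate culling
gl.glCullFace(GL.GL_BACK);       // suppress back-facing polygons before rasterisation

// Example drawing call: a triangle strip over vertexCount vertices of the currently bound vertex data.
gl.glDrawArrays(GL.GL_TRIANGLE_STRIP, 0, vertexCount);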
2.5.4 Rasterisation

Up to this point of processing in the OpenGL graphics pipeline, all objects of the scene to be rendered are present as a set of vertices—i.e., as a vector graphic. The function block rasterisation converts the scene from this vector format into a raster graphic (see Sect. 7.1 for an explanation of these two types of representations) by mapping the individual base primitives onto a regular (uniform) grid.2 Some algorithms for the efficient rasterisation (also called scan conversion) of lines are presented in Sect. 7.3. Methods for rasterising and filling of areas can be found in Sect. 7.5. The results of rasterisation are fragments, which are potential pixels for display on an output device. For a clear linguistic delimitation, we will explicitly speak of fragments, as is common in computer graphics, and not of pixels, since at this point
2 In the OpenGL specification, a regular (uniform) grid consisting of square elements is assumed for simplicity. In special OpenGL implementations, the elements may have other shapes.
in the pipeline it is not yet clear which of the potentially overlapping fragments will be used to display which colour value as a pixel of the output device. Only in the following steps of the pipeline is it taken into account which objects are hidden by other objects, how the transparency of objects affects them and how the colour values of the possibly overlapping fragments are used to compute a final colour value of a pixel.

After rasterisation—with the standard use of the pipeline—for each fragment at least the position coordinates in two-dimensional window coordinates (2D coordinates), a depth value (z-value) and possibly a colour value are available. The three-dimensional position coordinates of the vertices are used to determine which raster points belong to a geometric primitive as a fragment. In addition, the assignment of additional data stored in each vertex to the generated fragments takes place. These are, for example, colour values, normal vectors, texture coordinates or fog coordinates. This additional data stored in vertices and fragments is called associated data (see above). Since the conversion of a vector graphic into a raster graphic usually results in many fragments from a few vertices, the additional intermediate values required are calculated by interpolating the vertex data. This is done by using the two-dimensional position coordinates of the fragment within the primitive in question. The interpolation is usually done using barycentric coordinates within a triangle (see Sect. 7.5.4). In Chap. 10, this is explained using texture coordinates. Such an interpolation is used in both pipeline variants for the colour values. This, for example, can be used to create colour gradients within a triangle. If this interpolation, which has a smoothing effect, is not desired, it can be disabled by activating flat shading in the function block vertex post-processing (see Sect. 2.5.2). Since in this case all vertices of a primitive already have the same values before rasterisation, all affected fragments also receive the same values. This mechanism of rasterisation can be used to realise flat shading (constant shading), Gouraud shading or Phong shading (see Sect. 9.4).

When converting a scene from a vector graphic to a raster graphic, aliasing effects (also called aliasing for short) appear, which can be reduced or removed by antialiasing methods (see Sect. 7.6). The linear interpolation during the conversion of vertex data into fragment data (see the last paragraph) already reduces aliasing, as it has a smoothing effect. The OpenGL specification provides the multisample antialiasing (MSAA) method for antialiasing. With this method, colour values are sampled at slightly different coordinates in relation to the fragment coordinate. These samples (also called subpixels) are combined at the end of the graphics pipeline into an improved (smoothed) sample per fragment (see Table 2.2 in Sect. 2.5.6). Furthermore, a so-called coverage value is determined for each fragment, which indicates how large the proportion of the samples is that lie within the sampled geometric primitive and contribute to the colour of the respective fragment (see Sect. 7.6.3). For the following pipeline steps, the computations can be performed for all samples (in parallel) per fragment. In this case, true oversampling is applied. This method is called supersampling antialiasing (SSAA) (see Sect. 7.6.5).
The pipeline can be configured in such a way that the subsequent stages use only a smaller proportion of samples or only exactly one sample (usually the middle sample) per fragment
for the computations. This option of the multisampling antialiasing method allows optimisations to achieve a large gain in efficiency.
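A sketch of how multisample antialiasing is typically requested with JOGL: the number of sample buffers and samples is part of the GLCapabilities with which the OpenGL drawable is created. The value of four samples is an example; whether and how the request is honoured depends on the graphics hardware and driver.

import com.jogamp.opengl.GLCapabilities;
import com.jogamp.opengl.GLProfile;

GLCapabilities capabilities = new GLCapabilities(GLProfile.getDefault());
capabilities.setSampleBuffers(true);  // request a multisample-capable framebuffer
capabilities.setNumSamples(4);        // e.g. four samples per fragment
// The capabilities object is passed on when the drawable is created, e.g. GLWindow.create(capabilities).
// On desktop OpenGL, multisampling can additionally be toggled with glEnable/glDisable(GL_MULTISAMPLE).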
2.5.5 Fragment Processing

In the pipeline stage texturing, colour sum and fog of the fixed-function pipeline, textures can be applied to objects. Often the objects are covered by two-dimensional images (2D textures) to achieve a more realistic rendering of the scene (see Chap. 10). The texturing operation uses the (interpolated) texture coordinates that were determined for individual fragments. The colour values of the texture are combined with the colour values of the fragments when applying the texture. Due to the multiplicative combination of the colour values, specular reflections (see Sect. 9.1) may be undesirably strongly reduced. Therefore, with the function colour sum, a second colour can be added after this colour combination to appropriately emphasise the specular reflections. This second colour (secondary colour) must be contained in the vertices accordingly. Furthermore, at this stage, it is possible to create fog effects in the scene (see Sect. 11.3).

In the programmable pipeline, the operations of the fixed-function pipeline can be realised by the fragment shader. In modern graphics applications, the computations for the illumination per fragment take place in the fragment shader. In contrast to the lighting computation per vertex in the fixed-function pipeline (see Sect. 2.5.1), more realistic lighting effects can be achieved. Such methods are more complex, but with modern graphics hardware, they can easily be implemented for real-time applications. Methods for local and global illumination are explained in Chap. 9.
2.5.6 Per-Fragment Operations

Before and after fragment processing, a variety of operations are applicable to modify fragments or eliminate irrelevant or undesirable fragments. By excluding fragments from further processing, efficiency can be significantly increased. In addition, various effects can be realised by modifying fragments. The possible per-fragment operations are listed in Table 2.2 along with possible applications or examples. The order of listing in the table corresponds to the processing order in the OpenGL graphics pipelines. As can be seen from the table, the pixel ownership test, the scissor test and the multisample fragment operations are always executed before fragment processing. From the point of view of the user of the fixed-function pipeline, all other operations are only available after this function block. In the programmable pipeline, the operations stencil test, depth buffer test and occlusion queries can optionally be executed before the fragment shader. This is triggered by specifying a special layout qualifier for input variables of the fragment shader.3 If these operations are
3 For details, see, for example, https://www.khronos.org/opengl/wiki/Early_Fragment_Test.
Table 2.2 Per-fragment operations in the OpenGL graphics pipelines. For each operation, the execution relative to fragment processing (before or after) and possible applications or examples, partly in combination with other per-fragment operations, are given.

Pixel ownership test (before): Identifying the occlusion of the framebuffer by windows outside of the OpenGL, preventing drawing in the hidden area
Scissor test (before): Preventing drawing outside the viewport, support of drawing rectangular elements, for example, buttons or input fields
Multisample fragment operations (before): Antialiasing in conjunction with transparency, if multisampling is activated
Alpha to coverage (after): Antialiasing in conjunction with transparency, for example, for the representation of fences, foliage or leaves, if multisampling is active
Alpha test¹ (after): Representation of transparency
Stencil test (after, optionally before²): Cutting out non-rectangular areas, for example, for rendering shadows from point light sources or mirrors or for masking the cockpit in flight or driving simulations
Depth buffer test (after, optionally before²): Occlusion of surfaces, culling
Occlusion queries (after, optionally before²): Rendering of aperture spots or lens flare, occlusion of bounding volumes, collision control
Blending (after): Mixing/cross-fading (of images), transparency
sRGB conversion (after): Conversion to the standard RGB colour model (sRGB colour model)
Dithering (after): Reduction of interferences when using a low colour depth
Logical operations (after): Mixing/cross-fading (of images), transparency
Additional multisample fragment operations (after): Determination whether alpha test, stencil test, depth buffer test, blending, dithering and logical operations are performed per sample; combination of the colour values of the samples to one colour value per fragment, if multisampling is active

¹ Available in the compatibility profile only
² In the programmable pipeline, optional execution before the fragment shader instead of after the fragment shader (see explanations in the text)
performed before the fragment shader, they are called early fragment tests and are not performed again after the fragment shader. For advanced readers, it should be noted that in addition to this explicit specification of whether early fragment tests are performed before or after fragment processing, optimisations of the GPU hardware and the OpenGL driver implementation exist that include early execution independently of the specification in the fragment shader. Since in modern GPUs the fixed-function pipeline is usually realised by shader implementations, these optimisations are in principle also possible or available for the fixed-function pipeline. In order to remain compatible with the OpenGL specification, the behaviour of these optimisations in both pipelines must be as if these operations were performed after fragment processing, although this actually takes place (partially) beforehand. Of course, this is only true if early fragment tests are not explicitly enabled. The user of OpenGL may not notice any difference in the rendering results apart from an increase in efficiency due to this optimisation. The functionalities of the individual operations are partially explained in the following chapters of this book. Detailed descriptions of the mechanisms can be found in the OpenGL specifications (see [9,10]) and in books that comprehensively describe the OpenGL programming interface (see, for example, [5,12]). The following are examples of some applications of the per-fragment operations. With the help of the scissor test, a rectangular area can be defined in which to draw. Fragments outside this rectangle can be excluded from further processing. Due to the rectangular shape of this area, it is possible to check very efficiently whether a fragment lies inside or outside this area. To check whether a fragment lies in an area with a more complex shape than a rectangle, the stencil test is applicable. For this, a stencil buffer must be created that contains values for each two-dimensional coordinate. With this test, for example, only the fragments from the area defined by the stencil buffer can be drawn—as with a template—and those outside this area can be excluded from drawing. In addition, more complex comparison operations, modifications of the stencil buffer and a link with the depth buffer test (see next paragraph) are possible. In principle, it is possible to render shadows or mirrors with this test. However, today these effects are mostly realised by drawing in textures (rendering to texture) and then applying these textures to the corresponding objects. In a depth buffer, also called z-buffer, the z-values (depth values) of the fragments can be stored so that the depth buffer test can be used to check whether certain fragments overlap each other. In this way, the mutual occlusion of objects (called culling) can be implemented. In Sect. 8.2.5, the depth buffer algorithm for performing occlusion calculation is explained. By blending or using logical operations, colours of fragments (in buffers) can be mixed for rendering transparent objects. Furthermore, Table 2.2 lists operations for performing antialiasing using the multisampling antialiasing method. The multisample fragment operations can modify or mask the coverage value for a fragment to improve the appearance of transparency. In the additional multisample fragment operation, it can be determined whether
alpha test, stencil test, depth buffer test, blending, dithering and logical operations are performed per sample or only per fragment. This operation is the last step of the multisampling process, in which the colour values of the individual samples are combined into one colour value per fragment. Section 7.6.6 contains details on multisample antialiasing. The alpha to coverage operation derives a temporary coverage value from the alpha components of the colour values of the samples of a fragment. This value is then combined with the coverage value of the fragment by a logical AND operation and used as the new coverage value. This makes it possible to combine good transparency rendering and good antialiasing with efficient processing in complex scenes (for example, to render foliage, leaves or fences). If the reduction of the colour depth is desired, the dithering operation can be used to minimise possible interference. The basic approach of such a procedure is presented in Sect. 6.1.
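As an illustration of how some of these per-fragment operations are switched on from a JOGL application, the following fragment enables the scissor test, the depth buffer test and blending. The chosen rectangle, comparison function and blend factors are common example settings, not the only possible ones.

GL gl = drawable.getGL();

// Scissor test: restrict drawing to a rectangular region in window coordinates.
gl.glEnable(GL.GL_SCISSOR_TEST);
gl.glScissor(10, 10, 200, 100);  // x, y, width, height

// Depth buffer test: resolve the mutual occlusion of fragments.
gl.glEnable(GL.GL_DEPTH_TEST);
gl.glDepthFunc(GL.GL_LESS);

// Blending: combine fragment colours with the framebuffer content, e.g. for transparency.
gl.glEnable(GL.GL_BLEND);
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);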
2.5.7 Framebuffer

The result of the previous processing steps can be written into the framebuffer to display the result on a screen. As the default framebuffer, the OpenGL usually uses a buffer provided by the windowing system, so that the render result is efficiently output in a window. Alternatively, framebuffer objects can be used whose content is not directly visible. These objects can be used for computations in a variety of applications, for example, to prepare the output via the visible framebuffer before the actual content is displayed.
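A minimal sketch of creating such a framebuffer object with JOGL, using a texture as colour attachment so that the off-screen result can later be applied to objects ("render to texture"). The texture size and formats are example choices, and a depth attachment, which is usually also needed, is omitted for brevity.

int[] ids = new int[2];

// Create a texture that will receive the colour output of the off-screen rendering.
gl.glGenTextures(1, ids, 0);
gl.glBindTexture(GL.GL_TEXTURE_2D, ids[0]);
gl.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, 512, 512, 0,
        GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, null);
gl.glTexParameteri(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR);
gl.glTexParameteri(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR);

// Create the framebuffer object and attach the texture as its colour buffer.
gl.glGenFramebuffers(1, ids, 1);
gl.glBindFramebuffer(GL.GL_FRAMEBUFFER, ids[1]);
gl.glFramebufferTexture2D(GL.GL_FRAMEBUFFER, GL.GL_COLOR_ATTACHMENT0,
        GL.GL_TEXTURE_2D, ids[0], 0);
if (gl.glCheckFramebufferStatus(GL.GL_FRAMEBUFFER) != GL.GL_FRAMEBUFFER_COMPLETE) {
    System.err.println("Framebuffer object is not complete");
}

// ... render into the texture here ...
gl.glBindFramebuffer(GL.GL_FRAMEBUFFER, 0); // switch back to the default framebuffer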
2.6 Shaders

In 2004, with the introduction of the programmable pipeline, the vertex shader and the fragment shader were introduced as programmable function blocks in the OpenGL graphics pipeline (see Sects. 2.4 and 2.5). Although these shaders are very freely programmable, they typically perform certain main tasks. The vertex shader usually performs geometric transformations of the vertices until they are available in clip coordinates. Furthermore, it often performs preparatory computations for the subsequent processing stages. The task of the fragment shader is mostly to calculate the lighting for the fragments depending on different light sources and the material properties of objects. In addition, the application of textures is usually taken over by this shader. In the compatibility profile, the vertex and fragment shaders are optional. In the core profile, these two shaders must be present; otherwise, the rendering result is undefined (see [10]).

For advanced readers, it should be noted that only the colour values of the fragment processing result are undefined if there is no fragment shader in the core profile. Depth values and stencil values have the same values as on the input side, so they
Fig. 2.3 Vertex processing in the programmable pipeline with all function blocks: The Tessellation Control Shader, the Tessellation Evaluation Shader and the Geometry Shader are optional
are passed through this stage without change.4 This may be useful, for example, for the application of shadow maps (see Sect. 10.1.2).

As of OpenGL version 3.x in 2008, the geometry shader was introduced as a further programmable element. This shader is executed after the vertex shader or after the optional tessellation stage (see Fig. 2.3). With the help of this shader, the geometry can be changed by destroying existing or creating new geometric primitives such as points, lines or polygons. For this purpose, this shader also has access to the vertices of neighbouring primitives (adjacencies). Before the geometry shader is executed, the decomposition into base primitives takes place through the function block primitive assembly (see Sect. 2.5.3). Thus, this decomposition step is carried out earlier when the geometry shader is used. The described functionality can be used for the dynamic generation of different geometries. For example, a differently detailed geometry can be generated depending on the distance of the object to the camera. This type of distance-dependent rendering is called level of detail (LOD) (see Sect. 4.5.1). Furthermore, the geometry can be duplicated and rendered from different angles using the graphics pipeline. This is useful, for example, to create shadow textures from different angles in order to simulate realistic shadows. Similarly, so-called environment mapping (see Sect. 10.1.2) can be used to simulate reflective objects by projecting the object’s surroundings (from different angles) onto the inside of a cube and then applying this as a texture to
4 See https://www.khronos.org/opengl/wiki/Fragment_Shader for details.
the reflective object. Together with the transform feedback, the geometry shader can also be used for the realisation of particle systems. Transform feedback allows the result of vertex processing to be buffered in buffer objects and returned to the graphics application or to the first stage of the graphics pipeline for further processing. This allows the (slightly) modified graphics data to be processed iteratively several times. In this case, the geometry shader takes over the iterative creation, modification and destruction of the particles, depending on the parameters (such as point of origin, initial velocity, direction of motion, and lifetime) of each individual particle. Such particle systems (see also Sect. 11.4) can be used to create effects such as fire, explosions, fog, flowing water or snow. The use of the geometry shader is optional in both OpenGL profiles.

The OpenGL version 4.0 in 2010 introduced the tessellation unit, which allows the geometry to be divided into primitives dynamically during runtime (for an explanation of tessellation, see Sect. 4.2). As shown in Fig. 2.3, the tessellation unit consists of three parts: the tessellation control shader, the configurable tessellation primitive generation and the tessellation evaluation shader. The two shaders are thus the programmable elements of the tessellation unit. With the tessellation unit, a new primitive, the so-called patch primitive, was introduced. A patch primitive is a universal primitive consisting of individual patches, which in turn consist of a given number of vertices. This can be used, for example, to define parametric surfaces that are broken down into triangles by the subsequent tessellation steps.

The tessellation control shader controls the division of the patches into smaller primitives by defining parameters for the subsequent stage per patch. Care must be taken that the edges of adjacent patches have the same tessellation level, i.e., are divided into the same number of primitives, in order to avoid discontinuities or gaps in a polygon mesh that is intended to be continuous. In addition, further attributes can be added per patch for the subsequent shader. The tessellation control shader is optional. If it is not available, tessellation is performed with preset values. The tessellation primitive generation function block performs the division of the patches into a set of points, line segments and triangles based on the parameters defined by the tessellation control shader. The tessellation evaluation shader is used to calculate the final vertex data for each of the newly generated primitives. For example, the position coordinates, the normal vectors or the texture coordinates for the (newly generated) vertices have to be determined. If the tessellation evaluation shader is not present, no tessellation takes place through the rendering pipeline.

As indicated earlier, parametric surfaces can be processed by the tessellation unit. Such surfaces can be used to describe the surfaces of objects by a few control points. The graphics pipeline generates the (detailed) triangle meshes to be displayed on this basis. This means that these tessellated triangle meshes no longer have to be provided by the graphics application, possibly using external tools, but can be determined by the GPU. On the other hand, this means a higher computing effort for the GPU.
With this mechanism, it is possible to pass the parameters of the freeform surfaces often used for modelling surfaces, such as Bézier surfaces or NURBS (non-uniform rational B-splines) (see Sect. 4.6), to the graphics pipeline and only then perform the calculation of the triangle meshes.
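To make the division of labour within the tessellation unit more concrete, the following minimal pair of GLSL tessellation shaders is sketched here. It is an illustrative example and not taken from the example programs of this book: the tessellation control shader sets fixed tessellation levels for a triangle patch, and the tessellation evaluation shader interpolates the vertex positions using the barycentric coordinates supplied in gl_TessCoord.

// Tessellation control shader: runs once per output control point
#version 400 core
layout (vertices = 3) out;

void main() {
    // set the (here constant) tessellation levels once per patch
    if (gl_InvocationID == 0) {
        gl_TessLevelInner[0] = 4.0;
        gl_TessLevelOuter[0] = 4.0;
        gl_TessLevelOuter[1] = 4.0;
        gl_TessLevelOuter[2] = 4.0;
    }
    // pass the control points through unchanged
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
}

// Tessellation evaluation shader: runs once per generated vertex
#version 400 core
layout (triangles, equal_spacing, ccw) in;

void main() {
    // interpolate the position from the patch corners using barycentric coordinates
    gl_Position = gl_TessCoord.x * gl_in[0].gl_Position
                + gl_TessCoord.y * gl_in[1].gl_Position
                + gl_TessCoord.z * gl_in[2].gl_Position;
}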
Furthermore, through a dynamic degree of tessellation, the level of detail of the triangle meshes that make up the 3D objects can be scaled depending on the distance to the camera. This can save computational time for more distant objects without losing detail when the camera is very close to the object. Such distance-dependent rendering is called level of detail (see Sect. 4.5.1) and is often used for landscapes or objects that may be temporarily in the background. As shown above, a similar realisation is possible using the geometry shader. Another application for the tessellation unit is the realisation of displacement mapping. This involves applying a fine texture (the displacement map) to an object modelled by a coarse polygon mesh, resulting in a refined geometry (see Sect. 10.2). Furthermore, an iterative algorithm can be implemented by the tessellation unit together with the transform feedback, which realises the technique of subdivision surfaces. Here, a coarse polygon mesh undergoes a refinement process several times until the desired high resolution is achieved. Details on this technique can be found, for example, in [3, p. 607ff].

As of OpenGL version 4.3 from 2012, the compute shader can be used. This shader is intended for performing general-purpose tasks that do not need to be directly related to computing and rendering computer graphics. It is not directly integrated into the OpenGL graphics pipelines and therefore does not appear in the figures of this chapter. In graphics applications, typical uses of this shader are, for example, calculations upstream of the graphics pipeline for the animation of water surfaces or particle systems. Likewise, the compute shader can be used for ray tracing algorithms to calculate global illumination (see Sect. 9.9). Filter operations for images or the scaling of images are also possible applications for the compute shader.
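As an illustration of what such a general-purpose shader can look like, the following minimal GLSL compute shader inverts the colours of an image bound as an image object. The work group size, the image format and the binding point are assumptions made for this sketch and are not taken from the example programs of this book.

#version 430 core
layout (local_size_x = 16, local_size_y = 16) in;
layout (rgba8, binding = 0) uniform image2D img;

void main() {
    // each invocation processes one texel of the image
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    vec4 colour = imageLoad(img, p);
    // simple colour inversion as an example of a filter operation
    imageStore(img, p, vec4(vec3(1.0) - colour.rgb, colour.a));
}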
Table 2.3 Shaders in the OpenGL with their typical tasks or application examples

Shader                  | Must/optional in the core profile | Typical tasks or examples of use
Vertex                  | Must                              | Geometric transformations into clip coordinates: model, view and projection transformations; preparation of subsequent calculations
Tessellation control    | Optional                          | Tessellation control per patch
Tessellation evaluation | Optional                          | Calculation of the final vertex data after tessellation; if this shader is missing, no tessellation takes place
Geometry                | Optional                          | Dynamic creation and destruction of geometry; application examples: special shadow calculation, reflecting objects, particle systems
Fragment                | Must (see notes in the text)      | Realistic lighting, application of textures, fog calculation
Compute                 | Optional                          | Universal computations on the GPU, not directly included in the graphics pipeline
Table 2.3 shows an overview of the shaders that can be used in the OpenGL with their typical functions or application examples for the shaders. The shaders are listed in the order in which they are processed in the OpenGL graphics pipeline. As can be seen from the explanations for advanced readers at the beginning of this section, the fragment shader is mandatory according to the OpenGL specification. However, it is optional from a technical point of view. If this shader is missing, the colour values of the result of the fragment processing are undefined in any case. Due to the increasing possibility of flexible programming of GPUs through shaders, the use of graphics cards for tasks outside computer graphics has become very interesting in recent years. The speed advantages of GPUs over standard processors used as central processing units (CPUs) essentially result from the higher degree of parallelism for the execution of certain uniform operations in the graphics processing domain (for example, matrix operations). This area is known as general-purpose computing on graphics processing units (GPGPU). One way to use the different processors in computers is the Open Computing Language (OpenCL), which is specified as an open standard like the OpenGL by the Khronos Group. However, an in-depth discussion of this topic would go beyond the scope of this book.
2.7 OpenGL Programming with JOGL This section explains the basic mechanisms for graphics programming using the OpenGL binding Java OpenGL (JOGL). Sections 2.8 and 2.10 contain simple examples that illustrate how a JOGL renderer works. The URL https://jogamp.org contains the Java archives for installing JOGL and further information, such as tutorials and programming examples. The main task in graphics programming using the OpenGL binding JOGL is to program a renderer that forwards the drawing commands to the OpenGL for drawing on the GPU. The integration of such a renderer into the JOGL architecture and the connection to a JOGL window (GLWindow) is shown in Fig. 2.4 using a UML class diagram.5 To be able to react to OpenGL events, the renderer must implement the methods init, display, reshape and dispose defined by the interface GLEventListener. An object implementing the interface GLAutoDrawable calls these methods when certain events occur. To receive these calls, a renderer object must be registered as an observer of an object that implements the GLAutoDrawable interface and that is to be observed accordingly. In the figure, the class GLWindow implements this interface.6
5 Introductions to Unified Modelling Language (UML) class diagrams can be found, for example, in [11]. 6 This mechanism is based on the observer design pattern, also called listener pattern. Details of this design pattern are, for example, in [2].
Fig. 2.4 Integrating a graphics application renderer into the JOGL architecture
By using an object of the JOGL class GLWindow, a window (of the JOGL system) is created that can be used with Java on a Windows operating system, for example. See https://jogamp.org for instructions on how to integrate a GLWindow object into other windowing systems, such as the Abstract Windowing Toolkit (AWT) or Swing. This allows the connection to a Frame object or a JFrame object as an alternative output window. Figure 2.4 shows that in the main method of the Main class a GLWindow object and a Renderer object are created. Subsequently, the addGLEventListener method is used to register the Renderer object as an observer of the GLWindow object. To realise a graphic animation, the display method of the GLWindow object is usually called regularly by an animation object. This is done, for example, 60 times per second in order to achieve a smooth rendering of the (moving) graphic content. After a call of this method, the display methods of all registered Renderer objects are called through the GLAutoDrawable and GLEventListener interfaces, which triggers drawing of the OpenGL content by the Renderer objects. In the figure, there is only one Renderer object, which represents the simplest case.
Fig. 2.5 Basic structure of a JOGL renderer (Java)
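A minimal sketch of such a renderer skeleton, along the lines of Fig. 2.5, is shown below. The class name is an assumption made for this sketch.

import com.jogamp.opengl.GLAutoDrawable;
import com.jogamp.opengl.GLEventListener;

public class Renderer implements GLEventListener {

    @Override
    public void init(GLAutoDrawable drawable) {
        // called once when the OpenGL context is created:
        // reserve buffers, compile shaders, set the background colour
    }

    @Override
    public void display(GLAutoDrawable drawable) {
        // called for every frame: issue the drawing commands
    }

    @Override
    public void reshape(GLAutoDrawable drawable, int x, int y, int width, int height) {
        // called when the window size changes, e.g., to adapt the projection
    }

    @Override
    public void dispose(GLAutoDrawable drawable) {
        // called before the OpenGL context is destroyed: release GPU resources
    }
}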
Calling the other methods of the interface GLEventListener works according to the same mechanism as for the display method. The init method is called when the OpenGL context is created. This is the case after starting the program. The method reshape is called when the size of the window to which the Renderer object is connected changes. This occurs, for example, when the user enlarges the window with the mouse. The method dispose is called just before the OpenGL context is destroyed, for example, by the user exiting the program. This method should contain source code that deallocates resources that were reserved on the GPU. Figure 2.5 shows the basic structure of a JOGL renderer with implemented but empty methods of the GLEventListener interface. As explained in this section, the methods of the renderer are invoked through the GLAutoDrawable interface, which is implemented and used by a GLWindow object. The creation of objects of class GLWindow takes place using a parameter of type GLCapabilities, by which the OpenGL profile used (see Sect. 2.4) and
Fig. 2.6 JOGL source code to set an OpenGL profile and to create a window and a renderer (Java)

Table 2.4 JOGL profiles to select the OpenGL profile and OpenGL version: a selection of profiles for desktop applications is shown

JOGL profile name | Compatibility profile | Core profile | OpenGL versions
GL2               | X                     | –            | 1.0–3.0
GL3bc             | X                     | –            | 3.1–3.3
GL4bc             | X                     | –            | 4.0–4.5
GL3               | –                     | X            | 3.1–3.3
GL4               | –                     | X            | 4.0–4.5
the OpenGL version are specified. Figure 2.6 shows the source code lines to set the OpenGL profile and the OpenGL version and to create a GLWindow object. A Renderer object is then created and registered as the observer of the GLWindow object. In the example, the JOGL profile GL2 is selected, which activates the OpenGL compatibility profile and supports all methods of OpenGL versions 1.0 to 3.0. Table 2.4 provides an overview of selected profiles for applications for desktop computers and notebooks. There are also profiles for OpenGL ES support and profiles containing subsets of frequently used OpenGL methods (not shown in the table). In order for the graphics application to run on many graphics cards, a JOGL profile with a version number as low as possible should be selected. It should be noted that very efficient graphics applications can be created with the help of the core profile.
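A sketch of source code along the lines of Fig. 2.6 is shown below. The window size, the window title and the use of an FPSAnimator are assumptions made for this sketch and may differ from the example project.

import com.jogamp.newt.opengl.GLWindow;
import com.jogamp.opengl.GLCapabilities;
import com.jogamp.opengl.GLProfile;
import com.jogamp.opengl.util.FPSAnimator;

public class Main {
    public static void main(String[] args) {
        // select the JOGL profile GL2 (OpenGL compatibility profile, versions 1.0-3.0)
        GLProfile profile = GLProfile.get(GLProfile.GL2);
        GLCapabilities capabilities = new GLCapabilities(profile);

        // create the output window and register the renderer as its observer
        GLWindow window = GLWindow.create(capabilities);
        window.setSize(640, 480);
        window.setTitle("JOGL renderer");
        window.addGLEventListener(new Renderer());
        window.setVisible(true);

        // call the display method regularly, here 60 times per second
        FPSAnimator animator = new FPSAnimator(window, 60, true);
        animator.start();
    }
}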
2.8 Example of a JOGL Program Without Shaders In this section, the basic principles of programming a renderer with the OpenGL binding JOGL are illustrated by means of an example. To simplify matters, this example uses functions of the fixed-function pipeline and not yet shaders. Building on this, the following sections of this chapter explain how to use the programmable pipeline, including shaders.
Fig. 2.7 Init method of a JOGL renderer (Java)
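A sketch of an init method along the lines of Fig. 2.7 is shown below. It assumes that glu is a field of the renderer of type com.jogamp.opengl.glu.GLU; the concrete values are illustrative.

@Override
public void init(GLAutoDrawable drawable) {
    // retrieve the OpenGL object for the compatibility profile (up to OpenGL 3.0)
    GL2 gl = drawable.getGL().getGL2();

    // create the GLU object that is used later in the display method
    glu = new GLU();

    // set a non-transparent white background colour (RGBA)
    gl.glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
}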
For this purpose, consider the Java source code of the init method in Fig. 2.7 and the display method in Fig. 2.8, through which the triangle shown in Fig. 2.9 is drawn. The first line of the init method or the display method stores in the variable gl the OpenGL object passed by the parameter object drawable. The object in this variable gl is used to draw the graphics content using OpenGL. This gl object plays a central role in JOGL, as all subsequent OpenGL commands are executed using this object (calls of methods of the gl object). These OpenGL commands and their parameters are very similar to the OpenGL commands used in OpenGL programming interfaces for the C programming language. Therefore, OpenGL source code for the C programming interface can be easily translated into JOGL source code from relevant OpenGL literature or web sources. Furthermore, the OpenGL version and the OpenGL profile are selected in the first line of the renderer methods by choosing the type for the gl object (see Table 2.4). In this case, it is the compatibility profile up to OpenGL version 3.0. The selected OpenGL profile must match the selected OpenGL profile when creating the output window (GLWindow; see Fig. 2.6). It is worth remembering that the init method is executed once when the OpenGL program is created, typically shortly after the application is started. The display method is called for each frame, typically 60 times per second. In the init method (Fig. 2.7), after saving the gl object, an object is created for accessing the OpenGL Utility Library (GLU), which is later used in the display method. This library provides useful functions that are not directly in the OpenGL language scope. The command glClearColor defines the colour of the background of the drawing area. The background colour is specified as an RGB colour triple with a transparency value (alpha value) as the last component. Valid values for the individual components lie in the interval [0, 1] (see Chap. 6 for explanations on colour representations). In this example, the non-transparent colour white is set. Since OpenGL is a state machine, once set, values and settings are retained until they are overwritten or deleted. In this example, the background colour is not to be changed by the animation, so it is sufficient to set this colour once during initialisation. In the display method (Fig. 2.8), after the gl object has been saved, the glClear command initialises the colour buffer and the depth buffer to build up the display for a new frame. The previously defined background colour for the colour buffer is used.
Fig. 2.8 Display method of a JOGL renderer (Java)
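The display method described in the following paragraphs might look roughly like this sketch. The camera position, the triangle coordinates and the colour are assumptions and not necessarily those of the example project; a matching projection matrix is assumed to be set in the reshape method (not shown).

@Override
public void display(GLAutoDrawable drawable) {
    GL2 gl = drawable.getGL().getGL2();

    // initialise colour and depth buffer for the new frame
    gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);

    // reset the model view matrix and compute the view matrix
    gl.glLoadIdentity();
    glu.gluLookAt(0f, 0f, 4f,   // eye point (camera position)
                  0f, 0f, 0f,   // point the camera is looking at
                  0f, 1f, 0f);  // up-vector along the positive y-axis

    // model transformation: rotate the triangle by 10 degrees around the z-axis
    gl.glRotatef(10f, 0f, 0f, 1f);

    // set the foreground colour and draw one triangle in immediate mode
    gl.glColor4f(1f, 0f, 0f, 1f);
    gl.glBegin(GL.GL_TRIANGLES);
    gl.glVertex3f(-0.5f, -0.5f, 0f);
    gl.glVertex3f( 0.5f, -0.5f, 0f);
    gl.glVertex3f( 0.0f,  0.5f, 0f);
    gl.glEnd();
}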
Fig. 2.9 A triangle drawn with the OpenGL binding JOGL
The following two commands compute the model view matrix (see Sect. 2.5.1). First, the model view matrix stored in the gl object is reset to the unit matrix to delete the calculation results from the last frame. This is followed by the use of the glu object already created in the init method. By gluLookAt, the view matrix is computed and multiplied by the existing matrix in the gl object. Here, the first three parameter values represent the three-dimensional coordinates of the viewer location (eye point, camera position). The next three parameter values are the three-dimensional coordinates of the point at which the viewer (or camera) is looking. In this example, this is the origin of the coordinate system. The last three parameter values represent the so-called up-vector, which indicates the direction upwards at the viewer position. This allows the tilt of the viewer (or camera) to be determined. In this case, the vector is aligned along the positive y-axis. After calculating the view transformation, commands can be executed to perform a model transformation, such as moving, rotating or scaling the triangle. In this case, the glRotatef command rotates the triangle by 10 degrees around the z-axis. It should be noted that OpenGL uses a right-handed coordinate system. If no transformations have taken place, the x-axis points to the right and the y-axis points upwards in the image plane. The z-axis points in the direction of the viewer, i.e., it points out of the image plane. After initialisation (and without any transformation having been applied), the origin of the coordinate system is displayed in the centre of the viewport. At the edges of the viewport are the coordinates −1 (left and lower edge) and 1 (right and upper edge), respectively. The axis around which to rotate is specified by the last three arguments in the glRotatef call. In this case, it is the z-axis. The amount of rotation in degrees around this axis is determined by the first argument of this command. This operation multiplies this model transformation with the model view matrix in the gl object by matrix multiplication and stores the result in the gl object. Details on geometric transformations are in Chap. 5.

The glColor4f method sets the (foreground) colour for subsequent drawing commands. Since there are different variants of this command, the number at the end of the command name indicates how many arguments are expected. In this case, it is an RGBA colour value. The last letter of the command name indicates the expected type of arguments. The f in this case sets the argument type to float so that the colour components can be specified as floating point values in the interval [0, 1]. In the OpenGL command syntax, there are a number of command groups that enable different parameter formats. These are distinguished by the number–letter combination mentioned above. The glRotate command, for example, is also available as a double variant and is then called glRotated.

The actual drawing of the triangle is triggered by the glBegin/glEnd block. The glBegin command specifies the geometric primitives to be drawn, which are defined by the vertices specified within the block. These are (separate) triangles in this example. The glVertex3f commands each define the three-dimensional position coordinates of the individual vertices. In this case, these are the vertices of the triangle to be drawn. Section 3.2 provides explanations of the geometric primitives in the OpenGL.
Section 3.3 contains details on the possible drawing commands in the OpenGL.
Since the geometric transformations involve matrices and matrix operations in which the new matrix (in the program code) is the right matrix in the matrix multiplication operation,7 the order of execution of the transformations is logically reversed (see Sect. 5.1.1). In the above example, the triangle is logically drawn first and then the model transformation and the view transformation are applied. In order to understand these steps, the program code must be read from bottom to top. The full project JoglBasicStartCodeFFP for this example can be found in the supplementary material to the online version of this chapter. The source code includes implementations of the reshape and dispose methods and the control of the renderer by an animation object, which are not shown here. If fixed-function pipeline commands are used, then the dispose method can often be left empty. Furthermore, the supplementary material to the online version of this chapter contains the project JoglStartCodeFFP, which enables the interactive positioning of the camera using the mouse and keyboard and is well suited as a starting point for one's own JOGL developments. It should be noted that the example in this section uses commands of the fixed-function pipeline that are only available in the compatibility profile. These include in particular the drawing commands using the glBegin/glEnd block and the GLU library. Both are no longer available in the core profile.
2.9 Programming Shaders Special high-level programming languages have been developed for programming shaders, which are very similar to high-level programming languages for developing CPU programs, but take into account the specifics of a graphics pipeline. For example, Apple and Microsoft have developed the Metal Shading Language (MSL) and the High-Level Shading Language (HLSL) for their 3D graphics or multimedia programming interfaces Metal and DirectX, respectively. The high-level shader programming language C for Graphics (Cg) was designed by Nvidia for the creation of shader programs for DirectX or OpenGL. Since 2012, however, Cg has no longer been developed. For programming OpenGL shaders described in Sect. 2.6, the Khronos Group developed the OpenGL Shading Language (GLSL) (see [4]), whose syntax is based on the syntax of the C programming language.
2.9.1 Data Flow in the Programmable Pipeline Figure 2.10 shows in an abstract representation the data flow from the graphics application (left side) to the programmable OpenGL graphics pipeline (right side). For simplification, the possible data feedback from the graphics pipeline to the
7 Please remember that matrix multiplication is not commutative, i.e., the order of the operands must not be reversed.
application and the optional shaders (geometry and the two tessellation shaders) are not shown. Furthermore, the non-programmable blocks of the graphics pipeline between the two shaders have been combined into one function block. During a typical graphics animation, a set of vertices, usually storing position coordinates in three-dimensional space (3D coordinates), colour values, normal vectors and texture coordinates, is continuously passed to the vertex shader. This vertex data is passed via user-defined variables, which are defined in the vertex shader (user-defined in variables). These variables are also referred to as attributes in the case of the vertex shader. This logical view of data transfer is intended to aid the understanding of how the graphics pipeline works. In fact, in a concrete OpenGL implementation, the vertex data is usually buffered and transferred in one block to increase efficiency (see Sects. 2.10 and 2.11).
Fig. 2.10 Data flow in the OpenGL programmable graphics pipeline: The geometry and tessellation shaders and the feedbacks to the application are not shown
Furthermore, both the vertex and fragment shaders receive data via so-called uniforms. These are another type of variable that can be defined in the respective shader and are independent of a single vertex or fragment. Typically, transformation matrices and illumination data are passed to shaders via uniforms. A third data path can be used to pass textures (texture data, texel data) to a texture storage from which the shaders can read. The texture storage is not shown in Fig. 2.10 for simplicity. Conceptually, a vertex shader is called exactly once for each input vertex. However, optimisations are possible in the respective implementation of the pipeline, which can lead to fewer calls. In addition, parallelising the execution of the vertex shader is allowed for processing multiple vertices at the same time. The result of processing a vertex by the vertex shader can be passed on to the next non-programmable function block of the graphics pipeline via user-defined out variables or special out variables. An important predefined out variable is, for example, gl_Position, through which the newly calculated position coordinate of the vertex is passed on. An essential non-programmable function block between the vertex and fragment shaders is rasterisation. In rasterisation, fragments are created from vertices by converting the vector graphic into a raster graphic. Usually, several fragments are created from a few vertices (see Sect. 2.5.4 and Chap. 7). For this conversion, it can be specified whether the values of the output variables of the vertex shader (out variables) are taken as values for the input variables (in variables) of the fragment shader and copied into several fragments by flat shading (see Sect. 2.5.2) or whether the values for the newly created fragments are determined by linear interpolation. The names of the user-defined input variables of the fragment shader must be the same as the user-defined output variables of the vertex shader in order to enable a mapping of the rasterised fragment data to the vertex data. For example, a predefined input variable for the fragment shader is gl_FragCoord, which contains the position coordinate of the fragment in question and was determined by rasterisation. Based on the input variables, the uniforms and the texture data, the fragment shader typically performs the calculations for the illumination of the scene or the mapping of textures onto objects (texture mapping). The calculation results are passed on to the next stage of the graphics pipeline via user-defined and predefined output variables. For example, gl_FragColor determines the colour for further processing. However, this predefined variable is only available in the compatibility profile. User-defined variables can be used to pass the colour values to the subsequent stages in the core profile, which are mapped to the input of a specific output channel via a special mechanism (via output layout qualifiers). In case only one output variable is defined, it is mapped by default to channel 0 (layout (location = 0)), which in turn allows output to the framebuffer. Another predefined output variable of the fragment shader, which is also defined in the core profile, is gl_FragDepth. This allows the depth value of a fragment in the fragment shader to be changed and passed on to the next level. Predefined and user-defined input variables (in variables) can vary per vertex or fragment. In the vertex shader, these variables are used to pass geometry data. 
In contrast, uniforms are usually constant for each geometric primitive. A typical use for uniforms is to pass transformation matrices and illumination data to the shaders.
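As a small illustration, uniform variables are declared in a shader as in the following GLSL snippet; the variable names are illustrative and not prescribed by the OpenGL.

// uniforms are set by the application and stay constant during a draw call
uniform mat4 projectionMatrix;
uniform mat4 modelViewMatrix;
uniform vec3 lightPosition;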
The third type of data consists of textures (texture data), which are passed to the shaders via a texture storage. The texture storage can also be used to record the results of the fragment calculation and to reuse them accordingly. In programming terms, textures are special uniforms. For example, sampler2D denotes the type of a two-dimensional texture.
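For example, a fragment shader might sample such a texture as in the following sketch; the variable names are illustrative.

#version 330 core

in vec2 texCoord;          // texture coordinates interpolated from the vertex shader
uniform sampler2D tex;     // texture accessed via the texture storage
out vec4 FragColor;

void main() {
    // look up the texel at the interpolated texture coordinate
    FragColor = texture(tex, texCoord);
}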
2.9.2 OpenGL and GLSL Versions Since OpenGL is only a specification of a graphics interface and is constantly being extended, not every GPU and every graphics card driver will support the latest complete range of functions. This will be particularly noticeable when using older GPUs. If a professional graphics application is to function efficiently on a large number of computers with GPU support, it must react to this situation and, if possible, also be able to get by with a smaller OpenGL and GLSL feature set. For this purpose, the JOGL command glGetString reads out properties of the GPU, especially the version number of the supported OpenGL version. Depending on the range of functions found, alternative implementations of the graphics application or alternative shader code may have to be used. The specifications of the GLSL are subject to version numbering just like those of the OpenGL. However, it is only since OpenGL version 3.3 that the numbers of the OpenGL versions match those of the GLSL versions. The language range to be used in a GLSL shader program can be specified by a so-called preprocessor directive in the first non-commented line of a shader program. For example, the following directive specifies GLSL version 4.5. #version 450
Such a shader program cannot be executed on older graphics cards without the support of this version. If no profile is specified, the core profile is assumed. However, the profile can be specified explicitly, as the following example shows. #version 450 core
The following directive switches to the compatibility profile. #version 450 compatibility
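As mentioned above, the supported versions can be queried at runtime with glGetString. A minimal sketch, placed inside a method that receives a GLAutoDrawable (such as init), might look like this:

// query the supported OpenGL and GLSL versions of the current context
GL gl = drawable.getGL();
String glVersion = gl.glGetString(GL.GL_VERSION);
String glslVersion = gl.glGetString(GL2ES2.GL_SHADING_LANGUAGE_VERSION);
System.out.println("OpenGL: " + glVersion + ", GLSL: " + glslVersion);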
2.9.3 OpenGL Extensions The OpenGL offers a special mechanism for extending the range of functions. This enables, for example, manufacturers of graphics processors to implement so-called OpenGL Extensions. In order for these extensions to be widely used, the Khronos
Group provides web pages with an Extensions Registry,8 which contains a list of known OpenGL Extensions. Whether a certain extension is available can be queried in the OpenGL application. Only if an extension is available can the corresponding OpenGL commands be used.9 For example, the following JOGL command checks whether the extension for vertex buffer objects is supported by the graphics processor driver currently in use.

boolean vboAvailable = gl.isExtensionAvailable("GL_ARB_vertex_buffer_object");
The functionality of vertex buffer objects is explained in Sect. 2.10. Vertex buffer objects were only available as an extension in the past, but were included as part of the interface in the OpenGL specification as of version 1.5. In older versions, the extension can still be used. This procedure allows enhancements to the scope of the OpenGL graphics interface to be tested extensively before they are included in the specification. In the shading language GLSL, extensions must be explicitly enabled. For example, the following preprocessor directive enables an extension for clipping. #extension GL_ARB_clip_control : enable
The following directive activates all available extensions. #extension all : warn
If all extensions are activated in this way, a warning is given each time an extension is used. Directives for the activation of extensions must follow the version directive in the shader program, but must be placed before the shader source code.
2.9.4 Functions of the GLSL The shader language GLSL is based on the syntax of the programming language C and has similar data types. In addition, useful data types for graphics programming are available, such as two-, three- and four-dimensional vectors. Furthermore, matrices with different numbers of columns and rows from two to four are available. Thus, the smallest matrix provided is a 2 × 2 matrix and the largest matrix provided is a 4 × 4 matrix. In addition, the corresponding vector–matrix and matrix–matrix operations
8 References to the OpenGL Extensions can be found at the web address https://www.khronos.org/registry/OpenGL/. 9 So-called extension viewers show which OpenGL versions and which extensions are available on a particular computer system. The company Realtech VR, for example, provides such software at https://realtech-vr.com.
are supported, which can be executed efficiently with hardware support depending on the GPU and driver implementation. These include operations for transposition, inversion and calculation of the determinant of matrices. Furthermore, special data types are available for accessing one-, two- and three-dimensional textures. The values of trigonometric, exponential and logarithmic functions can also be calculated. In the fragment shader, partial derivatives of functions can be determined for certain special cases. In contrast to the programming language C, no special libraries have to be integrated for these functionalities. Furthermore, the concept of overloading is supported, so that suitable functions can exist under the same function name for different numbers and types of parameters. Overloading has been used for many predefined functions and can also be used for self-defined functions. A useful feature of the GLSL is so-called swizzling, which allows individual vector components to be read. The following three swizzling sets are available for addressing vector components.

• Components of position coordinates: x, y, z, w.
• Colour components: r, g, b, a.
• Components of texture coordinates: s, t, p, q.

The following example shows the effect of accessing vector elements in this way.

vec4 a = vec4(0.1, 0.4, 0.7, 1.0);
vec2 b = a.gb;   // -> (0.4, 0.7)
vec3 c = a.zyx;  // -> (0.7, 0.4, 0.1)
vec4 d = a.stpq; // -> (0.1, 0.4, 0.7, 1.0)
vec4 e = a.rrgb; // -> (0.1, 0.1, 0.4, 0.7)
vec4 f = a.rgxy; // error, mixture of swizzling sets
With the help of so-called write masking, vector components can be changed individually. The following example shows the effect of this operation.

vec4 a = vec4(0.1, 0.4, 0.7, 1.0);
a.r = 1.0;       // -> (1.0, 0.4, 0.7, 1.0)
vec3 b = vec3(0.7, 0.0, 0.2);
a.xyz = b;       // -> (0.7, 0.0, 0.2, 1.0)
vec4 c = vec4(0.1, 0.4, 0.7, 1.0);
a.bgra = c;      // -> (0.7, 0.4, 0.1, 1.0)
a.rgbr = c;      // error, r appears twice
a.rgbw = c;      // error, mixture of swizzling sets
The GLSL specification [4] and the OpenGL Programming Guide [8] provide a comprehensive definition and description of the OpenGL Shading Language. This includes a listing of the possible predefined variables (special input and output variables). A very condensed introduction is available in [6, pp. 59–71]. See [3, pp.
927–944] for examples of GLSL shaders. There is additional material on the web for the OpenGL SuperBible [12], including examples of GLSL shaders.
2.9.5 Building a GLSL Shader Program The OpenGL Shading Language (GLSL) is a high-level programming language whose source code must be compiled and linked in a manner similar to the C and C++ programming languages. The tools for this are contained in the driver for the graphics processor. In Fig. 2.11, the typical sequence of commands for creating a shader program consisting of a vertex shader and a fragment shader is shown as a UML sequence diagram10 from the perspective of a Java application.11 Here, (the references to) the shaders and the shader program are understood as objects. The parameter lists of the messages (which correspond to method calls in Java) are greatly simplified or not specified for the sake of a clear presentation. The methods shown must be called by the graphics application, for example, by the init method of the OpenGL renderer object (:Renderer). For each shader to be compiled, a shader object is first created. This object can be referenced by an integer value (type int). Afterwards, the command glShaderSource assigns the shader source code string to a shader object, which is then compiled by glCompileShader. When the source code for all shaders has been compiled, a shader program object is created, which can again be referenced by an integer value. After the shader objects have been assigned to the program by glAttachShader, the compiled shaders are linked into an executable shader program by glLinkProgram. After this last step, the assignments of the shader objects to the shader program object can be removed again and the memory occupied by them can be released. This is done with the commands glDetachShader and glDeleteShader. If changes to the executable shader program are planned in the further course and relinking becomes necessary, then these objects can be retained, modified if necessary and used for the renewed creation of an executable program. To use the executable shader program, it must be activated by glUseProgram. Figure 2.11 shows an example of the integration of two shaders. With the help of this mechanism, shader programs can be created that consist of all available shader types. Conceptually, the generation of the shader objects and the shader program takes place on the GPU. Due to the possibility of different OpenGL implementations, it is not determined or predictable whether, for example, the shader compilation and linking into an executable shader program take place entirely on the CPU and only the result is transferred to the GPU or whether a transfer to the GPU already takes place earlier.
10 For introductions to Unified Modelling Language (UML) sequence diagrams, see, for example, [11]. 11 An alternative representation of the command syntax can be found in [5, p. 73].
Fig. 2.11 UML sequence diagram for the generation of a shader program
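A sketch of the command sequence from Fig. 2.11 in JOGL is shown below. It assumes that gl is a GL2ES2 (or GL3) object and that vertexShaderSource and fragmentShaderSource are strings containing the GLSL code; the error checks after compiling and linking are omitted.

// create and compile the vertex shader
int vertexShader = gl.glCreateShader(GL2ES2.GL_VERTEX_SHADER);
gl.glShaderSource(vertexShader, 1, new String[] { vertexShaderSource }, null);
gl.glCompileShader(vertexShader);

// create and compile the fragment shader
int fragmentShader = gl.glCreateShader(GL2ES2.GL_FRAGMENT_SHADER);
gl.glShaderSource(fragmentShader, 1, new String[] { fragmentShaderSource }, null);
gl.glCompileShader(fragmentShader);

// create the shader program and link the compiled shaders into it
int shaderProgram = gl.glCreateProgram();
gl.glAttachShader(shaderProgram, vertexShader);
gl.glAttachShader(shaderProgram, fragmentShader);
gl.glLinkProgram(shaderProgram);

// the shader objects are no longer needed after linking
gl.glDetachShader(shaderProgram, vertexShader);
gl.glDeleteShader(vertexShader);
gl.glDetachShader(shaderProgram, fragmentShader);
gl.glDeleteShader(fragmentShader);

// activate the executable shader program before drawing
gl.glUseProgram(shaderProgram);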
Since different objects of the 3D scene are usually rendered using different shaders, several shader programs can be created, each of which is activated by
glUseProgram before the object in question is rendered.12 If different objects use different shaders, glUseProgram is called multiple times per frame (in the display method). The remaining commands shown in Fig. 2.11 only need to be called once in the init method if no shader programs need to be changed or created during the rendering process. Another way of including shaders is to use the Standard Portable Intermediate Representation (SPIR). SPIR was originally developed for the Open Computing Language (OpenCL), which enables parallel programming of the different processors of a computer system. The SPIR-V version is part of the newer OpenCL specifications and part of the Vulkan graphics programming interface. Through an OpenGL 4.5 extension, SPIR-V can also be used in OpenGL programs.13 With suitable tools, shaders can in principle be developed in any shading language and then compiled into the intermediate language SPIR-V. The SPIR-V binary code is not easily readable by humans. This allows shaders to be compiled into SPIR-V binary code, which can then be shared with other developers without having to expose development concepts or programming details through open-source code. The integration of shaders in SPIR-V binary code is very similar to the process shown in Fig. 2.11. Instead of assigning a GLSL string to a shader object and then compiling it, the pre-compiled SPIR-V binary code, which can also contain multiple shaders, can be assigned to shader objects with glShaderBinary. Subsequently, the entry points into the binary code are defined by the glSpecializeShader command. The direct mixing of GLSL shaders and SPIR-V shaders in a shader program is only possible if GLSL shaders have previously been converted to SPIR-V binary code using external tools.
2.10 Example of a JOGL Program Using GLSL Shaders Building on the previous sections and the example from Sect. 2.8, which uses the fixed-function pipeline, the use of the programmable pipeline is explained below. Only commands of the core profile are used and shaders are integrated that are programmed in the shading language GLSL. To illustrate the basic mechanisms, this program again draws a simple triangle (see Fig. 2.16). The full Java program JoglBasicStartCodePP is available as supplementary material to the online version of this chapter. Figure 2.12 shows the init method of the renderer. First, the OpenGL object is stored in the variable gl. Since this is of type GL3, only commands of the core profile are available (see Table 2.4). A vertex shader and a fragment shader must be
12 Since switching shader programs means a certain workload for the GPU, the number of switching operations can be minimised by using identical shader programs and sorting the 3D objects appropriately before drawing. 13 In the JOGL version 2.3.2 from 2015, SPIR-V is not supported.
Fig. 2.12 Init method of a JOGL renderer using only the core profile (Java)
included when using this profile; otherwise, the result of the program execution is undefined. The next step is the creation of an object for the use of a JOGL class for the calculation of transformation and projection matrices. This class PMVMatrix can be used very well together with the core profile. The class GLU is not available for the core profile. The source code of the vertex and fragment shaders is loaded by the following two commands and used to create an executable shader program. The class ShaderProgram.14 is used, which applies the command sequence shown in Fig. 2.11. By default, the drawing of 3D objects of the scene in the core profile takes place via buffers (more precisely via vertex buffer objects (VBO)), which are created in the init method on the GPU and filled with object data. Drawing from such a buffer object takes place independently on the GPU when a draw command is called (usually) in the display method. Since a large part of the object data of a 3D scene often changes little or not at all, little or no data needs to be retransmitted to the GPU. This makes this approach very efficient for drawing. For this reason, the drawing method of the compatibility profile used in the example from Sect. 2.8 is no longer available in the core profile. In the source code in Fig. 2.12, a Java array is stored in the variable vertices, which contains the vertex data for the triangle to be drawn. The three-dimensional position coordinates and the three-dimensional colour values in the RGB colour space are arranged alternately. The data for a vertex thus consist of six consecutive floating point values, which usually come from the interval [0, 1] for the position coordinates and colour values. This interleaved arrangement of the data means that only one buffer is needed for all vertices. Alternatively, two separate buffers can be used for the position coordinates and the colour data. The processing of the vertex data, in this case called vertex attributes, takes place during drawing by the vertex shader. Since the use of buffer objects on the GPU leads to very efficient graphics programs, OpenGL has a large number of such buffer objects that are suitable for different purposes. By creating a buffer object, (conceptually) memory is reserved on the GPU that can be filled for the corresponding purpose. In this example, vertex buffer objects are used for drawing. To use this buffer type, a vertex array object (VAO) must also be created and used. Vertex array objects (VAOs) are buffers on the GPU that store all the states for the definition of vertex data. Both the format of the data and the references to the necessary buffers (for example, to—possibly several—vertex buffer objects) are stored.15 Here, the data is not copied, but only references to the relevant buffers are stored. Using the concept of vertex array objects, a single glBindVertexArray
14 The class ShaderProgram is not to be confused with the class of the same name of the JOGL binding. The class used here, together with the entire example program, can be found in the supplementary material to the online version of this chapter. 15 In fact, glBindBuffer(GL.GL_ARRAY_BUFFER, ...) does not change the state of a vertex array object (VAO). However, an assignment to a vertex array buffer takes place indirectly through the command glVertexAttribPointer.
command can thus switch the VAO, making all (previously prepared) necessary buffers for a particular 3D object to be drawn accessible and drawable by a drawing command. This is particularly helpful when many different 3D objects need to be drawn in a scene, which is almost always the case. The concept of vertex array objects makes it very clear that OpenGL is designed as a state machine. Once a state is set, it is maintained until it is explicitly changed. When a VAO is activated, the associated context is active until another VAO is activated. In the example considered here, the use of vertex array objects does not seem to be necessary, since only a single triangle is drawn and thus switching between several objects is not required. In order to be able to use vertex buffer objects, however, at least one vertex array object16 should always be present. As can be seen in the example in Fig. 2.12, the variable vaoHandle is defined to store the names (references) of the vertex array objects to be reserved on the GPU. The names of the buffers on the GPU are integers, so the data type for the Java array is int[]. Then glGenVertexArrays reserves exactly one vertex array object on the GPU. If the returned name (integer value) is less than one, an error occurred during creation. For example, there might not have been enough memory available on the GPU. Subsequently, glBindVertexArray activates this (newly created) VAO. As can be seen from Fig. 2.12, glGenBuffers reserves exactly one more buffer object on the GPU in which the vertex data for the triangle to be drawn is to be stored. After successful creation, glBindBuffer and the argument GL_ARRAY_BUFFER set this buffer to a vertex buffer object in which the vertex attributes to be processed by the vertex shader can be stored. Subsequently, the data from the Java array vertices prepared above is transferred to the GPU via the glBufferData command into the previously reserved vertex buffer object. The argument GL_STATIC_DRAW indicates that the vertex data is expected to be transferred to the GPU only once and read frequently (for rendering). The specification of other types of usage is possible. For example, GL_DYNAMIC_DRAW specifies that the data is frequently modified in the buffer and frequently read. These arguments serve as a hint for the OpenGL implementation to apply optimised algorithms for efficient drawing, but are not mandatory determinations of the use of the data. After the data has been transferred to the GPU, the following four OpenGL commands establish the connection between the (currently active) vertex buffer object and the vertex shader so that the latter can (later) process the correct vertex data from the correct buffer on the GPU. By glEnableVertexAttribArray(0), the so-called vertex attribute array with the name 0 is activated. This corresponds to the layout position 0 (layout (location = 0) ...) defined in the vertex shader (see Fig. 2.13). This layout position is linked in the shader to the input variable vPosition, which holds the position coordinates for the respective vertex. The OpenGL command glVertexAttribPointer (see Fig. 2.12) maps this layout
16 Implementations of some graphics card drivers may deliver the desired result in such a case even
without a VAO. For maximum compatibility, however, a VAO should always be used.
Fig. 2.13 Source code of a simple vertex shader (GLSL)
position to the vertex attributes in the VBO. In this example, the arguments have the following meaning: • 0: Link the position data in the VBO to the vertex attribute array 0 (layout position 0) of the vertex shader. • 3: A vertex vector element consists of three components, namely the threedimensional position coordinates of the currently active VBO. • GL_FLOAT: The data is stored in float format. • false: No normalisation of data is required. • 6* Float.BYTES: The distance to the next record of position coordinates in bytes is six times the number of bytes occupied by a float value. Since three colour values are stored between each of the three position coordinates (see the variable vertices above), the distance between the first components of the position coordinates is six. This value is to be multiplied by the number of bytes for a floating point value. This is four in Java and is supplied by Float.BYTES. • 0: The offset from the first entry of the buffer under which the first component of the position coordinate is stored is 0. In this case, the first entry (index 0) of the buffer contains the first component of the position coordinates. Subsequently, glEnableVertexAttribArray(1) activates the vertex attribute array with the name 1. This corresponds to layout location 1 (layout (location = 1) ...) defined in the vertex shader (see Fig. 2.13). This layout position is connected to the input variable vColor in the shader and takes the colour values of the respective vertex. The OpenGL command glVertexAttrib Pointer (see Fig. 2.12) maps the layout position 1 to the colour data in the VBO. In this example, the arguments have the same values as for the position coordinates, except for the last argument. Since the first component of the colour values is stored in the VBO only after the first three position coordinates (see above the variable
vertices), an offset of three (3) must be specified as the last argument for the glVertexAttribPointer call. This is again multiplied by the number of bytes for a float value in Java, since the distance between the buffer entries must be specified in bytes. At this point, it should be noted that the input variables vPosition and vColor of the vertex shader (see Fig. 2.13) are defined in the GLSL as three-dimensional vector types consisting of float components. This format must naturally match the format specification in the OpenGL application, which is the case in this example. In the GLSL, vec*, dvec*, ivec* and uvec* vector types with float, double, integer and unsigned integer components are available. Here, the values 2, 3 or 4 can be used for the * character, so that vectors with two, three or four components can be used. As the last step of the init method, a white background colour is defined by glClearColor.

Figure 2.14 shows the display method of the OpenGL renderer for this example. The first five commands are quite similar to the commands in the example in Fig. 2.8, where the fixed-function pipeline is used. First, the OpenGL object used for drawing is stored in the local variable gl. Then the command glClear clears the colour buffer and the depth buffer. Using the following two Java methods of the JOGL class PMVMatrix, the model view matrix (see Sect. 2.5.1) is calculated. In contrast to the example in Sect. 2.8, no transformation matrix is stored in the gl object. The calculation takes place completely separately from the OpenGL on the CPU. Only after the final transformation matrix has been determined is it passed on to the GPU. First, the model view matrix is reset to the unit matrix in order to delete the calculation results from the last frame. By gluLookAt, the view matrix is calculated and multiplied by the existing matrix in the pmvMatrix object. As explained above, the first three parameter values represent the three-dimensional coordinates of the viewer location (eye point, camera position). The next three parameter values are the three-dimensional coordinates of the point at which the viewer is looking. In this example, this is the origin of the coordinate system. The last three parameter values represent the up-vector, which indicates the direction upwards at the viewer position. In this case, the vector is aligned along the positive y-axis. After calculating the view transformation, commands can be executed to perform a model transformation, for example, the scene could be moved (translated), rotated or scaled. In this case, the glRotatef command is used to rotate the triangle 15 degrees around the z-axis.

The glBindVertexArray command switches to the vertex array object (VAO) that was specified by the name passed. As already explained, this establishes the necessary state for drawing a specific 3D object. In this case, the vertex buffer object (VBO) transferred in the init method to the GPU becomes accessible, which contains the vertices with the position coordinates and colour values for the triangle to be drawn. The command glUseProgram activates the corresponding shader program containing the compiled vertex and fragment shaders. For this simple example, it would have been sufficient to call this command in the init method, since no other shader
Fig. 2.14 Display method of a JOGL renderer using only the core profile (Java)
program exists. As a rule, however, different 3D objects are displayed by different shaders, so that it is usually necessary to switch between these shader programs within a frame and thus within the display method. The two glUniformMatrix4fv commands pass the projection matrix and the model view matrix previously calculated using the pmvMatrix object as uniform variables to the OpenGL so that they can be used in the shaders. For the transfer of uniform variables, a large number of command variants exist for scalars, vectors and matrices of different dimensions and different primitive data types (such as int, float or double). For a complete list of these commands, distinguished by the last characters (number and letters) of the command name, refer to the GLSL specification [4] or the OpenGL Programming Guide [5, p. 48]. The arguments in the first call of glUniformMatrix4fv have the following meaning in this example:
• 0: The data is passed via the uniform variable of layout position 0. The count of the layout positions of the uniforms is different from the layout positions for the vertex attribute arrays (the input variables of the vertex shader). • 1: Exactly one data item (one matrix) is passed. • false: No transposition (swapping of rows and columns) of the matrix is required. • pmvMatrix.glGetPMatrixf(): The projection matrix read from the pmvMatrix object is passed. In the second glUniformMatrix4fv call, the model view matrix is passed in a similar way. The transformation matrices passed are four-dimensional, as this has advantages for the calculation of transformations (see Sect. 5.2). In this example, the two matrices should (sensibly) only be processed by the vertex shader. As can be seen in Fig. 2.13, two variables have been defined for reading the matrices in the vertex shader. The corresponding layout positions must match the layout positions used in the glUniformMatrix4fv commands to allow the correct data flow. For advanced readers, the integer value (reference) of a layout position defined in the vertex shader by location can be retrieved by the graphics application using the OpenGL command glGetAttribLocation and the variable name defined in the shader as argument. Thus, the variable name must be known in the graphics application instead of the layout position whose assignment is defined in the shader. A similar mechanism exists with glGetUniformLocation for the identification of the layout positions of uniforms. Both commands can be used since OpenGL version 2.0. The last command in the display method is a drawing command by which the triangle—from the perspective of the graphics application—is drawn. For this purpose, the vertex data is read from the vertex buffer object (VBO). Conceptually, the VBO including the data is already on the GPU. When exactly the actual transfer of data from the CPU to the GPU takes place depends on the optimisation decisions made for the respective driver implementation of the GPU. The argument GL_TRIANGLES specifies that triangles are to be drawn from the vertices in the VBO. The second argument specifies the index of the first vertex in the VBO that is used to draw the first triangle (offset). This allows multiple objects to be stored in a VBO if required. These different objects can be referenced accordingly by this parameter. The last argument determines the number of vertices to be used. Since a triangle is drawn in this example, three vertices must be used. For example, if there are six vertices in the buffer and the last argument of the drawing command is also six, two separate triangles would be drawn. Explanations of drawing commands in the OpenGL are available in Sect. 3.3. After calling a drawing command, each vertex is processed by the vertex shader (see Fig. 2.13). As explained above in connection with the init and display method, the user-defined input variables and the uniforms are defined at the beginning of the vertex shader source code. The user-defined output variable is the four-dimensional vector color. The GLSL source code to be executed in the main method defines the calculations per vertex. In this case, the multiplication of
Fig. 2.15 Source code of a simple fragment shader (GLSL)
the projection matrix with the model view matrix and the position coordinates of the input vertex takes place. Since the vector for the position coordinates has only three components, but the matrices are 4x4 matrices, the value 1 is added as the fourth component of the homogeneous position coordinate. For an explanation of homogeneous coordinates, see Sect. 5.1.1. This computation in the shader results in a coordinate transformation from the world coordinates to the clip coordinates (see Sect. 5.41 for the detailed calculation steps performed). The result is stored in the predefined output variable gl_Position for further processing in the subsequent step of the graphics pipeline. The three-dimensional colour values are also extended by one component. The extension here is a transparency value (alpha value), which in this case specifies a non-transparent (opaque) colour. The four-dimensional result vector is stored in the user-defined output variable color. As explained in Sect. 2.5, the output data of the vertex shader is processed by the non-programmable function blocks vertex post-processing, primitive assembly, rasterisation and early-per-fragment operations; see also Fig. 2.10. The fragment shader shown in Fig. 2.15 can access the result of this processing. In particular, it should be noted here that a conversion of the vector graphic into a raster graphic has taken place through rasterisation. In this process, fragments were created from the vertices. Position coordinates and colour values are now assigned to each fragment. In this example, the colour values were interpolated linearly, which is not visible in the result due to the identical colour values for all input vertices. In the fragment shader, which is called for each fragment, the user-defined input variable color is defined. This must have the same name as the output variable for the colour values of the vertex shader, so that the colour values of the fragments created from the vertices can be processed correctly in the fragment shader. In complex shader programs, several output variables of the vertex shader can be used, which makes a correct assignment to the input variables of the fragment shader necessary. The output variable of the fragment shader is FragColor. In this simple example, the colour values generated by the fixed-function blocks of the graphics pipeline before the fragment shader are passed on to the function blocks after the fragment shader without any processing. In principle, such a shader can contain very complex
Fig. 2.16 A triangle drawn with a JOGL renderer and the programmable pipeline
calculations, usually for the illumination of objects. The colour value in FragColor is passed through a special mapping mechanism to the colour buffer, which contains the result to be displayed on the screen. Therefore, this output variable can be given (almost) any user-defined name. It should be noted that the fragment shader does not use the position coordinates of the fragments. This could be done by using the predefined input variable gl_FragCoord, but this is not necessary in this simple example. The calculation results of the fragment shader are passed on to the per-fragment operations stage so that the drawing result can be written into the framebuffer as the last step. Figure 2.16 shows the rendering result for output to the screen via the framebuffer.
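A sketch of shaders along the lines of Figs. 2.13 and 2.15 is shown below. The layout positions follow the description in this section; the GLSL version is an assumption, since explicit uniform locations require at least GLSL 4.3 or the extension GL_ARB_explicit_uniform_location.

// vertex shader (GLSL)
#version 430 core

// vertex attributes read from the vertex buffer object
layout (location = 0) in vec3 vPosition;
layout (location = 1) in vec3 vColor;

// transformation matrices passed as uniforms by the application
layout (location = 0) uniform mat4 pMatrix;   // projection matrix
layout (location = 1) uniform mat4 mvMatrix;  // model view matrix

out vec4 color;

void main() {
    // transform the position into clip coordinates (homogeneous component 1 added)
    gl_Position = pMatrix * mvMatrix * vec4(vPosition, 1.0);
    // extend the colour by a non-transparent alpha value
    color = vec4(vColor, 1.0);
}

// fragment shader (GLSL)
#version 430 core

in vec4 color;        // colour interpolated by the rasterisation stage
out vec4 FragColor;   // mapped to output channel 0 and thus to the framebuffer

void main() {
    // pass the colour on without further processing
    FragColor = color;
}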
2.11 Efficiency of Different Drawing Methods Figure 2.17 shows a pseudocode representation of the timing and overhead of data transfers to the OpenGL (for processing by the GPU) for the different drawing methods used in the examples from Sects. 2.8 and 2.10. In the example from Sect. 2.8 (left side of the figure), the vertex data is transferred by the glBegin/glEnd block in the so-called immediate mode to the GPU. The data is available to the OpenGL exactly when the data is needed. Since this takes place in the display method, the transfer of data in this example is carried out for each rendered image
Fig. 2.17 Comparison of data transfer to the OpenGL when using different drawing methods in immediate mode and in the core profile
(in this case 60 times per second17). In the example from Sect. 2.10 (right side in the figure), the drawing is prepared in the init method by creating and binding buffers on the GPU. The transfer of the vertices to the OpenGL takes place exactly once using this method. The actual drawing from the buffer only needs to be triggered in the display method by a draw command (here by glDrawArrays). This allows the data once transferred to be reused in many frames. The figure shows the transfer to the OpenGL interface from the logical point of view of a Java application. At which point in time the data is actually transferred to the GPU depends on the specific implementation of the OpenGL driver and the specific GPU. Even if optimisations are implemented in the driver for the immediate mode, for example, by transferring data to the GPU in blocks, the (final) data is only available when it is needed. Little is known about the structure of the data and its planned use by the application.
17 The maximum frame rate can be set to values other than 60 frames per second in the JOGL examples JoglBasicStartCodeFFP and JoglBasicStartCodePP in the class for the main windows.
In contrast, when drawing in the core profile, the data is already available very early in buffer objects, ideally already during the one-time initialisation. Since typical 3D scenes consist of thousands or millions of vertices and most parts of a scene undergo only few or only slow changes, this comparison makes it very clear how efficient the drawing method of the core profile is. The effort required to manage the buffer objects on the application side is thus worthwhile. For the sake of completeness, it should be noted that display lists and vertex arrays are available in the compatibility profile to draw more efficiently than with glBegin/glEnd blocks. However, these methods are no longer available in the core profile, as the functionality is efficiently covered by the use of buffer objects.
2.12 Exercises

Exercise 2.1 General information about the OpenGL
(a) What is the difference between the OpenGL programming interface specification and an OpenGL implementation?
(b) What are the benefits of an open and standardised interface specification?
(c) What makes an OpenGL implementation work on a computer?
(d) In which programming language is OpenGL specified?
(e) Please explain what JOGL is and what LWJGL is.

Exercise 2.2 OpenGL profiles
Explain the difference between the compatibility profile and the core profile of the OpenGL. How are the fixed-function pipeline and the programmable pipeline related to these profiles?

Exercise 2.3 OpenGL pipeline
(a) What is the purpose of vertex processing?
(b) What is the purpose of rasterisation?
(c) What is the purpose of fragment processing?

Exercise 2.4 Programmable OpenGL pipeline
(a) Which parts of the fixed-function pipeline have become programmable in the programmable pipeline?
(b) What exactly is a shader?
(c) Which shaders do you know and which functions can they perform?
(d) Which shaders must always be present in the compatibility profile?
(e) Which shaders must always be present in the core profile?

Exercise 2.5 Explain the difference between a vertex, a fragment and a pixel.
Exercise 2.6 Swizzling
Given the following two vectors defined using GLSL:

vec4 pos = vec4 (0.4, 0.2, -2.1, 1.0);
vec3 col = vec3 (0.8, 0.7, 0.1);

Furthermore, the following result variables are given:

float v1; vec2 v2; vec3 v3; vec4 v4;

Specify the value that is in the variable to the left of the assignment after the respective GLSL instruction:
(a) v1 = pos.x;    (b) v1 = pos.y;    (c) v1 = pos.z;
(d) v1 = pos.w;    (e) v1 = col.r;    (f) v1 = col.g;
(g) v1 = col.b;    (h) v1 = col.a;    (i) v3 = pos.xyz;
(j) v2 = pos.wx;   (k) v3 = pos.xwy;  (l) v3 = pos.wyz;
(m) v3 = col.rgr;  (n) v3 = col.bgr;  (o) v2 = col.ra;
(p) v3 = col.bbr;  (q) v3 = pos.stp;  (r) v4 = pos.qtst;
(s) v4 = pos.qtsp; (t) v4 = pos.qspw; (u) v3 = pos.xys;
(v) v3 = pos.rbg;  (w) v3 = pos.bgq;  (x) v3 = pos.zwr;
Exercise 2.7 Write masking
Given the following vector defined using GLSL:

vec4 pos = vec4 (0.8, 0.4, -1.1, 0.5);

Specify the value that is in the variable pos after the respective GLSL instruction:
(a) pos.x = 0.5;
(b) pos.y = 0.8;
(c) pos.xw = vec2 (0.1, 0.2);
(d) pos.zw = vec2 (0.1, 0.2);
(e) pos.stp = vec3 (0.1, 0.2, 0.3);
(f) pos.rga = vec3 (0.1, 0.2, 0.3);
(g) pos.xys = vec3 (0.1, 0.2, 0.3);
(h) pos.abb = vec3 (0.1, 0.2, 0.3);
References

1. J. F. Blinn. “Models of light reflection for computer synthesized pictures”. In: Proceedings of the 4th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’77. ACM, 1977, pp. 192–198.
2. E. Gamma, R. Helm, R. Johnson and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Pearson Education India, 2015.
3. J. F. Hughes, A. van Dam, M. McGuire, D. F. Sklar, J. D. Foley, S. K. Feiner and K. Akeley. Computer Graphics. 3rd edition. Upper Saddle River, NJ [u. a.]: Addison-Wesley, 2014.
4. J. Kessenich, D. Baldwin and R. Rost. The OpenGL Shading Language, Version 4.60.6. 12 Dec 2018. Specification. Retrieved 2 May 2019. The Khronos Group Inc, 2018. URL: https://www.khronos.org/registry/OpenGL/specs/gl/GLSLangSpec.4.60.pdf.
5. J. Kessenich, G. Sellers and D. Shreiner. OpenGL Programming Guide. 9th edition. Boston [u. a.]: Addison-Wesley, 2017.
6. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 4th edition. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Vieweg, 2019.
7. B. T. Phong. “Illumination for Computer Generated Pictures”. In: Commun. ACM 18.6 (1975), pp. 311–317.
8. R. J. Rost and B. Licea-Kane. OpenGL Shading Language. 3rd edition. Upper Saddle River, NJ [u. a.]: Addison-Wesley, 2010.
9. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Compatibility Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.
10. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.
11. M. Seidl, M. Scholz, C. Huemer and G. Kappel. UML @ Classroom: An Introduction to Object-Oriented Modeling. Heidelberg: Springer, 2015.
12. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-Wesley, 2016.
3 Basic Geometric Objects
This chapter describes basic geometric objects that are used in computer graphics for surface modelling. Furthermore, the planar basic objects used in the OpenGL and their use through OpenGL drawing commands are explained. The graphics primitives for points, lines, triangles, polygons and quadrilaterals (quads) used in the OpenGL are considered in more detail. The OpenGL provides graphics primitives to draw sequences of basic objects, allowing surfaces of objects to be represented efficiently. In addition, there are OpenGL drawing commands, such as indexed drawing, primitive restart and indirect drawing, that enable drawing by the graphics processor largely independently of the graphics application. Many of the concepts used in the OpenGL are also used in a similar form in other graphics systems. The content of this chapter provides the basis for understanding the modelling of surfaces of three-dimensional objects, which is presented in Chap. 4.
3.1 Surface Modelling A commonly used approach in computer graphics for the representation of (complex) objects is the modelling of their surfaces using basic geometric objects. Further possibilities for modelling three-dimensional objects are described in Chap. 4. The basic geometric objects in computer graphics are usually called graphics output primitives, geometric primitives or primitives. Three main types of primitives can be distinguished.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-28135-8_3.
Points are uniquely specified by their coordinates. Points are mainly used to define other basic objects, for example, a line by specifying the starting point and the endpoint.
Lines, polylines or curves can be lines defined by two points, but also polylines, i.e., connected sequences of line segments. Curves require more than two control points.
Areas are usually bounded by closed polylines or defined by polygons. An area can be filled with a colour or a texture.
The simplest curve is a line segment or a line characterised by a starting point and an endpoint. If several lines are joined together, the result is a polyline. A closed polyline is a polyline whose last line ends where the first begins. The region enclosed by a closed polyline defines a polygon. Thus, a polygon is a plane figure described by a closed polyline. Depending on the graphics program or graphics programming interface, only certain types of polygons are permitted. Important properties are whether the polygon overlaps itself and whether it is convex. A polygon is convex if the line segment between any two points of the polygon is contained in the union of the interior and the boundary of the polygon. The notion of convexity can be generalised to three-dimensional bodies. Figure 3.1 shows a self-overlapping, a non-convex and a convex polygon. For the non-convex polygon in the middle, a connecting line between two points of the polygon is indicated by a dashed line, which does not completely lie inside the polygon. Besides lines or piecewise linear polylines, curves are also common in computer graphics. In most cases, curves are defined as parametric polynomials that can also be attached to each other like line segments in a polyline. How these curves are exactly defined and calculated is described in Sect. 4.6.1. At this point, it will suffice to understand the principle of how the parameters of a curve influence its shape. Besides a starting point and an endpoint, additional control points are defined. In the context of parametric curves, these additional control points are called inner control points. Usually, one additional control point is used for the definition of a quadratic curve and two additional control points for the definition of a cubic curve. The curve begins at the start point and ends at the endpoint. It generally does not pass through the inner control points. The inner control points define the direction of the curve at the two endpoints.
Fig. 3.1 A self-overlapping (left), a non-convex (middle) and a convex (right) polygon
Fig. 3.2 Definitions of quadratic and cubic curves using control points
In the case of a quadratic curve with one inner control point, consider the connecting lines from the inner control point to the starting point and to the endpoint. These imaginary lines form the tangents to the curve at the start and endpoints. Figure 3.2 shows on the left a quadratic curve defined by a start and endpoint and one inner control point. The tangents at the start and endpoints are drawn dashed. In the case of a cubic curve, as shown in the right part of the figure, the two tangents can be defined independently of each other by two inner control points. When curves are joined together to form a longer, more complicated curve, it is generally not sufficient for the endpoint of a curve to coincide with the start point of the respective following curve. The resulting joint curve would be continuous, but not smooth, and could therefore contain sharp bends. To avoid such sharp bends, the tangents at the endpoint of the previous curve and at the start point of the following curve must point in the same direction. This means that the endpoint, which is equal to the start point of the next curve, and the two inner control points defining the two tangents must be collinear, i.e., they must lie on the same straight line. This can be achieved by the appropriate choice of control points. Therefore, the first inner control point of a succeeding curve must be on the straight line defined by the last inner control point and the endpoint of the previous curve. Similarly, a curve can be fitted to a line without creating sharp bends by choosing the first inner control point to lie on the extension of the line. Figure 3.3 illustrates this principle. Other curves frequently used in computer graphics are circles and ellipses or parts thereof in the form of arcs of circles or ellipses. Circles and ellipses, like polygons, define areas. Areas are bounded by closed curves. If only the edge of an area is to be drawn, there is no difference to drawing curves in general. Areas, unlike simple lines, can be filled with colours or textures. Algorithmically, filling an area is very different from drawing lines (see Sect. 7.5). Axis-parallel rectangles whose sides are parallel to the coordinate axes play an important role in computer graphics. Although they can be understood as special cases of polygons, they are simpler to handle since it is already sufficient to specify
Fig. 3.3 Smooth fitting of a cubic curve to a straight line
two diagonally opposite corners, i.e., two points. These rectangles can also be used as axis-aligned bounding boxes (AABB) (see Sect. 11.7). It can be very cumbersome to define complex areas by directly specifying the boundary curve. One way to construct complex areas from existing areas is to apply set-theoretic operations to the areas. The most important of these operations are union, intersection, difference and symmetric difference. The union joins two sets, while the intersection consists of the common part of both areas. The difference is obtained by removing from the first area all parts that also belong to the second area. The symmetric difference of two sets is the pointwise exclusive-OR operation applied to the two areas. In other terms, the symmetric difference is the union of the two areas without their intersection. Figure 3.4 shows the result of applying these operations to two areas in the form of a circle and a rectangle. Another way to create new areas from already constructed areas is to apply geometric transformations, such as scaling, rotation or translation. These transformations are illustrated in Sect. 5.1.
Fig. 3.4 Union, intersection, difference and symmetrical difference of a circle and a rectangle
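These set-theoretic operations are not part of the OpenGL itself, but they can be tried out directly in Java with the class java.awt.geom.Area, which supports exactly the four operations mentioned. The following sketch applies them to a circle and a rectangle as in Fig. 3.4; the concrete coordinates are chosen arbitrarily for illustration.

import java.awt.geom.Area;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Rectangle2D;

// Circle and rectangle as two-dimensional areas (coordinates chosen arbitrarily)
Area circle = new Area(new Ellipse2D.Double(0, 0, 100, 100));
Area rectangle = new Area(new Rectangle2D.Double(50, 25, 120, 50));

Area union = new Area(circle);
union.add(rectangle);                   // union
Area intersection = new Area(circle);
intersection.intersect(rectangle);      // intersection
Area difference = new Area(circle);
difference.subtract(rectangle);         // difference
Area symDifference = new Area(circle);
symDifference.exclusiveOr(rectangle);   // symmetric difference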
3.2 Basic Geometric Objects in the OpenGL In the OpenGL it is common to represent (complex) objects by modelling the object surface with planar polygons. As a rule, this leads to polygon meshes, i.e., sets of interconnected polygons. Using these planar surface pieces, curved surfaces must be approximated, possibly by a large number of polygons, until a desired representation quality is achieved. The use of polygon meshes offers the advantage that the further computations of the graphics pipeline can be performed very easily and efficiently. Therefore, this approach has been very common for a long time. With the availability of tessellation units in modern GPUs, it is now possible to work with curved surface pieces that can be processed on the GPU. The curved surfaces can be described precisely by a few control points, so that less complex data structures have to be transferred to the GPU than when modelling with planar polygons. Since modelling by polygons is very common in computer graphics and not all GPUs and existing graphics applications support tessellation on the GPU, the basics for this approach are presented below. The OpenGL provides geometric primitives for drawing points, lines and planar triangles. In the compatibility profile, convex polygons and quadrilaterals can also be used. Since complex objects require a large number of these primitives, the OpenGL allows several connected primitives to be drawn by reusing vertices from the preceding primitive to increase efficiency. Table 3.1 shows the types of geometric primitives available in the OpenGL. The entry contained in the left column is one of the arguments passed to a drawing command (see Sect. 3.3) and specifies how a geometric primitive is drawn from a sequence of n vertices v0, v1, ..., vn−1. This sequence is called a vertex stream. In the following sections, source code extracts or commands are given to explain the essential procedure of drawing in the OpenGL. This source code and these commands can be used to modify and extend the basic examples for creating graphical objects with the compatibility profile (see Sect. 2.8) and the core profile (see Sect. 2.10). Furthermore, the supplementary material to the online version of this chapter contains complete JOGL projects that can be used to reproduce the contents of the following sections.
3.2.1 Points Figure 3.5 shows an example of points that are drawn using the GL_POINTS geometric primitive. For this purpose, the Java source code of the display method of a JOGL renderer shown in Fig. 3.6 can be used. Besides the definition of the vertex positions, the glBegin command specifies that the vertices are to be used to draw points. Furthermore, the size of the points to be displayed was set to a value of 10 using glPointSize. The points shown in the figure are square. In order to render round points, antialiasing and transparency can be enabled using the following command sequence.
Table 3.1 Types of geometric primitives in the OpenGL (type and explanation)

Points
GL_POINTS: A point is drawn for each vertex.

Lines
GL_LINE_STRIP: Two consecutive vertices are connected by a line, whereby the end vertex is always used as the start vertex of the subsequent line.
GL_LINE_LOOP: A GL_LINE_STRIP is drawn and in addition a line from the last vertex to the first vertex.
GL_LINES: Each two consecutive vertices that belong together define the start vertex and the end vertex between which a line is drawn. The lines are not connected to each other.

Triangles
GL_TRIANGLE_STRIP: A sequence of connected filled triangles is drawn, with the subsequent triangle reusing an edge of the previous triangle. A triangle is drawn from the first three vertices. Each subsequent triangle consists of the last two vertices of the previous triangle and one new vertex. The drawing order of the vertices defines the front and back side of a triangle (see Sects. 3.2.3 and 3.2.4).
GL_TRIANGLE_FAN: A sequence of connected filled triangles is drawn, with the subsequent triangle reusing an edge of the previous triangle. A triangle is drawn from the first three vertices. Each subsequent triangle consists of the very first vertex, the last vertex of the previous triangle and one new vertex. The drawing order of the vertices defines the front and back side of a triangle (see Sects. 3.2.3 and 3.2.4).
GL_TRIANGLES: A filled triangle is drawn from each of three consecutive vertices that belong together. The respective triangles are not connected to each other. The drawing order of the vertices defines the front and back side of a triangle (see Sects. 3.2.3 and 3.2.4).

Polygon (only available in the compatibility profile)
GL_POLYGON: A convex filled polygon is drawn from all vertices. The drawing order of the vertices defines the front and back side of a polygon (see Sects. 3.2.4 and 3.2.5).

Quadrilaterals (only available in the compatibility profile)
GL_QUAD_STRIP: A sequence of connected, filled quadrilaterals is drawn, with the subsequent quadrilateral reusing an edge of the previous quadrilateral. A quadrilateral is drawn from the first four vertices. Each subsequent quadrilateral consists of the last two vertices of the previous quadrilateral and two new vertices. The drawing order of the vertices defines the front and back side of a quadrilateral (see Sects. 3.2.4 and 3.2.6).
GL_QUADS: A filled quadrilateral is drawn from each of four consecutive vertices that belong together. The respective quadrilaterals are not connected to each other. The drawing order of the vertices defines the front and back side of a quadrilateral (see Sects. 3.2.4 and 3.2.6).
Fig. 3.5 Example of an OpenGL geometric primitive for drawing points
Fig. 3.6 Example for drawing points (GL_POINTS) in the compatibility profile: Part of the source code of the display method of a JOGL renderer (Java)
// Enable point smoothing
gl.glEnable(gl.GL_POINT_SMOOTH);
// Enable transparency
gl.glEnable(GL.GL_BLEND);
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);
Antialiasing smoothes the edge pixels of each point, reducing the intensity of the grey values towards the edge and creating a rounded impression. This function is only available in the compatibility profile and has some disadvantages (see Sect. 7.6.2). As an alternative, a point can be modelled by a (small) circle to be drawn by the primitive GL_TRIANGLE_FAN (see below). Figures 3.7 and 3.8 show the relevant Java source code to draw points in the core profile. To do this, the vertex positions and their colour are defined in the init method. The definition of the colour per vertex is required when working with the shaders from Sect. 2.10. Alternatively, the colour in the shaders can be determined using the illumination calculation (see Chap. 9) or—for testing purposes—set to a fixed value in one of the shaders. In the display method, only the draw command glDrawArrays with the first parameter GL.GL_POINTS must be called. The
Fig. 3.7 Parts of the source code of the init method of a JOGL renderer (Java) for drawing points (GL_POINTS) in the core profile
Fig. 3.8 Parts of the source code of the display method of a JOGL renderer (Java) for drawing points (GL_POINTS) in the core profile
transfer of the vertex data into a vertex buffer object takes place in the same way as in the example described in Sect. 2.10. The same shaders are used for rendering as in this example. The glPointSize command sets the size of the points in the same way as in the compatibility profile. Alternatively, this value can be set in the vertex or geometry shader. To do this, this function must be activated by the following command in the JOGL program. gl.glEnable(gl.GL_PROGRAM_POINT_SIZE);
Afterwards, the point size can be set to 10 in the vertex shader (GLSL source code), for example:
gl_PointSize = 10;
The variable used, gl_PointSize, is a predefined output variable of the vertex or geometry shader. In order to be able to reproduce the examples in this section, the two complete projects JoglPointsFFP and JoglPointsPP are available in the supplementary material to the online version of this chapter.
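As a rough sketch of what the figures for the core profile contain, the following lines summarise the approach; the buffer and shader setup is assumed to be identical to the example in Sect. 2.10, and the variable names (vertices, vaoName) are illustrative only.

// init method (sketch): one position and one colour per vertex
float[] vertices = {
    -0.5f, -0.5f, 0f,   0f, 1f, 1f,   // x, y, z, r, g, b
     0.5f, -0.5f, 0f,   0f, 1f, 1f,
     0.0f,  0.5f, 0f,   0f, 1f, 1f
};
// ... transfer into a vertex buffer object and set the vertex attribute
// pointers exactly as in the example from Sect. 2.10 ...

// display method (sketch): draw all three vertices as points
gl.glPointSize(10);
gl.glBindVertexArray(vaoName[0]);
gl.glDrawArrays(GL.GL_POINTS, 0, 3);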
3.2.2 Lines Figure 3.9 shows examples of the rendering result using the three different geometric primitives for drawing lines in the OpenGL. Below each of the examples is an indication of the order in which the vertices are used to represent a line segment. Figure 3.10 shows the relevant Java source code of the display method of a JOGL renderer for the compatibility profile to draw a polyline similar to the left part of Fig. 3.9. By replacing the argument in the glBegin command with GL.GL_LINE_LOOP or GL.GL_LINES, a closed polyline or separate lines can be drawn, as shown in the middle and right parts of Fig. 3.9. Figures 3.11 and 3.12 show the relevant Java source code to draw a polyline in the core profile (left part of Fig. 3.9). For this purpose, the vertex positions and their colours are defined in the init method. In the display method, only the drawing
Fig. 3.9 Examples of geometric primitives in the OpenGL for drawing lines
Fig. 3.10 Example of drawing a line strip (GL_LINE_STRIP) in the compatibility profile: Part of the source code of the display method of a JOGL renderer (Java) is shown
Fig. 3.11 Parts of the source code of the init method of a JOGL renderer (Java) for drawing a polyline (GL_LINE_STRIP) in the core profile
Fig. 3.12 Parts of the source code of the display method of a JOGL renderer (Java) for drawing a polyline (GL_LINE_STRIP) in the core profile
command glDrawArrays must be called. The transfer of the vertex data into a vertex buffer object takes place in the same way as in the example described in Sect. 2.10. Likewise, the same shaders are used for rendering as in the example. By replacing the first argument in the glDrawArrays command with GL.GL_LINE_LOOP or GL.GL_LINES, a closed polyline or separate lines can be drawn, as shown in the middle and right parts of Fig. 3.9. The width of the lines can be set with the glLineWidth command. After the additional activation of antialiasing by the following JOGL command, smooth lines are drawn:
gl.glEnable(GL.GL_LINE_SMOOTH);
For explanations on line widths, line styles and antialiasing, see Sect. 7.4. The full example projects JoglLinesFFP and JoglLinesPP used in this section can be found in the supplementary material to the online version of this chapter.
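In the compatibility profile, a polyline like the one in the left part of Fig. 3.9 can be drawn with a glBegin/glEnd block as sketched below. The coordinates are illustrative and not those of the figure, and gl is assumed to be a GL2 object of the JOGL renderer.

gl.glColor3f(0.1f, 0.5f, 0.8f);     // one colour for the whole polyline
gl.glLineWidth(4);                  // line width in pixels
gl.glBegin(GL.GL_LINE_STRIP);       // GL_LINE_LOOP or GL_LINES work analogously
gl.glVertex3f(-0.8f,  0.4f, 0f);
gl.glVertex3f(-0.3f, -0.5f, 0f);
gl.glVertex3f( 0.3f,  0.5f, 0f);
gl.glVertex3f( 0.8f, -0.4f, 0f);
gl.glEnd();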
3.2.3 Triangles The top three images in Fig. 3.13 show rendering examples of the three geometric primitives in the OpenGL that are available for drawing triangles. In each case, triangles are shown that lie in a plane (planar triangles). Basically, a shape with three vertices can be curved in space. However, in order to define the curvature precisely, additional information—besides the three position coordinates of the vertices— would have to be available, which is not the case. Therefore, the three respective vertices are connected by a planar triangle. Thus, each individual triangle lies in a plane, but adjoining planar triangles can (in principle) be oriented arbitrarily in space and have (in principle) any size. This allows curved structures to be approximated. The size and number of triangles determine the accuracy of this approximation to a curved structure. In this book, triangles are always understood as plane/planar triangles. As already explained in Sects. 3.2.1 and 3.2.2 and shown in the respective source code, for the use of a specific geometric primitive, only the first argument of the drawing command has to be chosen accordingly. Thus, for drawing a sequence
Fig. 3.13 Examples of OpenGL geometric primitives for drawing triangles: In the lower part of the figure, the individual triangles are visible because the edges have been highlighted
of connected triangles, a triangle fan or individual triangles, the first argument is GL.GL_TRIANGLE_STRIP, GL.GL_TRIANGLE_FAN or GL.GL_TRIANGLES, respectively. The full example source code for drawing triangles can be found in the projects JoglTrianglesFFP and JoglTrianglesPP, which are in the supplementary material to the online version of this chapter. In the lower images of Fig. 3.13, the edges have been highlighted to reveal the drawn individual triangles. Below each of the images, it is further indicated in which order the vertices are used by a geometric primitive to represent one triangle at a time. The vertices were passed to the GPU in ascending order of their indices, i.e., in the order v0 , v1 , v2 , v3 , v4 , v5 . The change of order for rendering takes place through the geometric primitive. By changing this order for a GL_TRIANGLE_STRIP or a GL_TRIANGLE_FAN, all rendered adjacent triangles have the same orientation. This can be used for modelling so that adjacent areas represent either the outer or inner surfaces of a modelled object (see Sect. 3.2.4). Using the geometric primitive GL_TRIANGLE_STRIP, adjacent triangles are drawn so that each subsequent triangle consists of an edge of the previous triangle. After the first triangle is drawn, only a single vertex is required for the next triangle. This makes this geometric primitive very efficient, both for storing and rendering objects. For long triangle sequences, this advantage over drawing individual triangles converges to a factor of three. For this reason, this primitive is very often used for modelling complex surfaces of objects. The geometric primitive GL_TRIANGLE_FAN also draws adjacent triangles where each subsequent triangle consists of an edge of the previous triangle. Since the first vertex is used for each of the triangles, a fan-shaped structure is created, a triangle fan. This makes this primitive well suited for approximating round structures. For example, if the first vertex is chosen to be the centre and each subsequent vertex is chosen to be a point on the edge, then a circle can be rendered. The use of the primitive GL_TRIANGLE_FAN is as efficient as the use of the primitive GL_TRIANGLE_STRIP. Also in this primitive, from the second triangle onwards, only one vertex is needed for each further triangle. Since any surface can be approximated with arbitrary accuracy by sequences of triangles, triangles are the most commonly used polygons in computer graphics. Another advantage of triangles is that a planar triangle can always be drawn from three vertices. Triangles cannot overlap themselves. Furthermore, triangles are always convex, which is not necessarily true for shapes with more than three corners. In graphics programming interfaces, further restrictions are therefore defined for quadrilaterals or pentagons, for example, so that these primitives can be drawn efficiently. Furthermore, the geometric primitives for drawing multiple adjacent planar triangles available on GPUs are implemented in a very memory and runtime efficient way, mainly due to the reuse of the adjacent edges (see above). The algorithms for rasterisation and occlusion computation for planar triangles are relatively simple and therefore very efficient to implement. These properties make planar triangles the preferred polygon in computer graphics.
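As a small illustration of the circle approximation mentioned above, the following sketch generates the vertex positions for a GL_TRIANGLE_FAN: the first vertex is the centre, followed by n + 1 points on the circle (the last one repeats the first edge point to close the circle). Centre, radius and n are arbitrary example values.

float cx = 0f, cy = 0f, radius = 0.5f;   // centre and radius (example values)
int n = 32;                              // number of points on the circle
float[] vertices = new float[(n + 2) * 3];
vertices[0] = cx;  vertices[1] = cy;  vertices[2] = 0f;   // centre of the fan
for (int i = 0; i <= n; i++) {
    double angle = 2.0 * Math.PI * i / n;
    vertices[(i + 1) * 3]     = cx + radius * (float) Math.cos(angle);
    vertices[(i + 1) * 3 + 1] = cy + radius * (float) Math.sin(angle);
    vertices[(i + 1) * 3 + 2] = 0f;
}
// after transferring the data into a vertex buffer object (see Sect. 2.10):
// gl.glDrawArrays(GL.GL_TRIANGLE_FAN, 0, n + 2);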
By using two or three identical vertices, lines or points can be drawn from triangle primitives. Such degenerated triangles are useful when a single long triangle sequence is to be drawn, but the geometry of the modelled object does not permit the drawing of complete triangles within the sequence at all points of the object. Such a triangle is not visible if the triangle degenerated to a line is identical to an already drawn edge or if the triangle degenerated to a point coincides with an already drawn point (see Sect. 3.3.2). Another application for triangles degenerated into lines or points is to use them instead of the special geometric primitives for lines and points. This is sensible if the line or point primitives are less efficiently implemented than triangle sequences. As an example, a circle can be drawn as a substitute for a point with an adjustable size by using a GL_TRIANGLE_FAN. Points, lines and triangles can be used to approximate all the necessary objects in computer graphics. Any convex planar polygon can be represented by a GL_TRIANGLE_STRIP or GL_TRIANGLE_FAN (see Sect. 3.2.5). For example, it is possible to draw a planar quadrilateral with two triangles. If a GL_TRIANGLE_STRIP is used for this, even the number of vertices to be displayed is identical (see Sect. 3.2.6). For these reasons, only geometric primitives for drawing points, lines and triangles are present in the OpenGL core profile.
3.2.4 Polygon Orientation and Filling By a convention often used in computer graphics, the front face (front side) of a polygon is defined by the counter-clockwise order of the vertices (see Sect. 4.2) when looking at the respective face (side). Similarly, the back face of a polygon is defined by the clockwise order of the vertices. This order definition of naming (enumeration) of the vertices within a polygon is also called winding order of the respective polygon. Based on the vertex order given in Fig. 3.13, it can be seen that, according to this convention, all triangles are oriented with their front face towards the viewer. In the OpenGL, this convention is set as the default winding order. However, the command glFrontFace(GL_CW) reverses this order so that afterwards front faces are defined by the clockwise order of vertices. The glFrontFace(GL.GL_CCW) command restores the default state. CW and CCW are abbreviations for clockwise and counter-clockwise. The definition of front and back surfaces is so important because it reduces the effort for rendering. Polygon (and especially triangle) meshes are often used to model the surfaces of (complex) objects. These three-dimensional surfaces are in many cases closed and the interior of the objects is not visible from the outside. If all polygons for surface modelling have their fronts facing outwards and the observer is outside an object, then the inward facing backsides do not need to be drawn. If the observer is inside an object, then only the insides need to be shown and the outsides can be omitted from the rendering process. In other applications, both sides of a polygon are visible, for example, with partially transparent objects. If the drawing of the front or backsides is switched off (culling) in the cases where front or backsides are not visible, then the rendering effort decreases. The underlying mechanisms for
determining the visibility of surfaces are presented in Sect. 8.2.1. In addition, the orientation of the surfaces is used for the illumination calculation (see Chap. 9). In the OpenGL, both sides of a polygon are drawn by default. After activating culling with the command gl.glEnable(GL.GL_CULL_FACE), the drawing of front faces, back faces or both can be suppressed as follows.
• Frontface culling: gl.glCullFace(GL.GL_FRONT)
• Backface culling: gl.glCullFace(GL.GL_BACK)
• Frontface and backface culling: gl.glCullFace(GL.GL_FRONT_AND_BACK)
These JOGL commands work in both the compatibility profile and the core profile. By default, polygons are displayed as filled polygons in the OpenGL. The command glPolygonMode changes this drawing mode: instead of being filled, polygons can be rendered only by their edges as lines or only by their vertices (the positions of the vertices) as points. The first parameter of this command uses GL_FRONT, GL_BACK or GL_FRONT_AND_BACK to determine the sides of the polygons that are to be affected. The second argument determines whether polygons are to be drawn as points (GL_POINT) or lines (GL_LINE) or whether the polygon is to be filled (GL_FILL). The following Java command sequence, for example, indicates that the front sides are to be filled and only the edges of the backsides are to be drawn:
gl.glPolygonMode(GL.GL_FRONT, gl.GL_FILL);
gl.glPolygonMode(GL.GL_BACK, gl.GL_LINE);
The mentioned commands for the polygon mode are available in the compatibility profile and core profile. Section 7.5 presents algorithms for filling polygons. Figure 3.14 shows the effects of different polygon modes using a sequence of connected triangles. The commands described in Sects. 3.2.1 and 3.2.2 for changing
Fig. 3.14 Examples of drawn front faces of a sequence of connected triangles (GL_TRIANGLE_STRIP) when using different polygon modes
the rendering of points and lines, for example, the point size or line width, also affect the representation of the polygons in the changed mode for drawing polygons. Using the two example projects JoglTrianglesFFP and JoglTrianglesPP for drawing triangles, which can be found in the supplementary material to the online version of this chapter, the settings for polygon orientation and drawing filled polygons explained in this section can be reproduced.
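A typical combination of these settings for a closed surface that is only viewed from the outside might look as follows; this is a sketch assembling the commands discussed above, not a listing from the example projects.

gl.glFrontFace(GL.GL_CCW);        // counter-clockwise vertex order defines front faces (default)
gl.glEnable(GL.GL_CULL_FACE);     // activate face culling
gl.glCullFace(GL.GL_BACK);        // back faces are not drawn
// draw the polygons filled (gl is assumed to be a GL2 or GL3 object)
gl.glPolygonMode(GL.GL_FRONT_AND_BACK, gl.GL_FILL);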
3.2.5 Polygons Figure 3.15 shows a rendering example of an OpenGL geometric primitive for polygons. This primitive is only available in the compatibility profile. The vertices are used for drawing according to the order of the vertex stream transmitted to the GPU. As explained in Sects. 3.2.1 and 3.2.2 and shown in the source code, to use a particular geometric primitive, only the first parameter of the drawing command needs to be chosen accordingly. For example, to draw a polygon in a glBegin command, the first argument is GL.GL_POLYGON. The full source code for drawing polygons can be found in the project JoglPolygonFFP, which is in the supplementary material to the online version of this chapter. The OpenGL specification [2] requires that convex polygons are guaranteed to be displayed correctly. In a convex polygon, all points on a connecting line between two points of the polygon also belong to the polygon (see Sect. 3.1). In convex polygons, all vertices point outwards. The polygon in Fig. 3.15 is convex. Polygons that are not convex have inward indentations. Figure 3.16 shows a non-convex polygon. Not all
Fig. 3.15 Example of an OpenGL geometric primitive for drawing polygons: In the right part of the figure, the edges have been highlighted
Fig. 3.16 Example of a non-convex polygon: The edges have been highlighted
points on the red connecting line between the two points P1 and P2 (that belong to the polygon) are inside the polygon. The inward indentation near vertex v1 is clearly visible. The top three images in Fig. 3.17 show the output of an OpenGL renderer for the non-convex polygon in Fig. 3.16. As soon as backface culling is switched on or if the polygon is filled, the rendering is no longer correct (image in the middle and on the right).1 In general, a non-convex polygon can be represented by a set of adjoining convex polygons. Since triangles are always convex, the same figure can be drawn by a triangle fan after decomposing this polygon into triangles (triangulation). The bottom three images in Fig. 3.17 now show correct rendering results in all polygon modes. Since every convex and planar polygon can be represented by a sequence of connected triangles (GL_TRIANGLE_STRIP) or a triangle fan (GL_TRIANGLE_FAN), the geometric primitive for polygons is now only available in the compatibility profile. An OpenGL implementation will most likely triangulate each polygon and afterwards draw the resulting triangles. This will be the case in particular because the polygon primitive no longer exists in the core profile and the deprecated functions of the fixed-function pipeline will most likely be realised internally (within the graphics card driver or the GPU) through functions of the core profile.
1 The figure shows the output of a particular implementation of the OpenGL. Since the OpenGL specification only guarantees the correct rendering of convex polygons, the output of another implementation (using a different GPU) may look different.
Fig. 3.17 Example of rendering results for a non-convex polygon: The upper part of the figure shows the drawing results using the geometric primitive GL_POLYGON. The lower part of the figure shows the result after triangulation and rendering by a triangle fan (GL_TRIANGLE_FAN)
3.2.6 Quadrilaterals The top two images in Fig. 3.18 show examples of the two OpenGL geometric primitives for drawing quadrilaterals. As explained in Sects. 3.2.1 and 3.2.2 and shown in the source code, the use of a specific geometric primitive only requires the first argument of the drawing command to be chosen accordingly. These primitives are only available in the compatibility profile. Therefore, to draw a sequence of quadrilaterals or individual quadrilaterals, the first argument must be GL.GL_QUAD_STRIP or GL.GL_QUADS. Quad is the abbreviation for quadrilateral. The project JoglQuadsFFP, which is included in the supplementary material for the online version of this chapter, contains the full source code for this example. In the lower images of Fig. 3.18, the edges have been highlighted to reveal the individual quadrilaterals drawn. Below each of the images, it is further indicated in which order the vertices are used by a geometric primitive to represent a quadrilateral in each case. The vertices were passed to the GPU in ascending order of their indices, i.e., in the order v0 , v1 , v2 , v3 , v4 , v5 . The order in which the vertices are used for rendering in GL_QUAD_STRIP ensures that all adjacent quadrilaterals have the same orientation (see Sect. 3.2.4). Thus, adjacent faces are either outer or inner
Fig. 3.18 Examples of OpenGL geometric primitives for drawing quadrilaterals: In the lower part of the figure, the individual quadrilaterals are visible because the edges have been highlighted
faces. For the primitive GL_QUADS, the vertices are used in the order in which they are specified. If the default OpenGL settings have not been changed, then all the quadrilaterals shown in Fig. 3.18 have their faces oriented towards the viewer (see Sect. 3.2.4). The geometric primitive GL_QUAD_STRIP draws sequences of adjacent quadrilaterals. Each subsequent quadrilateral consists of an edge of the previous quadrilateral. Therefore, after drawing the first quadrilateral, only two vertices are required for the next quadrilateral. This makes it very efficient to store and draw objects that are modelled by sequences of quadrilaterals. Since a quadrilateral can be represented by two triangles, any object constructed from a quadrilateral sequence can be represented by a triangle sequence (GL_TRIANGLE_STRIP). Figure 3.19 contrasts two approaches for drawing a rectangular figure. In both cases, the same number of vertices is necessary. Since this is true for the general case, both geometric primitives can be stored and drawn equally efficiently. However, modern graphics processors are optimised for drawing
Fig. 3.19 Comparison of a quadrilateral sequence (GL_QUAD_STRIP) with a triangle sequence (GL_TRIANGLE_STRIP) using the same rectangular figure: The individual quadrilaterals and triangles are visible because the edges have been highlighted
triangle sequences, which is also due to the fact that they are supported in both the compatibility profile and the core profile of the OpenGL. Quadrilaterals are often used in computer graphics (rendering and modelling software) as they are well suited for modelling surfaces. However, triangle sequences offer more flexibility than quadrilateral sequences. For example, a monochrome non-illuminated cuboid with uniformly oriented faces can be represented by a single triangle sequence (GL_TRIANGLE_STRIP). When using quadrilaterals, at least two quadrilateral sequences (GL_QUAD_STRIP) are required for the same figure (see Sect. 4.4).
3.3 OpenGL Drawing Commands Using the introductory example for the compatibility profile, Sect. 2.8 explains how to draw a triangle using a glBegin/glEnd block. A parameter to the glBegin command specifies which geometric primitive (see Sect. 3.2) to draw. Since this drawing method transfers vertex data one by one from the graphics application to the GPU for each frame, this method is very inefficient. Therefore, so-called display lists and vertex arrays were introduced in the OpenGL, which help to transfer vertex data as blocks once to the GPU in order to use this data several times by drawing commands (of the application) for drawing objects. As a further development of this idea, with the core profile vertex buffer objects (VBO) were introduced. Vertex buffers are reserved memory areas on the GPU to hold the vertex data. After filling these buffers on the GPU, they can be used again and again for rendering by a drawing command of the application. These memory areas can be changed when objects are modified, which, however, does not or rarely occur with many objects in a scene. The use of this technique is explained in detail with the help of an example in Sect. 2.10 for drawing a triangle. This example uses the drawing command glDrawArrays. The use of vertex buffer objects is also the standard drawing technique in the core profile. Drawing with glBegin/glEnd blocks, display lists or vertex arrays is no longer available in the core profile. In order to speed up the drawing of objects and save memory on the GPU, the OpenGL follows the following basic approaches:
1. Transferring the data in a block to the GPU, if possible before drawing. 2. Minimising the data (to be transferred) on the GPU by reusing existing data. 3. Reduction of the number of calls to drawing commands by the application. The first item is realised through the use of vertex buffer objects. The second item can be achieved by the use of primitives, by which a series of connected lines or triangles are drawn by one call to a drawing command. In the core profile, these are the GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN primitives. All OpenGL primitives allow multiple points, lines or triangles to be created with one call to the glDrawArrays drawing command, so that the third item in the enumeration above is also supported. In addition to the glDrawArrays command (see the example from Sect. 2.10), there are other drawing commands and methods in OpenGL through which further efficiency improvements are possible according to the list presented above. The following sections present the main underlying ideas of some of these commands.
3.3.1 Indexed Draw Figure 3.20 shows a simple surface consisting of nine vertices drawn by an OpenGL renderer. This could be a small piece of the ground surface of a terrain. Figure 3.21 shows the top view of this surface. The vertices are marked by vj (j = 0, 1, ..., 8). To draw this surface with separate triangles (GL_TRIANGLES) and glDrawArrays, 24 vertices have to be transferred into a vertex buffer on the GPU. A GL_TRIANGLE_STRIP primitive still needs 13 vertices (see below). In both cases, however, the vertices v3, v4 and v5 must be entered several times in the buffer, since they belong to more than one triangle. This can be avoided by using the drawing command glDrawElements. This uses an index buffer (an additional buffer) whose elements point to the vertices in the vertex buffer. For this, similar to a vertex buffer object, an index buffer object (IBO) is reserved on the GPU and filled with the corresponding indices. Table 3.2 shows possible contents for the index and
Fig. 3.20 Example of a surface of nine vertices drawn using an OpenGL renderer: In the left part of the figure, the triangles are filled. In the right part of the figure, the edges of the triangles are shown as lines
Fig. 3.21 Top view of a surface consisting of the nine vertices v0, v1, ..., v8: The large numbers represent numbers of the triangles
Table 3.2 Contents of the vertex and index buffers for drawing the example surface in Fig. 3.20

Vertex buffer object (VBO) for drawing the example surface:
Vertex index j:  0      1      2      3      4      5      6      7      8
Position x:      0      20     40     0      20     40     0      20     40
Position y:      40     40     40     20     20     20     0      0      0
Position z:      −10    −6     −4     −10    0      −6     −12    −8     −12
Red:             0.30   0.89   0.81   0.89   0.81   0.30   0.68   0.48   0.48
Green:           0.80   0.74   0.65   0.74   0.65   0.80   0.62   0.59   0.59
Blue:            0.10   0.26   0.16   0.26   0.16   0.10   0.10   0.17   0.17

Index buffer object for drawing the example surface using separate triangles (GL_TRIANGLES):
Index k:        0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Vertex index j: 0 3 1 3 4 1 1 4 2 4 5 2 3 6 7 3 7 4 4 7 8 4 8 5
vertex buffer objects for drawing the example mesh in Fig. 3.20. The vertex index j in the index buffer specifies which vertex from the vertex buffer is used for drawing and in which order. Note that the data in the vertex buffer object is stored in floating point format (float) and the indices in the index buffer object are integers (int). Assuming the use of floating point numbers and integers, each of which occupies four bytes in the GPU memory, there is already a memory advantage compared to drawing only with the VBO. Drawing with GL_TRIANGLES and glDrawArrays requires 24 vertices consisting of six components, each of which occupies four bytes. Thus, the vertex buffer requires 576 bytes of data. For drawing with GL_TRIANGLES and glDrawElements, only nine vertices are needed. This means that the vertex buffer object contains 216 bytes of data. In addition, there is the index buffer object with 24 indices and thus a data volume of 96 bytes. In total, this results in a memory space requirement of only 312 bytes. This reuse of vertex data supports item 2 of the list in Sect. 3.3. If we also consider that each vertex usually stores other data, at least normal vectors and texture coordinates, and that complex objects consist of many more vertices and contain many more shared vertices than in this simple example, the savings become even more obvious. On the other hand, the use of an index buffer by glDrawElements results in one more access each time a vertex is read, which
Fig. 3.22 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from separate triangles (GL_TRIANGLES)
Fig. 3.23 Parts of the source code of the display method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from separate triangles (GL_TRIANGLES)
slows down the drawing operation. This access, however, takes place entirely on the GPU and can be accelerated accordingly—through hardware support. Figures 3.22 and 3.23 show the essential Java source code for rendering the example surface by the index buffer object and the vertex buffer object from separate triangles.
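The essential steps can be sketched as follows; the index values are those from Table 3.2, while the buffer name array iboName and the use of the JOGL helper class Buffers are assumptions that mirror the style of the example in Sect. 2.10.

// init method (sketch): create and fill an index buffer object (IBO)
int[] indices = {0, 3, 1,  3, 4, 1,  1, 4, 2,  4, 5, 2,
                 3, 6, 7,  3, 7, 4,  4, 7, 8,  4, 8, 5};
int[] iboName = new int[1];
gl.glGenBuffers(1, iboName, 0);
gl.glBindBuffer(GL.GL_ELEMENT_ARRAY_BUFFER, iboName[0]);
gl.glBufferData(GL.GL_ELEMENT_ARRAY_BUFFER, indices.length * 4L,
                com.jogamp.common.nio.Buffers.newDirectIntBuffer(indices),
                GL.GL_STATIC_DRAW);

// display method (sketch): draw the 24 indexed vertices as separate triangles
gl.glDrawElements(GL.GL_TRIANGLES, indices.length, GL.GL_UNSIGNED_INT, 0);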
3.3.2 Triangle Strips Since the example surface consists of contiguous triangles, it is obvious to use the geometric primitive for drawing a sequence of connected triangles (GL_TRIANGLE_STRIP) to increase efficiency. The top view of the example surface in the left part of Fig. 3.24 shows the vertices vj for the vertex buffer and the indices ik for the index buffer. The content of the index buffer and the drawing sequence of the triangles are derived from this. Figures 3.25 and 3.26 show the relevant parts of the source code for drawing with glDrawElements. To avoid multiple calls of this drawing command, a single continuous sequence of triangles was defined. Due to the geometry, the renderer draws a triangle degenerated to a line on the right side. This line is created by index i6 (see the left part of Fig. 3.24). Triangle 5 is only represented by index i7. If the
Fig. 3.24 Example of a surface of nine vertices drawn with a short GL_TRIANGLE_STRIP: The left part of the figure shows the top view of the surface with the vertices vj, the indices ik and the drawing order of the triangles (large numbers). The right part of the figure shows the output of an OpenGL renderer
Fig. 3.25 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from a short sequence of connected triangles (GL_TRIANGLE_STRIP)
Fig. 3.26 Parts of the source code of the display method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from a short sequence of connected triangles (GL_TRIANGLE_STRIP)
Fig. 3.27 Example of a surface of nine vertices drawn with a long GL_TRIANGLE_STRIP: The left part of the figure shows the top view of the surface with the vertices vj, the indices ik and the drawing order of the triangles (large numbers). The right part of the figure shows the output of an OpenGL renderer
geometry were flat, this degenerated triangle would not be visible. In this example, however, the surface is bent, so the additional triangle on the right edge of the surface is visible as an additional line (see right side of Fig. 3.24). If such a surface is only viewed from above and is bent downwards, this may not be a problem. However, it can happen that such a geometry is bent upwards and such degenerated triangles become visible above the surface and thus disturb the representation of the object. One solution to this problem is to use more degenerate triangles that follow the geometry. The left side of Fig. 3.27 again shows in a top view the assignment of the indices ik to the vertices vj. After triangle 4 has been drawn, index i6 draws a triangle degenerated to a line from vertex v2 to vertex v5. Subsequently, index i7 creates a triangle degenerated to a point at the position of vertex v5. Then i8 draws a triangle degenerated to a line from v5 to v8 until i9 represents triangle 5. The degenerated triangles now follow the geometry and are not visible, as can be seen in the right part of Fig. 3.27. Figures 3.28 and 3.29 show the relevant source code for this longer and improved GL_TRIANGLE_STRIP. It should be noted at this point that all considerations in this section about drawing a sequence of connected triangles (GL_TRIANGLE_STRIP) using index buffers through glDrawElements can also be transferred to drawing without index buffers using glDrawArrays. However, for more complex
Fig. 3.28 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from a long sequence of connected triangles (GL_TRIANGLE_STRIP)
Fig. 3.29 Parts of the source code of the display method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from a long sequence of connected triangles (GL_TRIANGLE_STRIP)
objects, it is easier to first consider the vertices and then separately define the drawing order of the triangles using indices. In this respect, indexed drawing not only saves memory, but also facilitates the modelling of objects. Furthermore, changes to objects that become necessary in an interactive application are simplified, since the structure of the object is stored in the vertex buffer independently from the rendering order. For example, changes in the position of vertices can be achieved by changing the vertex buffer alone. The use of connected triangles for drawing supports the items 1 and 3 from the list in Sect. 3.3. Looking closely at the right part of Fig. 3.27, it can be seen that three indices are needed for the first triangle in the lower triangle sequence (triangle 5). For each further triangle, only one index is needed. This is the same situation as at the beginning of the entire triangle sequence. Triangle 1 needs three indices and the subsequent triangles only one index each. By using the three degenerated triangles at the transition from triangle 4 to triangle 5, a new triangle sequence (a new GL_TRIANGLE_STRIP) has effectively been started in the lower part of the surface.
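Putting this together, one possible index sequence for the long triangle strip is sketched below. The six indices of the upper row are followed by the three degenerate triangles (line from v2 to v5, point at v5, line from v5 to v8) and then by the lower row; this is an illustrative reconstruction based on the description above, and the exact numbering used in Fig. 3.27 may differ.

// One possible index sequence for the long GL_TRIANGLE_STRIP (13 indices);
// illustrative reconstruction, the numbering in Fig. 3.27 may differ
int[] stripIndices = {
    0, 3, 1, 4, 2, 5,   // upper row: triangles 1 to 4
    5,                  // degenerate triangle: line from v2 to v5
    5,                  // degenerate triangle: point at v5
    8,                  // degenerate triangle: line from v5 to v8
    4, 7, 3, 6          // lower row: triangles 5 to 8
};
// drawn with:
// gl.glDrawElements(GL.GL_TRIANGLE_STRIP, stripIndices.length, GL.GL_UNSIGNED_INT, 0);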
3.3.3 Primitive Restart Instead of restarting the sequence indirectly within a triangle sequence by degenerated triangles (see Sect. 3.3.2), the OpenGL provides the explicit mechanism
Fig. 3.30 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from a sequence of connected triangles (GL_TRIANGLE_STRIP) using primitive restart
Fig. 3.31 Parts of the source code of the display method of a JOGL renderer (Java) for drawing the example surface with glDrawElements from a sequence of connected triangles (GL_TRIANGLE_STRIP) using primitive restart
primitive restart. For this purpose, a special index is defined that can be used in the index buffer so that when drawing with glDrawElements the geometric primitive is restarted. The subsequent indices in the buffer after this special index are used as if the drawing command had been called again. Since this is a special index, primitive restart is not available for the glDrawArrays drawing command. However, this mechanism can be used for indexed drawing with all geometric primitives. Thus, primitive restart supports items 1 and 3 of the list from Sect. 3.3. Figures 3.30 and 3.31 show the relevant source code parts to draw the example surface using primitive restart. This mechanism is enabled by glEnable(gl.GL_PRIMITIVE_RESTART). The glPrimitiveRestartIndex method can be used to define the specific index that will cause the restart. In this example, the value 99 was chosen. In order to avoid conflicts with indices that refer to an entry in the vertex buffer, it makes sense to choose this special index as the largest possible value of the index set. The figure shows that the indices listed after index 99 were chosen as if the drawing command had been invoked anew. The glDrawElements call (see Fig. 3.31) is no different from a call without this mechanism. Only the correct number of indices to use, including the special restart indices, has to be taken into account.
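A sketch of this approach is given below. The restart index 99 and its position in the index buffer follow the description above; the concrete index values of the second strip are illustrative and may be ordered differently in Fig. 3.30.

// init method (sketch): index buffer containing the restart index 99
int[] stripIndices = {
    0, 3, 1, 4, 2, 5,   // first GL_TRIANGLE_STRIP (upper row)
    99,                 // restart index: the strip starts anew
    5, 8, 4, 7, 3, 6    // second GL_TRIANGLE_STRIP (lower row, illustrative order)
};

// display method (sketch); gl is assumed to be a GL3 object
gl.glEnable(gl.GL_PRIMITIVE_RESTART);
gl.glPrimitiveRestartIndex(99);
gl.glDrawElements(GL.GL_TRIANGLE_STRIP, stripIndices.length, GL.GL_UNSIGNED_INT, 0);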
In this example for drawing the surface, the index buffer with primitive restart is exactly as long as in the last example from Sect. 3.3.2 without primitive restart. However, this mechanism avoids drawing the three (degenerated) triangles. Furthermore, modelling is simpler, since the degenerated triangles do not have to be considered. Working with degenerated triangles can also be error-prone: careless modelling can unintentionally turn back faces into front faces, or the degenerated triangles can become visible after all. As explained in Sect. 3.2.3, triangles and in particular sequences of connected triangles (GL_TRIANGLE_STRIP) have many advantages for the representation of object surfaces. Therefore, much effort has been invested in developing algorithms and tools to transform objects built from individual triangles into objects consisting of as few and as short sequences of connected triangles as possible. Together with the hardware support of graphics processors, such objects can be drawn extremely quickly. Thus, the GL_TRIANGLE_STRIP has become the most important geometric primitive in computer graphics.
3.3.4 Base Vertex and Instanced Rendering
In the OpenGL, further commands exist to draw even more efficiently and flexibly than shown in the previous sections. For example, glDrawElementsBaseVertex can be used to add a constant (base vertex) to the index value after accessing the index buffer. Afterwards, the resulting index is used to access the vertex buffer. This makes it possible to store different parts of a complex geometry in a vertex buffer and to display these parts separately. This drawing command supports item 1 of the list from Sect. 3.3. Item 2 from the list in Sect. 3.3 is supported by the following two drawing commands, which allow multiple instances (copies) of the same object to be drawn with a single call:
glDrawArraysInstanced
glDrawElementsInstanced
For this purpose, the vertex shader has the predefined variable gl_InstanceID, which is incremented by one with each instance, starting from zero. The value of this counter can be used in the shader to vary each instance, for example by changing the position, the colour or even the geometry of the object instance. This mechanism can be used effectively to represent, for example, a starry sky or a meadow of many individual plants.
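As a minimal sketch (not taken from the book's example project), the following display-method fragment draws 100 instances of a small quad; the shader variable names mentioned in the comment (pvmMatrix, vPosition) are assumptions.

```java
// Hypothetical display-method fragment: draw 100 instances of a quad stored as a
// GL_TRIANGLE_STRIP with four vertices. The matching vertex shader (assumed GLSL code)
// could offset each copy using the built-in input gl_InstanceID, for example:
//   vec4 offset = vec4(0.1 * float(gl_InstanceID), 0.0, 0.0, 0.0);
//   gl_Position = pvmMatrix * (vec4(vPosition, 1.0) + offset);
gl.glDrawArraysInstanced(GL.GL_TRIANGLE_STRIP,
        0,     // first vertex in the bound vertex buffer
        4,     // number of vertices per instance
        100);  // number of instances
```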
3.3.5 Indirect Draw
Indirect drawing provides further flexibility and thus further support of the items of the list from Sect. 3.3. With this mechanism, the parameters for the drawing command
Fig. 3.32 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the example surface with glDrawElementsIndirect from a sequence of connected triangles (GL_TRIANGLE_STRIP) using primitive restart and an indirect draw buffer
are not passed with the call of the drawing command, but are also stored in a buffer (on the GPU). The following drawing commands use indirect drawing:
glDrawArraysIndirect
glDrawElementsIndirect
Figures 3.32 and 3.33 show the relevant source code for rendering the example surface using the indirect draw command glDrawElementsIndirect. A GL_TRIANGLE_STRIP with primitive restart is used. In addition to the vertex and index buffers, another buffer object (GL_DRAW_INDIRECT_BUFFER) is reserved on the GPU and filled with the parameter values for the drawing command. The parameter values for indexed drawing are expected in the buffer in the following order:
Fig. 3.33 Parts of the source code of the display method of a JOGL renderer (Java) for drawing the example surface with glDrawElementsIndirect from a sequence of connected triangles (GL_TRIANGLE_STRIP) using primitive restart and an indirect draw buffer
count: Number of indices that refer to vertices and are necessary for drawing the object.
instanceCount: Number of object instances to be drawn.
firstIndex: First index into the index buffer. This allows multiple index sequences to be kept in the same buffer.
baseVertex: Value of the base vertex; see Sect. 3.3.4.
baseInstance: Value to access other vertex data for individual object instances (for details, see, for example, [4]).
Buffer creation works as with the other buffers via the glGenBuffers command (not included in the source code parts shown). The glBindBuffer command binds this buffer to the target GL_DRAW_INDIRECT_BUFFER, defining it as a buffer containing the parameter values for an indirect draw command. As with other buffers, glBufferData transfers the data into the buffer on the GPU. With the drawing command glDrawElementsIndirect, only the geometric primitive to be drawn, the type of the index buffer and the index (offset in bytes) at which the drawing parameters are to be found in this buffer need to be specified (see Fig. 3.33). The following extensions of these indirect drawing commands allow drawing with several successive parameter sets located in the indirect buffer:
glMultiDrawArraysIndirect
glMultiDrawElementsIndirect
These commands are very powerful and the number of usable objects and triangles is in fact only limited by the available memory. Thus, millions of objects can be drawn autonomously on the GPU by a single call from the graphics application running on the CPU.
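The following sketch illustrates how the parameter block for glDrawElementsIndirect could be filled from Java; it is not taken from the example project, and the names gl (assumed to be a JOGL GL4 object), indirectBufferName (a buffer name created with glGenBuffers) and indexCount (the number of indices including the restart indices) are assumptions.

```java
// Sketch of an init method fragment: one parameter set for an indirect indexed draw.
// The five integers follow the order described in the text above.
int[] drawParams = {
        indexCount,  // count: number of indices used for drawing
        1,           // instanceCount: draw a single instance
        0,           // firstIndex: start at the beginning of the index buffer
        0,           // baseVertex: no additional offset into the vertex buffer
        0            // baseInstance
};
gl.glBindBuffer(GL4.GL_DRAW_INDIRECT_BUFFER, indirectBufferName);
gl.glBufferData(GL4.GL_DRAW_INDIRECT_BUFFER,
        (long) drawParams.length * Buffers.SIZEOF_INT,
        Buffers.newDirectIntBuffer(drawParams), GL.GL_STATIC_DRAW);
// In the display method, glDrawElementsIndirect then only receives the geometric
// primitive, the index type and the byte offset of this parameter block within the
// bound indirect buffer, as described in the text.
```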
3.3.6 More Drawing Commands and Example Project
In the OpenGL there are further variants and combinations of the drawing commands described in the previous sections. However, presenting them would go beyond the scope of this book. Therefore, for details on the commands described here and for further drawing commands, please refer to the OpenGL SuperBible [4, p. 249ff], the OpenGL Programming Guide [1, p. 124ff] and the OpenGL specification [3]. The supplementary material to the online version of this chapter includes the complete project JoglSimpleMeshPP, which can be used to reproduce the examples of the drawing commands explained in the previous sections.
3.4 Exercises

Exercise 3.1 Which three main types of basic geometric objects in computer graphics do you know? Explain their properties. What are the properties of a convex polygon?

Exercise 3.2 Which three geometric primitives for drawing triangles are provided in the OpenGL? Explain their differences.

Exercise 3.3 Given are the vertex position coordinates (−0.5, 1.0, −0.5), (0.5, 1.0, −1.0), (−0.5, 0, −0.5) and (0.5, 0, −1.0) of a square area in space. The last two position coordinates, together with the additional position coordinates (−0.5, −1.0, −0.5) and (0.5, −1.0, −1.0), span another square area that is directly adjacent to the first area.
(a) Draw the entire area using a glBegin/glEnd command sequence and the geometric primitives for drawing triangles. Use the GL_TRIANGLES, GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN primitives in turn so that the same area always results. How many commands must be specified within each glBegin/glEnd block?
(b) Draw the entire area with a single glDrawArrays command and a vertex buffer object. Use the GL_TRIANGLES, GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN primitives in turn so that the same area always results. How many drawing steps must be specified in the command glDrawArrays (third argument) for each alternative?

Exercise 3.4 Given are the two-dimensional vertex position coordinates (−0.5, 0.5), (0.5, 0.5), (−0.5, −0.5) and (0.5, −0.5) of a square area.
(a) Draw this area using triangles and the glDrawArrays command (in three-dimensional space). Make sure that the faces of the triangles are aligned in a common direction.
(b) Using the glCullFace command, alternately suppress the back faces (backface culling), the front faces (frontface culling) and the front and back faces of the polygons. Observe the effects on the rendered surface.
(c) Deactivate the suppression of the drawing of front or back faces. Use the glPolygonMode command to change the rendering of the front and back faces of the triangles so that only the vertices are displayed as points. Vary the size of the rendered points and observe the effects.
(d) Deactivate the suppression of the drawing of front and back faces. Use the glPolygonMode command to change the rendering of the front and back faces of the triangles so that only the edges of the triangles are displayed as lines. Vary the line width and observe the effects.
(e) Deactivate the suppression of the drawing of front and back faces. Use the glPolygonMode command to change the rendering of the front and back faces of the triangles so that the front faces are displayed differently from the back faces.

Exercise 3.5 In this exercise, a circle with centre (0, 0, 0) and radius 0.5 is to be approximated by eight triangles.
(a) Draw the circular area using a single glDrawArrays command and the geometric primitive GL_TRIANGLE_FAN. Make sure that the front faces of the triangles are oriented in a common direction. How many drawing steps must be specified in the glDrawArrays command (third argument)?
(b) Draw the circular area using a single glDrawArrays command and the geometric primitive GL_TRIANGLE_STRIP. Make sure that the front faces of the triangles are oriented in a common direction. Degenerated triangles must be used for this purpose. How many drawing steps must be specified in the glDrawArrays command (third argument)?
(c) Draw the circular area using a single glDrawElements command with an index buffer object and the geometric primitive GL_TRIANGLE_STRIP. Make sure that the front faces of the triangles are oriented in a common direction. Degenerated triangles must be used for this purpose. How many drawing steps must be specified in the glDrawElements command (third argument)?
(d) Draw the circular area with a single glDrawElements command and the use of primitive restart. The geometric primitive GL_TRIANGLE_STRIP should still be used. Make sure that the front faces of the triangles are oriented in a common direction. How many drawing steps must be specified in the glDrawElements command (third argument)?
(e) Extend your program so that the circular area is approximated with an (almost) arbitrary number res of triangles. Vary the value of the variable res and observe the quality of the approximation.

Exercise 3.6 Drawing instances
(a) Draw a rectangle using a glDrawArrays command and the geometric primitive GL_TRIANGLE_STRIP.
(b) Use the glDrawArraysInstanced command to draw multiple instances of the same rectangle. In order for these instances not to be drawn on top of each other but to become visible, the vertex shader must be modified. Use the predefined variable gl_InstanceID in the vertex shader to slightly shift the position of each instance of the rectangle relative to the previous instances.
(c) Change the vertex shader so that, depending on the instance number gl_InstanceID, the geometry of the rectangle changes (slightly) in addition to the position.
References
1. J. Kessenich, G. Sellers and D. Shreiner. OpenGL Programming Guide. 9th edition. Boston: Addison-Wesley, 2017.
2. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Compatibility Profile) - October 22, 2019). Retrieved 8.2.2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.
3. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core Profile) - October 22, 2019). Retrieved 8.2.2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.
4. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-Wesley, 2016.
4
Modelling Three-Dimensional Objects
This chapter contains an overview of the basic approaches for modelling three-dimensional objects. Since the modelling of object surfaces is of great importance in computer graphics, special attention is given to it. Often the (curved) surfaces of objects are approximated by planar polygons (see Chap. 3). Triangles are particularly well suited for this. In modern graphics processors, the tessellation unit can decompose a curved surface into planar polygons independently of the central processing unit. Freeform surfaces are well suited for modelling curved surfaces of three-dimensional objects and can be used as a starting point for this decomposition. Therefore, this chapter presents the basics of freeform surface modelling. Special attention is paid to the normal vectors of the surfaces, as these are crucial for the calculation of illumination effects on the surfaces of objects.
4.1 From the Real World to the Model
Before anything can be rendered on the screen, a three-dimensional virtual world must first be created in the computer. In most cases, this virtual world contains more than just the objects of the small section of the world that is ultimately to be displayed. For example, an entire city or a park landscape could be modelled in the computer, of which the viewer sees only a small part at a time. The first thing to do is to provide techniques for modelling three-dimensional objects. To describe a three-dimensional object, its geometric shape must be defined, but also the properties of its surface. This includes how it is coloured and whether it is more matt or glossy.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-28135-8_4.
There are two possible approaches for the geometric modelling of objects. In many applications, the objects do not represent real objects that already exist in reality. This is true for fantasy worlds often seen in computer games, as well as prototypes of vehicles or planned buildings that have not yet been built and may never be built. In these cases, the developer must have suitable methods for modelling three-dimensional objects. Even if real objects are to be represented, modelling might be necessary. In the case of existing buildings or furniture, the essential dimensions are known, but they are by far not sufficient for an approximately realistic representation of the geometric shapes, especially if rounded edges are present, for example.

In other cases, very detailed measurement data about the geometric structure of objects is available. 3D scanners allow an extremely precise measurement of surfaces. However, these raw data are not suitable to be used directly for geometric modelling of the measured objects. They are usually converted automatically, with possible manual correction, into simpler surface models. The same applies to techniques for measuring inner geometric structures. Such techniques allow, for example, the study of steel beams in bridges. Another very important and rapidly developing field of application is medicine. X-ray, ultrasound or tomography techniques provide information about different skeletal and tissue structures, so that three-dimensional models of bones and organs can be computed.

The first step in computer graphics is thus to model a virtual world in the computer, either manually by a developer or automatically derived from measurement data. To represent a concrete section of this world, the viewer's position and direction of view must be defined. This includes the position of the viewer in the virtual world, in which direction he is looking, how large his viewing angle (field of view) is and how far he can see. In this way, a three-dimensional clipping area is defined so that only the objects in this area need to be considered for rendering. However, the viewer will not yet be able to see anything, as there is no light in the virtual world yet. Therefore, information about the illumination of the virtual world must be available. Only then can the exact representation of the surfaces of the objects be calculated, i.e., how intensively they are illuminated with which light and where shadows are located (see Chap. 9). Another problem to be solved is to determine which objects are actually visible in the clipping area and which objects or parts of objects are hidden by others (see Chap. 8). In addition, there are possible special effects such as fog, smoke or reflections (see Chap. 11). In the following sections of this chapter, important aspects of modelling three-dimensional objects are explained.
4.2 Three-Dimensional Objects and Their Surfaces
In computer graphics, all objects are typically understood as three-dimensional objects in the three-dimensional space IR³. If points and lines are to be represented, then these must also be modelled as three-dimensional objects. This corresponds to the observation of the real world. For example, in a naive modelling approach, a
Fig. 4.1 Isolated and dangling edges and faces (to be avoided)
sheet of paper could be considered a two-dimensional surface. In fact, the sheet is a three-dimensional object, albeit with a very small height. Figure 4.1 shows examples of how three-dimensional objects should not look. Isolated or dangling edges and surfaces, as seen in the illustration, should be avoided. The representation of an object in computer graphics is usually determined by its surface and not by the set of points it consists of. Exceptions to this can be transparent objects. In computer graphics, therefore, it is mostly surfaces and not sets of points in IR³ that are modelled. In applications where objects are measured with 3D scanners or tomography data, no explicit description of the surface of the objects is provided. In these cases, the object is therefore often first described as a three-dimensional set of points (sometimes called a point cloud) and then, if necessary, the surface of the object is derived from this set.

There are various sophisticated techniques for modelling complex surfaces with round and curved shapes (see Sects. 4.5 and 4.6). However, for the representation of a scene in computer graphics, these curved surfaces are usually approximated by (many) planar polygons—usually triangles—to simplify the computation of lighting effects. For arbitrary surfaces, it might even be impossible to find an analytical expression for the representation of the projection. Efficient and fast calculations of projections would become impossible. However, the situation is much easier for polygons. The intersection of a straight line representing the direction of projection with a flat surface can be determined easily and quickly. The approximation of curved surfaces by polygons is called tessellation. The fact that often only triangles are used as the planar polygons is not a real limitation. Any polygon can be triangulated, that is, divided into triangles. Figure 4.2 shows a triangulation of a polygon using dashed lines. The advantage of triangles is that efficient algorithms are available for the calculations to be performed in computer graphics, most of which are implemented directly on the GPU. A disadvantage of polygons with more than three edges is that it must be ensured that all vertices lie in the same plane.

The individual triangles or polygons used to model a surface are usually oriented so that the side of the triangle on the surface of the object faces outwards. The orientation is given by choosing the order of the vertices of the polygon so that they
Fig. 4.2 Example of a triangulation of a polygon
Fig. 4.3 According to the convention typically used in computer graphics, the front of the left polygon is oriented towards the viewer. The front of the right polygon is oriented away from the viewer
are traversed in a counter-clockwise direction when the surface is viewed from the outside. Section 3.2.4 describes the polygon orientation in the OpenGL. In Fig. 4.3, this means that the triangle with the vertex order 0, 1, 2 points to the viewer, i.e., the viewer looks at the corresponding surface from the front. In contrast, the same triangle with the reverse vertex order 0, 2, 1 would remain invisible to the viewer from this perspective, since he is looking at the surface from the back. The surface would be invisible since it is impossible to see the surface of a solid three-dimensional object from the inside. When polygons have an orientation, rendering can be accelerated considerably, as surfaces that are not pointing at the viewer can be ignored for the entire rendering process. This can be achieved by backface culling. Isolated and dangling faces should be avoided since they can lead to unrealistic effects. For example, they can be seen from one side, but become invisible from the other if backface culling is activated.

Figure 4.4 shows a tetrahedron with the vertices P0, P1, P2, P3. The four triangular faces can be defined by the following groups of vertices.
• P0, P3, P1
• P0, P2, P3
• P0, P1, P2
• P1, P3, P2
Fig. 4.4 A tetrahedron with its vertices
In the specification of the triangles, the order of the vertices was chosen in such a way that the vertices are listed in a counter-clockwise direction when looking at the corresponding outer surface of the tetrahedron.
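The following sketch shows how this tetrahedron could be passed to indexed drawing with GL_TRIANGLES. The vertex coordinates are an assumption (Fig. 4.4 does not prescribe concrete positions); they are chosen so that the index groups listed above are counter-clockwise when viewed from outside.

```java
// Sketch with assumed coordinates: the four faces of the tetrahedron from Fig. 4.4,
// each listed counter-clockwise when seen from outside the object.
float[] vertices = {
     0.0f,  1.0f,  0.0f,   // P0 (apex)
    -1.0f, -1.0f,  1.0f,   // P1
     1.0f, -1.0f,  1.0f,   // P2
     0.0f, -1.0f, -1.0f    // P3
};
int[] indices = {
    0, 3, 1,   // P0, P3, P1
    0, 2, 3,   // P0, P2, P3
    0, 1, 2,   // P0, P1, P2
    1, 3, 2    // P1, P3, P2
};
// After uploading both arrays into buffer objects, the tetrahedron can be drawn with:
// gl.glDrawElements(GL.GL_TRIANGLES, indices.length, GL.GL_UNSIGNED_INT, 0);
```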
4.3 Modelling Techniques
A very simple technique for modelling three-dimensional objects is offered by voxels. The three-dimensional space is divided into a grid of small, equally sized cubes (voxels). An object is defined by the set of voxels that lie inside of the object. Voxels are suitable, for example, for modelling objects in medical engineering. For example, tomography data can be used to obtain information about tissue densities inside the measured body or object. If, for example, bones are to be explicitly modelled and represented, voxels can be assigned to those areas where measurement points are located that indicate the corresponding bone tissue density. Figure 4.5 illustrates the representation of a three-dimensional object based on voxels.

The storage and computational effort for voxels is very large compared to pixels. The division of the three-dimensional space into voxels is in a way analogous to the division of a two-dimensional plane into pixels, if the pixels are interpreted as small squares. The storage and computational effort increases exponentially with an additional dimension. At a resolution of 4,000 by 2,000 pixels in two dimensions, eight million pixels need to be managed and computed, which is roughly the resolution of today's computer monitors. Extending this resolution by 2,000 pixels into the third dimension requires 4000 × 2000 × 2000, or sixteen billion voxels.

An efficient form of voxel modelling is provided by octrees, which work with voxels of different sizes. The three-dimensional object to be represented is first enclosed in a sufficiently large cube. This cube is divided into eight equally sized
Fig. 4.5 Example of modelling a three-dimensional object with voxels
smaller subcubes. Subcubes that are completely inside or completely outside the object are marked with in or off respectively and are not further refined. All other subcubes, that is, those that intersect the surface of the object, are marked as on and broken down again into eight equally sized subcubes. The same procedure is followed with these smaller subcubes. It is determined which smaller subcubes are in, off or on. This subdivision is continued until no new subcubes with the mark on result or until a maximum resolution is reached, i.e., until the increasingly smaller subcubes have fallen below a minimum size. To illustrate the concept of octrees, consider their two-dimensional counterpart, the quadtrees. Instead of approximating a three-dimensional object with cubes, an area is approximated with squares. Figure 4.6 shows an area enclosed in a square that has been recursively divided into smaller and smaller squares. A smaller square is only divided further if it intersects the boundary of the area. The process ends when the squares have reached a predefined minimum size. Figure 4.7 shows the corresponding quadtree. Octrees are similar to quadtrees, but their inner nodes have eight child nodes instead of four, because the cubes are divided into eight subcubes. The voxel model and the octrees offer a way to approximate objects captured with certain measurement techniques. The realistic representation of interactive scenes is already possible with these models (see for example [1]), but not yet developed to the point where widespread use in computer graphics would make sense. For illumination effects such as reflections on the object surface, the inclination of the surface plays an essential role. However, the cubes used in the voxel model and the octrees do not have inclined surfaces, since surfaces of the cubes always point in the direction of the coordinate axes. If the voxels are small enough, the gradient between the voxels could be used as a normal vector. It is useful to subsequently approximate the surfaces of objects modelled with voxels or octrees by freeform surfaces, which are presented in Sect. 4.6.
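A minimal sketch of the recursive subdivision for the two-dimensional quadtree case could look as follows. The helper classify, which decides whether a square lies completely inside the area (in), completely outside (off) or intersects its boundary (on), is a hypothetical placeholder; java.util.List is assumed to be imported.

```java
enum Mark { IN, OFF, ON }

// Recursive quadtree subdivision as described in the text: squares marked "off" are
// discarded, squares marked "in" or of minimum size are kept as leaves, and squares
// marked "on" are split into four equally sized subsquares.
static void subdivide(double x, double y, double size, double minSize, List<double[]> leaves) {
    Mark mark = classify(x, y, size);           // hypothetical helper
    if (mark == Mark.OFF) {
        return;                                 // completely outside: discard
    }
    if (mark == Mark.IN || size <= minSize) {
        leaves.add(new double[] {x, y, size});  // keep as a leaf square
        return;
    }
    double h = size / 2.0;                      // "on": split into four subsquares
    subdivide(x,     y,     h, minSize, leaves);
    subdivide(x + h, y,     h, minSize, leaves);
    subdivide(x,     y + h, h, minSize, leaves);
    subdivide(x + h, y + h, h, minSize, leaves);
}
```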
Fig. 4.6 Recursive decomposition of an area into squares
Fig. 4.7 The quadtree for the area from Fig. 4.6
Fig. 4.8 An object created by the CSG scheme with the basic geometric objects and set-theoretic operations shown on the right
Fig. 4.9 Two objects created using sweep representation
The voxel model and octrees are more tailored to creating models based on measured data. Other techniques are used as direct design and modelling tools. Among these techniques is the CSG scheme, where CSG stands for constructive solid geometry. The CSG scheme is based on a collection of basic geometric objects from which new objects can be created using transformations and regularised set-theoretic operations. Figure 4.8 shows on the left an object composed of the basic objects cuboid and cylinder. The right part of the figure specifies how the corresponding basic geometric objects were combined with set-theoretic operations to obtain the object on the left. The necessary transformations are not indicated. For example, the middle section of the object shown is formed from the difference between the cuboid and the cylinder, so that a semicircular bulge is created. Another modelling technique is the sweep representation. This technique creates three-dimensional objects from surfaces that are moved along a path. For example, the horseshoe shape on the left in Fig. 4.9 is created by shifting a rectangle along an arc that lies in depth. Correspondingly, the tent shape on the right in the figure results from shifting a triangle along a line in the (negative) z-direction, i.e., into the image plane. It is known from Fourier analysis that any signal can be uniquely represented by a superposition of sine and cosine oscillations of different frequencies, which in this
Fig. 4.10 Tessellation of the helicopter scene from Fig. 5.15
context are called basis functions. Since curves and surfaces from computer graphics can be understood as such signals, it is in principle always possible to represent curves and surfaces by functions. However, an exact representation may require an infinite number of these basis functions (see Sect. 7.6). For some shapes or freeform surfaces, simple functional equations exist that are relatively easy to represent analytically. For considerations on the representation of surfaces by function equations in two variables, see Sect. 4.5. Probably the most important modelling technique is based on freeform surfaces computed by parametric curves. A description of the analytical representation of freeform surfaces and curves by polynomials, splines, B-splines and NURBS and their properties can be found in Sect. 4.6. For representing objects in computer graphics, as described earlier, the surfaces of objects are often approximated with planar polygons (usually triangles). Describing the surface of an object with polygons requires a list of points, a list of planar polygons composed of these points, colour or texture information, and possibly the specification of normal vectors of the surface. The normal vectors are needed to calculate light reflections. When approximating a curved surface with planar polygons, not only the position coordinates of the polygons to be formed are stored in the corresponding vertices, but usually also the normal vectors of the (original) curved surface. Also the basic geometric shapes (cone, cylinder, cuboid and sphere) used for the representation of the helicopter scene in Fig. 4.10 are mostly tessellated for the representation and thus approximated by triangles. The larger the number of triangles chosen to approximate a curved surface, the more accurately the surface can be approximated. Figure 4.11 shows a sphere for which the tessellation was refined from left to right. The computational effort increases with the growing number of
Fig. 4.11 Representation of a sphere with different tessellations
triangles. This applies not only to the determination of the triangles, which can be done before rendering the scene, but also to the calculations for light reflections on the surface, the determination of which objects or polygons are hidden by others, and the collision detection, i.e., the test whether moving objects collide; all of these become more complex as the number of triangles increases. As a rule, the computational effort increases quadratically with the level of detail, since doubling the level of detail of a surface representation doubles the resolution in each of the two dimensions of the surface and thus quadruples the number of triangles. In some cases, an object in a scene is therefore stored in the scene graph in different levels of detail. If, for example, a forest is viewed in the distance, it is sufficient to approximate the trees only by a rough polygon model. The approximation of each individual tree with thousands of triangles would increase the computational effort enormously. When rendering the scene in this form, a single triangle in the rendered image might not even fill a pixel. However, when the viewer enters the forest, the surrounding trees must be rendered in much more detail, possibly even down to individual leaf structures if the viewer is standing directly in front of a tree branch. This technique is called level of detail (LOD) (see Sect. 4.5.1).
4.4 Modelling the Surface of a Cube in the OpenGL
Based on the basic OpenGL geometric objects introduced in Sect. 3.2 and the OpenGL drawing commands from Sect. 3.3, below are some considerations for modelling the surface of a cube. These considerations can be extended to other polyhedra. Figure 4.12 shows a cube in which the corners are marked as vertices vi (i = 0, 1, . . . , 7). Indexed drawing using the drawing command glDrawElements (see Sect. 3.3.1) will be used, as this method has efficiency advantages for storing objects in memory. This also makes the considerations of this section easier to understand. The eight vertices can be stored in a vertex buffer object. Much of the content of this section is transferable to drawing without an index buffer using glDrawArrays. In the core profile, only triangles are available for drawing the surfaces, so that each face of the cube must be divided into two triangles. This results in twelve triangles. If the geometric primitive GL_TRIANGLES is used to draw individual triangles, then the index buffer must contain a total of 6 · 2 · 3 = 36 indices. For the front and right
Fig. 4.12 A cube with vertex labels at the corners
faces, part of this buffer could look like this: 0, 1, 2, 1, 3, 2, 2, 3, 6, 3, 7, 6. Since the cube faces consist of adjacent triangles, each with a common edge, the geometric primitive GL_TRIANGLE_STRIP can be used to draw one face. This reduces the number of indices per side to four and for the cube to a total of 6 · 4 = 24 indices. However, the drawing command must be called again for rendering each face. After a brief consideration, it can be seen that adjacent faces, each consisting of two triangles, can be represented with only one sequence of connected triangles (GL_TRIANGLE_STRIP). However, at least two triangle sequences become necessary with this approach. For example, the front, right and back faces can be combined into one sequence and the top, left and the bottom faces into another sequence. Each GL_TRIANGLE_STRIP consists of eight indices, requiring a total of 16 indices. However, this makes it necessary to call the drawing command glDrawElements twice, which is undesirable. This can be remedied by restarting the second triangle sequence through the primitive restart mechanism (see Sect. 3.3.3). The special start index required for this is needed once, which increases the number of indices in the index buffer to 17. If one face of the cube is no longer regarded as a unit consisting of two triangles, then there is a further possibility for optimisation. Figure 4.13 shows the unrolled cube with vertex labels at the corners. In contrast to the previous considerations, the triangles divide the cube faces in a certain irregular way, which is irrelevant for the rendering result as long as the surface polygons are filled. The large numbers within the triangles in the figure indicate the order in which the triangles are drawn (render order). It can be seen from these numbers that after drawing the front face the lower triangle of the right face is drawn, then the bottom, left and top faces are drawn. Afterwards the upper part of the right face is drawn and finally the back face
Fig. 4.13 Mesh of a cube with vertices vi : The render order for a triangle sequence (GL_TRIANGLE_STRIP) is indicated by large numbers. The order of the indices i j determines the render order
Fig. 4.14 Part of the source code of the init method of a JOGL renderer (Java) for drawing the cube with eight vertices with glDrawElements from a sequence of connected triangles (GL_TRIANGLE_STRIP)
Fig. 4.15 Part of the source code of the display method of a JOGL renderer (Java) for drawing the cube from eight vertices with glDrawElements from a sequence of connected triangles (GL_TRIANGLE_STRIP)
is drawn. In this way, only one GL_TRIANGLE_STRIP is required. Figure 4.13 also shows the order of the indices i j that refer to the respective vertices in the index buffer. It can be seen that this clever modelling approach only needs 14 indices. Figure 4.14 shows the definition of the index buffer in the Java source code. Figure 4.15 shows the corresponding call of the JOGL drawing command. Figure 4.16 shows the cube drawn by the JOGL renderer, which uses the source code parts from
Fig. 4.16 Cube drawn with an OpenGL renderer
Figs. 4.14 and 4.15. The right side of the figure shows the drawn triangles that are not visible with filled polygons, as shown on the left side. The cube in Fig. 4.16 is very efficient to store in memory and to draw because only one vertex is used per cube corner. This modelling is useful for rendering wireframe models and monochrome objects that are not illuminated. In realistic scenes, objects with certain material properties are illuminated by different types of light sources, which causes reflections and other special effects. Even for an object where all surfaces have the same colour without illumination or are made of the same homogeneous material, illumination effects change the rendered colours of (and even within) the surfaces to display on the screen. As explained in Chap. 9, in the standard illumination model of computer graphics the colour values of the illuminated surfaces are influenced by the normal vectors of the surfaces of the objects. Normal vectors are direction vectors that are perpendicular to the surface in question and are stored in the OpenGL, as in other computer graphics libraries, exclusively in the vertices. For the cube in Fig. 4.12, the normal vectors of the left face (nl), the front face (nf), the top face (nu), the right face (nr), the back face (nb) and the bottom face (nd) are as follows:

nl = (−1, 0, 0)^T;  nf = (0, 0, 1)^T;  nu = (0, 1, 0)^T;  nr = (1, 0, 0)^T;  nb = (0, 0, −1)^T;  nd = (0, −1, 0)^T.
Similar considerations apply to textures that can be applied to surfaces of objects. In the OpenGL, texture coordinates are also stored exclusively in the vertices. For the following explanations in this section, it is sufficient to consider a cube with differently coloured but homogeneous faces. This means that the detailed illumination calculation can be dispensed with for these modelling considerations. If the correct normal vectors are later added to the vertices of such a cube, then a realistic illumination of this object is possible. Due to the illumination calculation and the
Fig. 4.17 Mesh of a cube with vertex labels at each corner of each face
respective processing in the graphics pipeline, the colour values also vary within the illuminated faces. This can also be neglected for these modelling considerations at this point. For the cube, this simplification means that the cube faces can be coloured differently. In the model so far in this section, three faces share one vertex containing the colour information. For example, vertex v0 determines the colour at the corner of the front, the left and the top faces of the cube. But these faces must now be able to have different colours. This means that three vertices are needed for each corner point instead of just one, so that three different colours can be displayed and rendered for each adjacent face. Figure 4.17 shows a mesh of an unrolled cube with three vertices per corner. This gives each corner of a cube face its own vertex, so that different faces can be assigned separate colours. As a result, these individual faces must be drawn separately. Similar to the considerations for the cube with eight vertices (see above), a total of twelve triangles must be drawn to represent the cube with differently coloured faces. Using the geometric primitive GL_TRIANGLES for indexed drawing, the index buffer contains a total of 6 · 2 · 3 = 36 indices. For the front face (vertices v0 to v3) and the right face, this buffer could look like this: 0, 1, 2, 1, 3, 2, 12, 13, 14, 13, 15, 14. For optimisation, each face of the cube can be rendered by four vertices using a triangle sequence (GL_TRIANGLE_STRIP). This reduces the number of indices per side to four and for the cube to a total of 6 · 4 = 24 indices. Since six triangle sequences must be drawn, the drawing command must be called six times. Using the render order of the triangles from Fig. 4.13, a single GL_TRIANGLE_STRIP can also be derived from this cube mesh. Figure 4.18 shows the division of the faces into triangles and the order of the indices. In this case, 26 indices are to be
Fig. 4.18 Mesh of a cube with vertices vi : The render order for a triangle sequence (GL_TRIANGLE_STRIP) is identical to the render order in Fig. 4.13. The order of the indices i j determines the render order
processed in a single call to the drawing command glDrawElements. Since the faces consist of different vertices, degenerate triangles must be drawn at the transition to an adjacent side. For example, the first four indices draw the front face with the vertex order v0, v1, v2, v3. After that, the index i4 creates a triangle degenerated to a line consisting of the vertices v2, v3, v12. Subsequently, the index i5 creates another triangle degenerated to a line from the vertices v3, v12, v13. Only with the following index i6 is a (non-degenerated) triangle rendered again. In total, twelve triangles degenerated to a line must be generated with this approach. The degenerated triangles are not visible in this example, but they generate a certain computational effort for drawing, which is not significant for a single cube. However, if a large number of cubes or cuboids are drawn, this effort can be considerable. To avoid drawing the degenerate triangles, an index buffer can be created using the primitive restart mechanism (see Sect. 3.3.3). This mechanism renders six triangle sequences (GL_TRIANGLE_STRIP) when the drawing command is called. Since each face requires four indices and five triangle sequences must be restarted (the first sequence is started by calling the drawing command), 6 · 4 + 5 = 29 indices are required. Figures 4.19 and 4.20 show the relevant parts of the source code and the corresponding index buffer to enable drawing with the primitive restart mechanism. Figure 4.21 shows the modelled cube from different viewer positions, rendered with the primitive restart mechanism explained above. As noted at the beginning of this section, almost all of the explanations about modelling for indexed drawing (with glDrawElements) of this cube can be transferred to drawing without an index buffer (with the drawing command glDrawArrays). For this, the sequences of the indices in the index buffer can be used to build the
Fig. 4.19 Part of the source code of the init method of a JOGL renderer (Java) for drawing the cube by 24 vertices with glDrawElements, from a sequence of connected triangles (GL_TRIANGLE_STRIP) with the primitive restart mechanism
Fig. 4.20 Part of the source code of the display method of a JOGL renderer (Java) for drawing the cube by 24 vertices with glDrawElements, from sequences of connected triangles (GL_TRIANGLE_STRIP) with the primitive restart mechanism
Fig. 4.21 Different views of a cube rendered by an OpenGL renderer
corresponding vertex buffer. Only the primitive restart mechanism is not applicable to this type of drawing, as it is only intended for drawing with index buffers. Furthermore, these considerations can also be applied to the geometric primitives, which are only available in the compatibility profile. One cube face can be represented by a quadrilateral (GL_QUADS), which requires four vertices. Such a quadrilateral can be drawn by two connected triangles (GL_TRIANGLE_STRIP), which also
requires four vertices. Therefore, to draw the cube from separate cube faces, 6 · 4 = 24 vertices are needed. The cube with eight vertices can also be composed of (at least) two sequences of connected quadrilaterals (GL_QUAD_STRIP). This primitive is also only available in the compatibility profile. For example, the first quadrilateral sequence consists of the left, the front and the right faces. The top, back and bottom faces form the second sequence. This makes it possible to reduce the number of required vertices to 2 · (4 + 2 + 2) = 16. However, the drawing of two quadrilateral sequences must be initiated. Using degenerate quadrilaterals to realise the drawing sequence as in Fig. 4.13 does not provide any efficiency advantage.
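As a small sketch of the core-profile variant with primitive restart discussed above, the following loop assembles the 6 · 4 + 5 = 29 indices. It assumes a simplified vertex layout in which the four vertices of face f are stored consecutively as 4f .. 4f+3 and are already ordered for a triangle strip; this layout is an assumption and differs from the numbering of Fig. 4.17.

```java
// Sketch: index buffer for the cube with 24 vertices and primitive restart.
final int RESTART_INDEX = 0xFFFF;          // must also be passed to glPrimitiveRestartIndex
int[] cubeIndices = new int[6 * 4 + 5];    // 29 indices: 6 faces of 4 indices plus 5 restarts
int k = 0;
for (int face = 0; face < 6; face++) {
    if (face > 0) {
        cubeIndices[k++] = RESTART_INDEX;  // restart the strip before every face but the first
    }
    for (int i = 0; i < 4; i++) {
        cubeIndices[k++] = 4 * face + i;   // assumed layout: face f uses vertices 4f .. 4f+3
    }
}
```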
4.5 Surfaces as Functions in Two Variables
One principal way of modelling surfaces in three-dimensional space is to model the surfaces by an implicit functional equation

F(x, y, z) = 0    (4.1)

or by an explicit functional equation of the form

z = f(x, y).    (4.2)
The explicit form can always be transformed into an implicit form. However, choosing a simple implicit or explicit equation that yields the desired shape as a surface is usually extremely difficult. Nevertheless, this possibility is considered in the following, since the explicit form is suitable, for example, for the representation of landscapes (see Sect. 4.5.1). The representation of a surface by an implicit equation has the advantage that more functions can be described than by an explicit form. For the calculation of the surface itself, the solutions of Eq. (4.1) must be determined, which in most cases is only possible numerically and not analytically, since an analytical solution, for example for the z-coordinate, does not always exist. If a representation of the desired surface can be found as an explicit functional equation, then the solution is directly available. Numerical evaluation can be applied, for example, in computer graphics pipelines that use ray tracing techniques (see Sect. 9.9). Here, however, problems of convergence or numerical stability may arise when calculating the solution of the functional equation. Using the explicit form, the surfaces can be tessellated (see Sects. 4.2 and 4.3). No convergence or numerical stability problems arise, but the level of detail of the tessellation must be chosen appropriately. If the level of detail is too low, the function will be poorly approximated and the rendering quality of the surface may be poor too. Too high a level of detail leads to a better approximation, but may result in an unacceptable computational effort. To find a level of detail appropriate to the situation, for example, the level of detail (LOD) method can be used (see Sect. 4.5.1). The remainder of this section contains considerations of the representation of surfaces using explicit functional equations in two variables. Figure 4.22 shows an
Fig. 4.22 The surface generated by the function z = x sin(7x) cos(4y)
example of the following function:

z = f(x, y) = x sin(7x) cos(4y)    (−1 ≤ x, y ≤ 1).
A solid closed shape cannot be modelled by a single function. If necessary, however, such a closed shape can be composed of several individual functions. The surface defined by Eq. (4.2) must also be approximated by triangles for rendering. The simplest approach is to divide the desired area in the xy-plane by a rectangular grid. For the grid points (xi, yj) the function values zij = f(xi, yj) are calculated. The points (xi, yj, zij) are then used to define the triangles to approximate the surface. Two triangles are defined over each rectangle of the grid in the xy-plane. Above the rectangle defined by the two points (xi, yj) and (xi+1, yj+1) in the xy-plane, the triangles
• (xi, yj, f(xi, yj)), (xi+1, yj, f(xi+1, yj)), (xi, yj+1, f(xi, yj+1)) and
• (xi+1, yj, f(xi+1, yj)), (xi+1, yj+1, f(xi+1, yj+1)), (xi, yj+1, f(xi, yj+1))
are defined. Figure 4.23 illustrates this procedure. How well the function or the surface is approximated by the triangles depends on the curvature of the surface and the resolution of the grid. A very high-resolution grid leads to a better approximation quality, but results in a very large number of
Fig. 4.23 Approximation of a surface defined by a function by triangles
triangles and thus in a high computational effort during rendering. It is therefore recommended to use a technique similar to the quadtrees already presented in Sect. 4.3. If the function is defined over a rectangular area, it is initially approximated by only two triangles resulting from the subdivision of the rectangle. If the maximum error of the approximation by the triangles (in the simplest case, the absolute error in the z-direction can be used) is sufficiently small, i.e., if it falls below a given value ε > 0, the rectangle is not further divided into smaller rectangles. Otherwise, the rectangle is divided into four smaller rectangles of equal size. Each of these smaller rectangles is then treated in the same way as the original rectangle. If the error of approximating the function on a smaller rectangle by two triangles is sufficiently small, the corresponding triangles on the smaller rectangle are used to approximate the function; otherwise, the smaller rectangle is further divided recursively into four smaller rectangles. In addition, a maximum recursion depth or a minimum size of the rectangles should be specified so that the division algorithm terminates. The latter criterion, limiting the number of steps, guarantees that the splitting will terminate even if the function is discontinuous or has a pole at the boundary of the region on which it is defined. For arbitrary functions, the calculation of the maximum error on a given (sub-)rectangle is not necessarily analytically possible. It is therefore advisable to use a sufficiently fine grid—for example, the grid that would result from the smallest
permissible rectangles—and to consider the error only on the grid points within the respective (sub-)rectangle.
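A simple (non-adaptive) version of this tessellation, using a uniform grid instead of the recursive subdivision, could be sketched as follows. The resolution n and the use of the example function from Fig. 4.22 are assumptions; the resulting array can be uploaded into a vertex buffer and drawn with GL_TRIANGLES.

```java
// Uniform tessellation of z = f(x, y) over [-1, 1] x [-1, 1]: two triangles per grid cell,
// with the vertex orders from the two bullet points above.
static float[] tessellate(int n) {
    float[] triangles = new float[n * n * 2 * 3 * 3];  // n*n cells, 2 triangles, 3 vertices, xyz
    int k = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float x0 = -1f + 2f * i / n, x1 = -1f + 2f * (i + 1) / n;
            float y0 = -1f + 2f * j / n, y1 = -1f + 2f * (j + 1) / n;
            // first triangle above the cell
            k = put(triangles, k, x0, y0); k = put(triangles, k, x1, y0); k = put(triangles, k, x0, y1);
            // second triangle above the cell
            k = put(triangles, k, x1, y0); k = put(triangles, k, x1, y1); k = put(triangles, k, x0, y1);
        }
    }
    return triangles;
}

static int put(float[] a, int k, float x, float y) {
    a[k++] = x; a[k++] = y; a[k++] = f(x, y);   // store (x, y, f(x, y))
    return k;
}

static float f(float x, float y) {              // example surface from Fig. 4.22
    return (float) (x * Math.sin(7 * x) * Math.cos(4 * y));
}
```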
4.5.1 Representation of Landscapes
Most of the objects with curved surfaces are usually not described by arbitrary functions of the form z = f(x, y), but are composed of low-degree polynomials or rational functions, which make modelling more intuitive and easier to understand than with arbitrary functions. This approach to modelling is explained in more detail in Sect. 4.6. However, the technique of representing functions of the form z = f(x, y), as introduced here, can be used for modelling landscapes, for example in flight simulators. Artificial landscapes with hills, bumps and similar features can be generated relatively easily by suitable functions. Further, real landscapes can be rendered using the above technique for function approximation based on triangles. This section only considers the pure geometry or topology of the landscape and not the concrete appearance in detail, such as grassy areas, tarmac roads, stone walls or the like. This would be a matter of textures that are discussed in Chap. 10.

For real landscapes, altitude information is usually available over a sufficiently fine grid or lattice. These elevations indicate for each point of the grid how high the point is in relation to a certain reference plane, for instance in relation to sea level. This means, for example, that in a file the geographical height is stored for each intersection point of a grid that is laid over the landscape. Instead of evaluating the function z = f(x, y) at the corresponding grid points (xi, yj), the value zij = f(xi, yj) is replaced by the altitude value hij at the corresponding point of the landscape to be modelled.

For a larger landscape, a uniform division into triangles or a division depending on the landscape structure would result in a large number of triangles having to be considered in the rendering process. On the one hand, a high resolution of the landscape is not necessary for parts of the landscape that are far away from the viewer. On the other hand, for the parts of the landscape that are close to the viewer, a higher resolution is necessary for a realistic rendering. It is therefore advantageous if the number of triangles used for rendering depends on the distance to the viewer. Very close parts of the landscape are divided very finely and the further away the parts of the landscape are from the viewer, the larger the division into triangles is made (see [3, Chap. 2]). Figure 4.24 illustrates this principle of clipmaps. The altitude information is not shown in the figure, only the resolution of the grid and the triangles. Such techniques, where the number of triangles to represent a surface depends on the distance of the viewer, are used in computer graphics not only for landscapes but also for other objects or surfaces that consist of many triangles. If the viewer is close to the object, a high resolution with a high number of triangles is used to represent the surface. At greater distances, a coarse resolution with only a few triangles is usually sufficient. This technique is also called level of detail (LOD) (see Sect. 10.1.1).
Fig. 4.24 Level of Detail (LOD) partitioning of a landscape using clipmaps
The methods described in this section refer to a purely geometric or topological modelling of landscapes. For achieving realistic rendering results, additional textures have to be applied to the landscape to reproduce the different ground conditions such as grass, sand or tarmac. Explanations on textures can be found in Chap. 10.
4.6 Parametric Curves and Freeform Surfaces
For the representation of a scene, the surfaces of the individual objects are approximated by triangles, but this representation is not suitable for modelling objects. Freeform surfaces are much better suited for this purpose. They are the three-dimensional counterpart of curves in the plane as described in Sect. 3.1. Like these curves, a freeform surface is defined by a finite set of points which it approximates. Storing geometric objects in memory based on freeform surfaces allows working with different resolutions when rendering these objects. The number of triangles used to approximate the surface can be varied depending on the desired accuracy of the representation. In addition, the normal vectors used for the illumination effects are not calculated from the triangles, but directly from the original curved surface, which creates more realistic effects. The modelling of curved surfaces is based on parametric curves. When a surface is scanned parallel to one of the coordinate axes, a curve in three-dimensional space is obtained. Figure 4.25 illustrates this fact. The fundamentals of parametric curves in three-dimensional space are therefore essential for understanding curved surfaces, which is why the modelling of curves is presented in the following section.
Fig. 4.25 Two curves that result when the curved surface is scanned along a coordinate axis
4.6.1 Parametric Curves
If a curve in space or in the plane is to be specified by a series of points, so-called control points, the following properties are desirable to allow easy modelling and adjusting of the curve.
Controllability: The influence of the parameters on the curve is intuitively understandable. If a curve is to be changed, it must be easy for the user to see which parameters he should change and how.
Locality principle: It must be possible to make local changes to the curve. For example, if one control point of the curve is changed, it should only affect the vicinity of the control point and not change the curve completely.
Smoothness: The curve should satisfy certain smoothness properties. This means that not only the curve itself should be continuous, that is, have no gaps or jumps, but also its first derivative, so that the curve has no bends. The latter means that the curve must be continuously differentiable at least once. In some cases, it may additionally be required that higher derivatives exist. Furthermore, the curve should be of limited variation. This means that it not only passes close to the control points, but also does not move arbitrarily far between the control points.
When the curve passes exactly through the control points, this is called interpolation, while an approximation only requires that the curve approximate the points as closely as possible. Through (n + 1) control points, an interpolation polynomial of degree n or less can always be found that passes exactly through the control points. Nevertheless, interpolation polynomials are not suitable for modelling in computer graphics. Besides the problem that with a large number of control points the degree of the polynomial and thus the computational effort becomes very large, interpolation polynomials do not satisfy the locality principle. If a control point is changed, this usually affects all coefficients of the polynomial and thus the entire curve. Clipping for such polynomials is also not straightforward, since a polynomial interpolating a given set of control points can deviate arbitrarily from the region around the control points. Therefore, it is not sufficient to consider only the control points for clipping of such interpolation polynomials. The curve must be rendered to check whether it
Fig. 4.26 An interpolation polynomial of degree five defined by the control points (0, 0), (1, 0), (2, 0), (3, 0), (4, 1) and (5, 0)
passes through the clipping region. In addition, high-degree interpolation polynomials tend to oscillate. This means that they sometimes fluctuate strongly between the control points. Figure 4.26 shows an interpolation polynomial of degree five defined by six control points, all but one of which lie on the x-axis. The polynomial oscillates around the control points and has a clear overshoot above the highest control point. It does not remain within the convex hull of the control points, which is represented by the red triangle. The undesirable properties of interpolation polynomials can be avoided by dropping the strict requirement that the polynomial must pass through all the control points. Instead, it is sufficient to approximate only some of the control points. This leads to Bernstein polynomials of degree n, which have better properties. The ith Bernstein polynomial of degree n (i ∈ {0, . . . , n}) is given by the following equation:

B_i^{(n)}(t) = \binom{n}{i} \cdot (1 - t)^{n-i} \cdot t^i \qquad (t \in [0, 1])

The Bernstein polynomials satisfy two important properties. The first property is

B_i^{(n)}(t) \in [0, 1] \quad \text{for all } t \in [0, 1].

That is, the evaluation of the Bernstein polynomials within the range of the unit interval [0, 1] only yields values between zero and one. This and the following second property are needed to construct curves that stay within the convex hull of their control points:

\sum_{i=0}^{n} B_i^{(n)}(t) = 1 \quad \text{for all } t \in [0, 1].
At each point in the unit interval, the Bernstein polynomials add up to one. Bézier curves use Bernstein polynomials of degree n to approximate (n + 1) control points b_0, . . . , b_n ∈ IR^p. For computer graphics, only the cases of the plane with p = 2 and three-dimensional space with p = 3 are relevant. The control points are also called Bézier points. The curve defined by these points,

    x(t) = Σ_{i=0}^{n} b_i · B_i^(n)(t)    (t ∈ [0, 1]),    (4.3)
is called a Bézier curve of degree n. The Bézier curve interpolates the start and end points, that is, x(0) = b_0 and x(1) = b_n hold. The other control points generally do not lie on the curve. The tangent vectors to the Bézier curve at the initial and final points can be calculated as follows:

    ẋ(0) = n · (b_1 − b_0),    ẋ(1) = n · (b_n − b_{n−1}).

This means that the tangent at the starting point b_0 points in the direction of the point b_1. The tangent at the end point b_n points in the direction of the point b_{n−1}. This principle is also the basis of the definition of cubic curves, as shown as an example in Fig. 3.2. For fixed t, Eq. (4.3) expresses x(t) as a convex combination of the control points b_0, . . . , b_n, since by the two properties above the Bernstein polynomials are non-negative and add up to one at every point t. Thus, the Bézier curve stays within the convex hull of its control points (cf. [2, Sect. 4.3]). If an affine transformation is applied to all control points, the Bézier curve of the transformed points matches the transformation of the original Bézier curve. Bézier curves are thus invariant under affine transformations such as rotation, translation or scaling. Bézier curves are also symmetric in the control points, that is, the control points b_0, . . . , b_n and b_n, . . . , b_0 lead to the same curve; the curve is only traversed in the opposite direction. If a convex combination of two sets of control points is used to define a new set of control points, the convex combination of the corresponding Bézier curves again results in a Bézier curve. This can be expressed mathematically as follows.
• If the control points b̃_0, . . . , b̃_n define the Bézier curve x̃(t) and
• the control points b̂_0, . . . , b̂_n define the Bézier curve x̂(t),
• then the control points α·b̃_0 + β·b̂_0, . . . , α·b̃_n + β·b̂_n define the Bézier curve x(t) = α·x̃(t) + β·x̂(t), if α + β = 1 and α, β ≥ 0 hold.
If all control points lie on a line or a parabola, then the Bézier curve also results in the corresponding line or parabola. Bézier curves also preserve certain shape properties such as monotonicity or convexity of the control points. Despite the many favourable properties of Bézier curves, they are unsuitable for larger numbers of control points, as this would lead to a too high polynomial degree. For (n + 1) control points, the Bézier curve is usually a polynomial of degree n.
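The evaluation of Eq. (4.3) can be illustrated with a small Java sketch. The following code is not part of the book's example programs; class and method names are chosen freely, and the control points in the main method are arbitrary.

public class BezierCurve {

    // Binomial coefficient "n over i", computed iteratively.
    static double binomial(int n, int i) {
        double result = 1.0;
        for (int k = 1; k <= i; k++) {
            result *= (double) (n - k + 1) / k;
        }
        return result;
    }

    // i-th Bernstein polynomial of degree n at parameter t in [0, 1].
    static double bernstein(int n, int i, double t) {
        return binomial(n, i) * Math.pow(1.0 - t, n - i) * Math.pow(t, i);
    }

    // Point on the Bézier curve defined by control points b[0..n] (each {x, y}).
    static double[] evaluate(double[][] b, double t) {
        int n = b.length - 1;
        double[] point = new double[2];
        for (int i = 0; i <= n; i++) {
            double weight = bernstein(n, i, t);
            point[0] += weight * b[i][0];
            point[1] += weight * b[i][1];
        }
        return point;
    }

    public static void main(String[] args) {
        // A cubic Bézier curve with four control points.
        double[][] b = { { 0, 0 }, { 1, 2 }, { 3, 2 }, { 4, 0 } };
        for (int k = 0; k <= 10; k++) {
            double t = k / 10.0;
            double[] p = evaluate(b, t);
            System.out.printf("x(%.1f) = (%.3f, %.3f)%n", t, p[0], p[1]);
        }
    }
}

As expected, the output starts at b_0 for t = 0 and ends at b_3 for t = 1, while the inner control points are only approximated.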
Fig. 4.27 B-Spline with knots P1 , P4 , P7 and inner Bézier points P2 , P3 , P5 , P6
Therefore, instead of Bézier curves, B-splines are more commonly used to define a curve for a given set of control points. B-splines are composed of several Bézier curves of lower degree—usually degree three or four. For this purpose, a Bézier curve is calculated for n control points (for instance n = 4), and the last control point of the previous Bézier curve forms the first control point of the following Bézier curve. In this way, B-splines interpolate the control points at which the Bézier curves are joined. These connecting points are called knots. The other control points are called inner Bézier points. Figure 4.27 shows a B-spline composed of two Bézier curves of degree three. To avoid sharp bends at the connection of the Bézier curves, which would mean that the curve is not differentiable there, the respective knot and its two neighbouring inner Bézier points must be collinear. This method for avoiding sharp bends is illustrated in Fig. 3.3. By choosing the inner Bézier points properly, a B-spline of degree n can be differentiated (n − 1) times. Cubic B-splines are based on polynomials of degree three and can therefore be differentiated twice when the inner Bézier points are chosen correctly. In addition to the requirement of collinearity, another constraint must apply to the neighbouring inner Bézier points. The B-spline shown in Fig. 4.28 is composed of two Bézier curves of degree three and is defined by the knots P1, P4, P7 and the inner Bézier points P2, P3, P5, P6. In order to ensure that the curve is twice differentiable, the segments of the tangents must be in the same ratio to each other as indicated in the figure. B-splines inherit the positive properties of Bézier curves. They stay within the convex hull of the control points, are invariant under affine transformations, symmetric in the control points, interpolate the start and end points of the control points and satisfy the locality principle. A B-spline is composed piecewise of Bézier curves. These can be expressed in homogeneous coordinates in the following form:

    ( P_x(t), P_y(t), P_z(t), 1 )^T.
Fig. 4.28 Representation of the condition for the inner Bézier points for a twice continuously differentiable cubic B-spline
Here P_x(t), P_y(t), P_z(t) are polynomials in t. Applying a perspective projection in the form of the matrix from Eq. (5.11) to this representation yields

    | z_0  0    0   0 |   | P_x(t) |   | P_x(t) · z_0 |
    | 0   z_0   0   0 | · | P_y(t) | = | P_y(t) · z_0 |
    | 0    0   z_0  0 |   | P_z(t) |   | P_z(t) · z_0 |
    | 0    0   −1   0 |   |   1    |   |   −P_z(t)    |.

In Cartesian coordinates, the projection of a Bézier curve thus results in a parametric curve whose individual coordinates are rational functions:

    ( −(P_x(t)/P_z(t)) · z_0,  −(P_y(t)/P_z(t)) · z_0,  −z_0 )^T.

Since the perspective projection of B-splines or Bézier curves usually results in rational functions anyway, it is natural to work with rational functions already when modelling in three-dimensional space. The perspective projection of a rational function is again a rational function. Instead of B-splines, it is therefore very common to use the more general NURBS (non-uniform rational B-splines). NURBS are generalisations of B-splines based on extensions of Bézier curves to rational functions of the following form:

    x(t) = ( Σ_{i=0}^{n} w_i · b_i · B_i^(n)(t) ) / ( Σ_{i=0}^{n} w_i · B_i^(n)(t) ).

The freely selectable weights w_i are called form parameters. A larger weight w_i increases the influence of the control point b_i on the curve. In the sense of this interpretation and in order to avoid singularities, it is usually required that the weights w_i are positive.
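The following Java sketch illustrates how the rational formula above can be evaluated for a single curve point in the plane. It is illustrative only and not taken from the book's example programs; the control points and weights in the main method are made up.

public class RationalBezier {

    static double binomial(int n, int i) {
        double result = 1.0;
        for (int k = 1; k <= i; k++) result *= (double) (n - k + 1) / k;
        return result;
    }

    static double bernstein(int n, int i, double t) {
        return binomial(n, i) * Math.pow(1.0 - t, n - i) * Math.pow(t, i);
    }

    // Point on the rational Bézier curve with 2D control points b[i] and weights w[i].
    static double[] evaluate(double[][] b, double[] w, double t) {
        int n = b.length - 1;
        double x = 0, y = 0, denominator = 0;
        for (int i = 0; i <= n; i++) {
            double weight = w[i] * bernstein(n, i, t);  // w_i * B_i^(n)(t)
            x += weight * b[i][0];
            y += weight * b[i][1];
            denominator += weight;
        }
        return new double[] { x / denominator, y / denominator };
    }

    public static void main(String[] args) {
        double[][] b = { { 0, 0 }, { 1, 2 }, { 3, 2 }, { 4, 0 } };
        double[] w = { 1, 2, 2, 1 };  // larger weights pull the curve towards b1 and b2
        double[] p = evaluate(b, w, 0.5);
        System.out.printf("x(0.5) = (%.3f, %.3f)%n", p[0], p[1]);
    }
}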
4.6.2 Efficient Computation of Polynomials In order to draw a parametric curve, polynomials must be evaluated. This also applies to freeform surfaces. In most cases, polynomials of degree three are used for this purpose. In this section, an efficient evaluation scheme for polynomials is presented, which is based on a similar principle as the incremental calculations introduced in the context of the Bresenham algorithm in Sect. 7.3.2. Although floating point arithmetic cannot be avoided for polynomials in this way, it is at least possible to reduce the repeated calculations to additions only. To draw a cubic curve, the parametric curve is usually evaluated at equidistant values of the parameter t. The corresponding points are computed and connected by line segments. The same applies to freeform surfaces, which are also modelled by parametric curves or surfaces in the form of polynomials. In order to evaluate a polynomial f(t) at the points t_0, t_1 = t_0 + δ, t_2 = t_0 + 2δ, . . . with the step size δ > 0, a scheme of forward differences is used. For this purpose, the polynomial f(t) is evaluated once by explicitly calculating the initial value f_0 = f(t_0) at the point t_0, and then the changes Δf(t) = f(t + δ) − f(t) are added in an incremental way as f(t + δ) = f(t) + Δf(t) or f_{n+1} = f_n + Δf_n. For a polynomial f(t) = a·t³ + b·t² + c·t + d of degree three this leads to

    Δf(t) = 3aδ·t² + (3aδ² + 2bδ)·t + aδ³ + bδ² + cδ.

In this way, the evaluation of a polynomial of degree three was reduced to the addition of values that require the evaluation of a polynomial of degree two. For this polynomial of degree two, forward differences can also be used:

    Δ²f(t) = Δ(Δf(t)) = Δf(t + δ) − Δf(t) = 6aδ²·t + 6aδ³ + 2bδ².

The Δ-values for the original polynomial of degree three are thus given by the following formula:

    Δf_n = Δf_{n−1} + Δ²f_{n−1}.

For the calculation of the Δ²-values, multiplications would still have to be carried out. Applying the scheme of forward differences one last time, we get

    Δ³f(t) = Δ²f(t + δ) − Δ²f(t) = 6aδ³.
Table 4.1 Difference scheme for the efficient evaluation of a polynomial of degree three. Each entry is obtained from the entry to its left by adding the entry below it in the previous column; only the first column requires multiplications.

          t0 = 0    t0 + δ         t0 + 2δ        t0 + 3δ        ...
  f       f0        f0 + Δf0       f1 + Δf1       f2 + Δf2       ...
  Δf      Δf0       Δf0 + Δ²f0     Δf1 + Δ²f1     Δf2 + Δ²f2     ...
  Δ²f     Δ²f0      Δ²f0 + Δ³f0    Δ²f1 + Δ³f0    Δ²f2 + Δ³f0    ...
  Δ³f     Δ³f0      Δ³f0           Δ³f0           Δ³f0           ...
Table 4.2 Difference scheme for the polynomial f(t) = t³ + 2t + 3 with step size δ = 1

          t = 0    t = 1    t = 2    t = 3    t = 4    ...
  f         3        6       15       36       75      ...
  Δf        3        9       21       39       63      ...
  Δ²f       6       12       18       24       30      ...
  Δ³f       6        6        6        6        6      ...
Thus, multiplications are only required for the calculation of the initial values at t_0 = 0:

    f_0   = d
    Δf_0  = aδ³ + bδ² + cδ
    Δ²f_0 = 6aδ³ + 2bδ²
    Δ³f_0 = 6aδ³.

All other values can be determined by additions alone. Table 4.1 illustrates this principle of the difference scheme. Table 4.2 contains the evaluation of the difference scheme for the example polynomial f(t) = t³ + 2t + 3, that is a = 1, b = 0, c = 2, d = 3, with step size δ = 1.
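The difference scheme can be implemented directly. The following Java sketch (not from the book's example programs; t_0 = 0 is assumed, as above) reproduces the values of Table 4.2 using only additions inside the loop.

public class ForwardDifferences {
    public static void main(String[] args) {
        double a = 1, b = 0, c = 2, d = 3;   // f(t) = t^3 + 2t + 3 as in Table 4.2
        double delta = 1.0;                  // step size δ
        int steps = 4;

        // Initial values at t0 = 0 (the only place where multiplications occur).
        double f   = d;
        double df  = a * delta * delta * delta + b * delta * delta + c * delta;
        double d2f = 6 * a * delta * delta * delta + 2 * b * delta * delta;
        double d3f = 6 * a * delta * delta * delta;

        for (int n = 0; n <= steps; n++) {
            System.out.printf("t = %d: f = %.0f, Δf = %.0f, Δ²f = %.0f, Δ³f = %.0f%n",
                              n, f, df, d2f, d3f);
            // Incremental update: additions only.
            f   += df;
            df  += d2f;
            d2f += d3f;
        }
    }
}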
4.6.3 Freeform Surfaces As explained at the beginning of Sect. 4.6, freeform surfaces are closely related to parametric curves. For the representation of curves, one parameter t is needed, whereas for the representation of surfaces, two parameters are required. If one of these two parameters is held fixed, the result is a curve on the surface, as shown in Fig. 4.29. Bézier surfaces are composed of Bézier curves in the parameters s and t:

    x(s, t) = Σ_{i=0}^{n} Σ_{j=0}^{m} b_{ij} · B_i^(n)(s) · B_j^(m)(t)    (s, t ∈ [0, 1]).
Usually, Bézier curves of degree three are used, so that m = n = 3 is chosen. To define such a Bézier surface, (m + 1) · (n + 1) Bézier points bi j , i.e., 16 in the case of cubic Bézier curves, must be specified.
Fig. 4.29 Example of a parametric freeform surface
Fig. 4.30 A network of Bézier points defining a Bézier surface
Figure 4.30 illustrates how a network of Bézier points defines a Bézier surface. Bézier surfaces have similar favourable properties as Bézier curves. The four corner points b_{00}, b_{0m}, b_{n0}, b_{nm} lie on the surface. This is usually not the case for the other control points. The surface remains within the convex hull of the control points. The curves with constant value s = s_0 are Bézier curves with respect to the control points

    b̄_j = Σ_{i=0}^{n} b_{ij} · B_i^(n)(s_0).
The same applies to the curves with the constant parameter t = t0 .
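A point on such a tensor product Bézier surface can be computed directly from the double sum above. The following Java sketch is illustrative only and not from the book's example programs; the bicubic control net in the main method is made up for demonstration.

public class BezierSurface {

    static double binomial(int n, int i) {
        double result = 1.0;
        for (int k = 1; k <= i; k++) result *= (double) (n - k + 1) / k;
        return result;
    }

    static double bernstein(int n, int i, double t) {
        return binomial(n, i) * Math.pow(1.0 - t, n - i) * Math.pow(t, i);
    }

    // Point x(s,t) on the Bézier surface defined by the (n+1) x (m+1) control net b,
    // with each control point given as {x, y, z}.
    static double[] surfacePoint(double[][][] b, double s, double t) {
        int n = b.length - 1;
        int m = b[0].length - 1;
        double[] point = new double[3];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                double weight = bernstein(n, i, s) * bernstein(m, j, t);
                for (int k = 0; k < 3; k++) {
                    point[k] += weight * b[i][j][k];
                }
            }
        }
        return point;
    }

    public static void main(String[] args) {
        // A bicubic patch (n = m = 3): the four inner control points are raised to z = 1.
        double[][][] b = new double[4][4][];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                double z = (i == 1 || i == 2) && (j == 1 || j == 2) ? 1.0 : 0.0;
                b[i][j] = new double[] { i, j, z };
            }
        }
        double[] p = surfacePoint(b, 0.5, 0.5);
        System.out.printf("x(0.5, 0.5) = (%.3f, %.3f, %.3f)%n", p[0], p[1], p[2]);
    }
}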
Fig. 4.31 A triangular grid for the definition of a Bézier surface
Since tessellation in computer graphics usually involves an approximation of surfaces by triangles rather than quadrilaterals, Bézier surfaces of degree n, usually n = 3, are sometimes defined over a grid of triangles as follows:

    x(t_1, t_2, t_3) = Σ_{i,j,k ≥ 0: i+j+k = n} b_{ijk} · B_{ijk}^(n)(t_1, t_2, t_3).

The corresponding Bernstein polynomials are given by

    B_{ijk}^(n)(t_1, t_2, t_3) = (n! / (i! · j! · k!)) · t_1^i · t_2^j · t_3^k
where t1 + t2 + t3 = 1, t1 , t2 , t3 ≥ 0 and i + j + k = n (for i, j, k ∈ IN). The triangular grid is shown in Fig. 4.31.
4.7 Normal Vectors for Surfaces To render a scene realistically, illumination effects such as reflections must be taken into account. Reflections depend on the angles at which light rays hit a surface. Normal vectors of the surface are required to calculate these angles. Illumination effects and reflections are explained in detail in Chap. 9. This section presents the determination of the normal vectors of a surface. For a triangle lying in a plane, the normal vectors at all of its points coincide with the normal vector of the plane. If the plane induced by a triangle is given by the equation

    A·x + B·y + C·z + D = 0,    (4.4)
then the vector (A, B, C) is the non-normalised1 normal vector to this plane. This can be easily seen from the following consideration. If n = (n x , n y , n z ) is a not necessarily normalised normal vector to the plane and v = (vx , v y , vz ) is a point in the plane, then the point (x, y, z) lies also in the plane if and only if the connecting vector between v and (x, y, z) lies in the plane. This means the connecting vector must be orthogonal to the normal vector. The following must therefore apply.
    0 = n · ((x, y, z)^T − v) = n_x · x + n_y · y + n_z · z − n · v.

With A = n_x, B = n_y, C = n_z and D = −n · v, the result is exactly the plane equation (4.4). If a triangle is given by the three non-collinear points P_1, P_2, P_3, then a normal vector to this triangle can be calculated using the cross product as follows:

    n = (P_2 − P_1) × (P_3 − P_1).

The cross product of two vectors (x_1, y_1, z_1)^T and (x_2, y_2, z_2)^T is defined by the vector

    | x_1 |   | x_2 |   | y_1 · z_2 − y_2 · z_1 |
    | y_1 | × | y_2 | = | z_1 · x_2 − z_2 · x_1 |.
    | z_1 |   | z_2 |   | x_1 · y_2 − x_2 · y_1 |

The cross product is the zero vector if the two vectors are collinear. In this way, the non-normalised normal vector (A, B, C) of the plane equation (4.4) can be determined. The value D in this equation can then be determined by inserting one of the points of the triangle, i.e., one point in the plane, into this equation: D = −n · P_1.
If a surface is described by a freeform surface, the normal vector at a point x(s_0, t_0) on the surface can be determined as the normal vector to the tangent plane at that point. The tangent plane is spanned by the two tangent vectors to the parametric curves p(s) = x(s, t_0) and q(t) = x(s_0, t) at the point x(s_0, t_0):

    ∂/∂s x(s, t_0) |_{s=s_0} = Σ_{i=0}^{n} Σ_{j=0}^{m} b_{ij} · ( ∂B_i^(n)(s)/∂s |_{s=s_0} ) · B_j^(m)(t_0)
                             = Σ_{j=0}^{m} B_j^(m)(t_0) · Σ_{i=0}^{n} b_{ij} · ∂B_i^(n)(s)/∂s |_{s=s_0},

    ∂/∂t x(s_0, t) |_{t=t_0} = Σ_{i=0}^{n} Σ_{j=0}^{m} b_{ij} · B_i^(n)(s_0) · ( ∂B_j^(m)(t)/∂t |_{t=t_0} )
                             = Σ_{i=0}^{n} B_i^(n)(s_0) · Σ_{j=0}^{m} b_{ij} · ∂B_j^(m)(t)/∂t |_{t=t_0}.

1 For a normalised vector v, ||v|| = 1 must hold.
Fig. 4.32 Normal vectors to the original surface in the vertices of an approximating triangle
These two tangent vectors are parallel to the surface at the point (s0 , t0 ) and thus span the tangent plane of the surface at this point. The cross product of the two tangent vectors thus yields the normal vector at the surface at the point x(s0 , t0 ). When a curved surface in the form of a freeform surface is approximated by triangles, the normal vectors for the triangles should not be determined after the approximation by the triangles, but directly from the normal vectors of the surface. Of course, it is not possible to store normal vectors at every point on the surface. At least for the points used to define the triangles, the normal vectors of the curved surface should be calculated and stored. Usually these will be the vertices of the triangle. In this way, a triangle can have three different normal vectors associated with it, all of which do not coincide with the normal vector of the plane defined by the triangle, as can be seen in Fig. 4.32. This procedure improves the rendering result for illuminated surfaces.
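The calculation of a triangle normal via the cross product can be sketched in a few lines of Java. The code below is not from the book's example programs; the three vertices in the main method are chosen arbitrarily.

public class TriangleNormal {

    static double[] subtract(double[] u, double[] v) {
        return new double[] { u[0] - v[0], u[1] - v[1], u[2] - v[2] };
    }

    static double[] cross(double[] u, double[] v) {
        return new double[] {
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]
        };
    }

    static double dot(double[] u, double[] v) {
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
    }

    public static void main(String[] args) {
        double[] p1 = { 0, 0, 1 };
        double[] p2 = { 1, 0, 1 };
        double[] p3 = { 0, 1, 1 };
        // Non-normalised normal vector n = (P2 - P1) x (P3 - P1).
        double[] n = cross(subtract(p2, p1), subtract(p3, p1));
        double d = -dot(n, p1);   // coefficient D of the plane equation (4.4)
        System.out.printf("n = (%.1f, %.1f, %.1f), D = %.1f%n", n[0], n[1], n[2], d);
        // Prints n = (0.0, 0.0, 1.0), D = -1.0: the triangle lies in the plane z - 1 = 0.
    }
}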
4.8 Exercises Exercise 4.1 The surface of the object in the figure on the right is to be modelled with triangles. Give appropriate coordinates for the six nodes and the corresponding triangles to be formed from these nodes. Make sure that the triangles are oriented in such a way that the counter-clockwise oriented faces point outwards. The object is two units high, one unit deep and five units wide. Write an OpenGL program to render the object.
Exercise 4.2 Sketch the quad tree for the solid triangle in the figure on the right. Stop this process at step two (inclusive). The root (level 0) of the quad tree corresponds to the dashed square.

Exercise 4.3 Let a sphere with radius one and centre at the coordinate origin be tessellated as follows: Let n ∈ N − {0} be even. Let the grid of the tessellation be spanned by n/2 equally sized longitude circles and n/2 − 1 different-sized latitude circles. Assume that the centres of the latitude circles lie on the z-axis.
(a) Construct a parameterised procedure that can be used to determine coordinates for the vertices of the tessellation. Sketch the situation.
(b) Determine the corresponding normal vectors of the corner points from the results of task (a).
(c) Write an OpenGL program that draws a sphere with an arbitrary number of latitude and longitude circles. Approximate the surface of the sphere with triangles.
(d) Check if all surfaces are oriented outwards by turning on backface culling. Correct your program if necessary.
(f) Optimise your program so that the complete object is drawn with only one OpenGL drawing command. If necessary, use the primitive restart mechanism.

Exercise 4.4 Let there be a cylinder whose centre of gravity lies at the origin of the coordinates. The two circular cover faces have a radius of one. The cylindrical surface has a length of two and is parallel to the z-axis. The surface of the cylinder is tessellated as follows (n ∈ N − {0}): The cylindrical surface is divided into n rectangles ranging from one cover face to the other cover face. The two circular faces become two polygons having n corners. The angle between two successive vertices as seen from the centre of the cover face is α = 2π/n. The corner points of the tessellation vary with the parameter i · α; i = 0, . . . , n − 1.
(a) Sketch the cylinder and its tessellation in a right-handed coordinate system.
(b) What are the coordinates of the vertices in homogeneous coordinates for each rectangle as a function of α and i?
(c) What are the corresponding normal vectors at the vertices in homogeneous coordinates?
(d) Write an OpenGL program that draws the cylindrical surface for any value of n. Approximate the rectangles by two triangles each.
(e) Complete your program with two approximated circles (polygons with n corners) to represent the cover faces for any value of n. Approximate the polygons with n corners with triangles. For this purpose, the geometric primitive GL_TRIANGLE_FAN is well suited.
(f) Check if all surfaces are oriented outwards by turning on backface culling. Correct your program if necessary.
(g) Optimise your program so that the complete object is rendered with only one OpenGL drawing command. Use the primitive restart mechanism if necessary.
(h) Adapt your program so that the correct normal vectors are stored in the vertices. Note that the normal vectors for the cylindrical surface are oriented differently from those for the cover faces.
(i) Extend your program so that a frustum (truncated cone) is drawn instead of a cylinder. In this case, the two cover faces may have (arbitrarily) different radii.
References
1. C. Crassin, F. Neyret, M. Sainz, S. Green and E. Eisemann. “Interactive Indirect Illumination Using Voxel Cone Tracing”. In: Computer Graphics Forum 30.7 (2011), pp. 1921–1930.
2. G. Farin. Curves and Surfaces for CAGD: A Practical Guide. 5th edition. Morgan Kaufmann, 2001.
3. M. Pharr, ed. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Boston: Addison-Wesley, 2005.
5 Geometry Processing
This chapter presents the individual steps of the geometric transformations, which are necessary to be able to represent the geometry of real-world objects on the screen. After introducing the concepts in the two-dimensional plane, they are applied in three-dimensional space. The first step is the geometric modelling of each object as a model of the virtual world to be represented, as described in Sect. 4.1. This modelling is done for each object in its own coordinate system, the model coordinate system. Afterwards, all models are transferred into the coordinate system of the virtual scene, the world coordinate system. From this step on, the perspective comes into play, which is created by a view through the camera from the viewer’s location. Afterwards, all areas not visible through the camera are cut off, creating the so-called visible (clipping) area. The clipping area is then mapped onto the output device. The mathematics of this chapter can be deepened with the help of [1,2].
5.1 Geometric Transformations in 2D Besides the geometric objects, which are modelled in Chap. 4, geometric transformations play an essential role in computer graphics. Geometric transformations are used to position objects, i.e., to move or rotate them, to deform them, for example, to stretch or compress them in one direction. Geometric transformations are also used to realise movements or changes of objects in animated graphics. Before the geometric transformations are discussed in more detail, a few conventions should be agreed upon. In computer graphics, both points and vectors are used, both of which are formally represented as elements of the IRn .1 In the context of this book and from the perspective of computer graphics, there is frequent switching between
1 Especially in physics, a clear distinction between these two concepts is required.
the two interpretations as point and vector, so that these terms are handled rather flexibly here. A tuple (x_1, . . . , x_n) ∈ IR^n can be interpreted as a point in one equation and as a vector in the next.2 Column vectors are generally used in equations in this book. In the running text, points are sometimes written as row vectors to avoid unnecessarily tall lines. In cases where a point is explicitly used as a column vector, the transposition symbol is usually written, i.e., it is written in the form (x, y)^T ∈ IR² and (x, y, z)^T ∈ IR³, respectively. The scalar product of two vectors u and v is noted as follows:

    u · v = (u_1, . . . , u_n) · (v_1, . . . , v_n)^T = Σ_{i=1}^{n} u_i · v_i.
The most important geometric transformations in computer graphics are scaling, rotation, shear and translation. Scaling causes stretching or compression in the x- and y-direction. For a scaling S(s_x, s_y), the point (x, y) is mapped to the point (x′, y′) as follows:
    | x′ |   | s_x · x |   | s_x  0  |   | x |
    | y′ | = | s_y · y | = | 0   s_y | · | y |.

s_x is the scaling factor in the x-direction. There is stretching in the x-direction exactly when |s_x| > 1 applies. If |s_x| < 1, there is compression. If the value s_x is negative, in addition to the stretching or compression in the x-direction, a reflection on the y-axis takes place. Correspondingly, s_y causes stretching or compression in the direction of the y-axis and, if the value is negative, additionally a reflection on the x-axis. A scaling is—like all other geometric transformations—always to be understood pointwise, even if it is applied to objects. As an example, consider the scaling with s_x = 2 and s_y = 0.5, which stretches by a factor of two in the direction of the x-axis and compresses to half the size in the direction of the y-axis. If this scaling is applied to the rectangle whose lower-left corner is at the point (80, 120) and whose upper-right corner is at the point (180, 180), the resulting rectangle is not only twice as wide and half as high as the original one but also shifted to the right and down. Figure 5.1 shows the original rectangle and, dashed, the scaled rectangle.3 The scaling always refers to the coordinate origin. If an object is not centred in the origin, scaling always causes an additional displacement of the object. Another important geometric transformation is rotation, which is defined by the specification of an angle. The rotation is counter-clockwise around the coordinate origin, or clockwise if the angle is negative. The rotation R(θ) around the angle θ maps the point (x, y) to the following point (x′, y′):
2 Physicists may forgive us for this carelessness.
3 The coordinate system (with the origin in the lower-left corner) of the figure corresponds to the window coordinate system of OpenGL.
Fig. 5.1 Scaling using the example of a rectangle
Fig. 5.2 Rotation using the example of a rectangle
    | x′ |   | x · cos(θ) − y · sin(θ) |   | cos(θ)  −sin(θ) |   | x |
    | y′ | = | x · sin(θ) + y · cos(θ) | = | sin(θ)   cos(θ) | · | y |.
Since the rotation always refers to the coordinate origin, a rotation of an object not centred in the coordinate origin causes an additional displacement of the object, just like scaling. In Fig. 5.2, a rotation of 45◦ was performed, mapping the original rectangle to the dashed rectangle. As the penultimate elementary geometric transformation, shear is introduced, which causes a distortion of an object. Like scaling, shear is defined by two parameters, with the difference that the two parameters appear on the secondary diagonal and not
Fig. 5.3 Shear using the example of a rectangle
on the principal diagonal of the corresponding matrix. If a shear Sh(s_x, s_y) is applied to the point (x, y), the result is the following point (x′, y′):
    | x′ |   | x + s_x · y |   | 1   s_x |   | x |
    | y′ | = | y + s_y · x | = | s_y  1  | · | y |.

Analogous to scaling and rotation, the shear refers to the coordinate origin. An object not centred in the coordinate origin is additionally displaced by a shear. In computer graphics, shear is of rather minor importance compared to the other geometric transformations. In Fig. 5.3, the dashed rectangle is obtained by applying the shear with the parameters s_x = 1 and s_y = 0 to the original rectangle. Since s_y = 0 holds here, we speak of a shear in the x-direction. For a shear in the y-direction, s_x = 0 must apply.
x + dx x dx x = = . + y y + dy dy y Figure 5.4 shows a translation of a rectangle around the vector d = (140, 80) . In contrast to the transformations discussed earlier, which are all linear mappings,4 a translation cannot be represented by matrix multiplication in Cartesian coordinates. With matrix multiplication in Cartesian coordinates, the zero vector, i.e., the origin of the coordinate system, is always mapped to itself. A translation shifts but all points,
4 A mapping that assigns to each vector x the vector b = Ax for a fixed matrix A is a linear mapping.
Fig. 5.4 Translation using the example of a rectangle
including the coordinate origin. Translations belong to the affine, but not to the linear mappings. In computer graphics, more complex transformations are often created by concatenating elementary geometric transformations. A transformation that results from a concatenation of different scalings, rotations and shears is described by the matrix obtained by multiplying the matrices belonging to the corresponding elementary transformations in reverse order. If translations are also used, the composite transformation can no longer be calculated and represented in this simple way. This is due to the fact that a translation is the addition of a constant displacement vector d, which causes a shift. This vector addition x + d in Cartesian coordinates cannot be expressed as a matrix multiplication Ax. It would be advantageous, both in terms of memory and computational effort, if all operations connected with geometric transformations could be traced back to matrix multiplications. To make this possible, a different coordinate representation of the points is used, in which the translation can also be described in the form of a matrix multiplication. This alternative coordinate representation is called homogeneous coordinates, which are explained in more detail in the following Sect. 5.1.1.
5.1.1 Homogeneous Coordinates At this point, homogeneous coordinates for points in the plane are introduced. The same technique is used for three-dimensional space. Homogeneous coordinates use an additional dimension to represent points and vectors.
The point (x, y, z) in homogeneous coordinates corresponds to the point (x/z, y/z) in Cartesian coordinates. The z-component of a point in homogeneous coordinates must never be zero. For (directional) vectors and normals, the z-component is set to zero in the homogeneous representation, i.e., in this case, the first two components are identical to the Cartesian ones. If the point (x_0, y_0) is to be represented in homogeneous coordinates, the representation (x_0, y_0, 1) can be used as so-called normalised homogeneous
Fig. 5.5 Homogeneous coordinates
coordinates. Although this representation occurs frequently, it is not the only one. Any representation (z · x_0, z · y_0, z) with z ≠ 0 is also correct. The points {(x, y, z) ∈ IR³ | (x, y, z) = (z · x_0, z · y_0, z)} all lie on the straight line in IR³ that is defined by the system of equations

    x − x_0 · z = 0
    y − y_0 · z = 0

and that runs through the coordinate origin. Each point on this line, except the coordinate origin, represents the point (x_0, y_0) in homogeneous coordinates. If a fixed z-value is selected for the representation in homogeneous coordinates, e.g., z = 1, all points of IR² are represented in the corresponding plane parallel to the xy-plane. Figure 5.5 illustrates these relationships. All points on the displayed straight line represent the same point in IR². If one fixes a z-value, e.g., one of the planes drawn, the points of IR² can be represented in the corresponding plane. The coordinate origin in Cartesian coordinates corresponds in homogeneous coordinates to a point of the form (0, 0, z). A linear map with respect to homogeneous coordinates, i.e., a linear map in IR³, does not necessarily map this point to itself; it can map this point to another point in homogeneous coordinates. A translation can be represented in homogeneous coordinates as a matrix–vector multiplication:

    | x′ |   | x + d_x |   | 1  0  d_x |   | x |
    | y′ | = | y + d_y | = | 0  1  d_y | · | y |.
    | 1  |   |    1    |   | 0  0   1  |   | 1 |

The other elementary geometric transformations can easily be extended to homogeneous coordinates, resulting in the following transformation matrices. Rotations and translations preserve lengths and angles. Scalings and shears generally change lengths or angles, but at least the parallelism of lines is preserved.
Transformation    Abbreviation     Matrix

Translation       T(d_x, d_y)      | 1  0  d_x |
                                   | 0  1  d_y |
                                   | 0  0   1  |

Scaling           S(s_x, s_y)      | s_x  0   0 |
                                   | 0   s_y  0 |
                                   | 0    0   1 |

Rotation          R(θ)             | cos(θ)  −sin(θ)  0 |
                                   | sin(θ)   cos(θ)  0 |
                                   |   0        0     1 |

Shear             Sh(s_x, s_y)     | 1   s_x  0 |
                                   | s_y  1   0 |
                                   | 0    0   1 |
The concatenation of geometric transformations in homogeneous coordinates can therefore be realised by matrix multiplication. The matrices introduced for the elementary transformations all have the form

    | a  c  e |
    | b  d  f |.    (5.1)
    | 0  0  1 |

The product of two such matrices again results in a matrix of this form, as can be easily verified. Geometric transformations in computer graphics are therefore usually represented and stored in this form. In OpenGL in particular, homogeneous coordinates are used. This is not only valid for transformations that operate in the plane but, in a similar way, also for spatial transformations, which are treated starting from Sect. 5.8. Therefore, a graphics card of a computer must be able to execute vector and matrix operations efficiently. When concatenating transformations, it must be taken into account that the order in which the transformations are carried out plays a role. Matrix multiplication is a non-commutative operation. Figure 5.6 shows in its right-hand part the different results obtained by applying to the rectangle in the left-hand part of the figure, in one case, first a translation by the vector (40, 20)^T and then a rotation of 45°, and in the other case the same transformations in the reverse order. This effect occurs only if different types of transformations are combined. When concatenating transformations of the same type, i.e., only rotations, only translations, only scalings or only shears, the order does not matter. Apart from this, exchanging transformations is only possible in a few cases. It should also be noted that when matrix notation or the notation of transformations as mappings is used, the transformations are performed from right to left. This is a common mathematical convention. The transformation

    (T(d_x, d_y) ◦ R(θ))(v)
Fig. 5.6 Different results when the order of translation and rotation is reversed
or in matrix notation
    | 1  0  d_x |   | cos(θ)  −sin(θ)  0 |
    | 0  1  d_y | · | sin(θ)   cos(θ)  0 | · v
    | 0  0   1  |   |   0        0     1 |
means that first the rotation R(θ ) and then the translation T (dx , d y ) are applied to the point v.
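The effect of the order of concatenation can be reproduced with a small Java sketch using 3 × 3 matrices in homogeneous coordinates. The code is illustrative only and not part of the book's example programs.

public class Transform2D {

    static double[][] translation(double dx, double dy) {
        return new double[][] { { 1, 0, dx }, { 0, 1, dy }, { 0, 0, 1 } };
    }

    static double[][] rotation(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] { { c, -s, 0 }, { s, c, 0 }, { 0, 0, 1 } };
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] result = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    result[i][j] += a[i][k] * b[k][j];
        return result;
    }

    // Applies a matrix to a point given in homogeneous coordinates.
    static double[] apply(double[][] m, double[] p) {
        double[] result = new double[3];
        for (int i = 0; i < 3; i++)
            for (int k = 0; k < 3; k++)
                result[i] += m[i][k] * p[k];
        return result;
    }

    public static void main(String[] args) {
        double[][] t = translation(40, 20);
        double[][] r = rotation(Math.toRadians(45));
        double[] v = { 10, 0, 1 };
        double[] p1 = apply(multiply(t, r), v);  // first rotate, then translate
        double[] p2 = apply(multiply(r, t), v);  // first translate, then rotate
        System.out.printf("T∘R: (%.2f, %.2f), R∘T: (%.2f, %.2f)%n",
                          p1[0], p1[1], p2[0], p2[1]);
    }
}

The two printed points differ, which corresponds to the two different results shown in Fig. 5.6.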
5.1.2 Applications of Transformations In this section, some example applications and problems are explained that can be solved with the help of geometric transformations. In computer graphics, it is common practice to specify objects in arbitrary coordinates in floating point arithmetic, the so-called world coordinates. For the generation of a concrete graphic, a rectangular window, the viewport, must be specified, which defines the area of the “object world” visible on the screen or another output device. Therefore, a mapping from the world coordinates to the device or screen coordinates must be calculated. The viewport transformation into device coordinates from three-dimensional to two-dimensional space is dealt with in detail in Sect. 5.38. At this point, this transformation is greatly simplified for illustrative purposes. Only a consideration in two-dimensional space is carried out, in which a two-dimensional section of the object world, given in (simplified) two-dimensional world
Fig. 5.7 From world to window coordinates (highly simplified two-dimensional view)
coordinates, is transformed into a section of the screen, given in screen or window coordinates. Figure 5.7 illustrates this situation. The rectangle with the lower-left corner (x_min, y_min) and the upper-right corner (x_max, y_max) at the top left of the image specifies the section of the object world to be displayed, the window in world coordinates. This world section must be drawn in the window with the screen coordinates (u_min, v_min) and (u_max, v_max) as the lower-left and upper-right corner of the window on the screen. The two rectangles in world coordinates and window coordinates do not have to be the same size or have the same shape. The mapping to the viewport can be realised by the following transformations in series. First, the window in world coordinates is translated to the coordinate origin. Afterwards, this window at the origin is scaled to the size of the window on the screen, and finally, this scaled window is positioned at the correct place on the screen by a translation. This results in the following transformation, where ◦ denotes the composition of these transformations:
    T(u_min, v_min) ◦ S( (u_max − u_min)/(x_max − x_min), (v_max − v_min)/(y_max − y_min) ) ◦ T(−x_min, −y_min).    (5.2)

Here too, the transformations are to be carried out from right to left. Rotations always refer to the coordinate origin. To perform a rotation around an arbitrary point (x_0, y_0), this point must first be moved to the coordinate origin by means of a
translation, then the rotation must be performed and finally, the translation must be undone. A rotation around the angle θ around the point (x_0, y_0) is realised by the following series of transformations:

    R(θ, x_0, y_0) = T(x_0, y_0) ◦ R(θ) ◦ T(−x_0, −y_0).    (5.3)
If the rotation in this equation is replaced by a scaling, the result is a scaling related to the point (x_0, y_0). Depending on which device or image format this window section is subsequently drawn on, further steps must be taken to adapt it to the respective conditions. Some image formats (e.g., PNG) have the coordinate origin in the upper-left corner. Further transformations are needed here. This results in the so-called pixel coordinates. Pixel coordinates of a window on the screen are usually specified so that the first coordinate defines the pixel column and the second coordinate defines the pixel row. For example, the x-axis, i.e., the first coordinate axis, would run as usual from left to right, while the y-axis of the window coordinates would point down instead of up. This effect can be avoided by a suitable geometric transformation. Before drawing starts, one first mirrors on the x-axis. Mirroring causes the y-axis in the window to point upwards, but the origin still lies at the top edge of the window. After mirroring, a translation in the y-direction by the height of the window must therefore be performed, so that the geometric transformation

    T(0, h) ◦ S(1, −1)    (5.4)

is obtained, where h is the height of the window in pixels.
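The transformations (5.2) and (5.3) can also be written out directly in code. The following Java sketch is illustrative only; the method names and the example values in the main method are chosen freely.

public class ViewportMapping {

    // Eq. (5.2): map a point from the window [xmin,xmax] x [ymin,ymax] in world
    // coordinates to the viewport [umin,umax] x [vmin,vmax] in screen coordinates:
    // translate the window to the origin, scale it to the viewport size and
    // translate it to the viewport position.
    static double[] worldToViewport(double x, double y,
                                    double xmin, double ymin, double xmax, double ymax,
                                    double umin, double vmin, double umax, double vmax) {
        double sx = (umax - umin) / (xmax - xmin);
        double sy = (vmax - vmin) / (ymax - ymin);
        return new double[] { umin + sx * (x - xmin), vmin + sy * (y - ymin) };
    }

    // Eq. (5.3): rotation around an arbitrary point (x0, y0): translate (x0, y0)
    // to the origin, rotate by theta, translate back.
    static double[] rotateAround(double x, double y, double x0, double y0, double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        double dx = x - x0, dy = y - y0;
        return new double[] { x0 + c * dx - s * dy, y0 + s * dx + c * dy };
    }

    public static void main(String[] args) {
        double[] uv = worldToViewport(5, 5, 0, 0, 10, 10, 100, 100, 300, 200);
        System.out.printf("viewport point: (%.1f, %.1f)%n", uv[0], uv[1]);  // (200.0, 150.0)
        double[] r = rotateAround(2, 1, 1, 1, Math.PI / 2);
        System.out.printf("rotated point: (%.1f, %.1f)%n", r[0], r[1]);     // (1.0, 2.0)
    }
}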
5.1.3 Animation and Movements Using Transformations So far, the geometric transformations have only been used statically to describe the mapping of one coordinate system into another and the positioning or deformation of objects. Geometric transformations are also suitable for modelling movements, such as the movement of the second hand of a clock in the form of a stepwise rotation of 6° per second. Continuous movements must be broken down into small partial movements, each of which is described by a transformation. In order to avoid a jerky representation of the continuous motion, the partial movements must be sufficiently small or the time between two consecutive images must be sufficiently short. If the movement of an object is modelled by suitable transformations, the object must first be drawn, the transformed object must be calculated, the old object must be overwritten and the transformed object must be redrawn (in OpenGL, this is carried out efficiently when using VBOs and VAOs; see Chap. 2). But it is by no means clear what the pixels that belong to the old object should be overwritten with. For this purpose, a unique background must be defined. In addition, the old object must be completely rendered again to determine which pixels it occupies. For this reason, the entire image buffer is usually rewritten. However, one does not write directly to the screen buffer, but into a virtual screen buffer, which is then transferred to the actual screen buffer. As a simple example, a clock with a second hand is considered here, which is to run from the bottom left to the top right across a screen window. The clock itself
Fig. 5.8 A moving clock with second hand
consists of a square frame and has only a second hand. Minute and hour hands could be treated accordingly but are not considered further for reasons of simplification. The square frame of the clock must be moved piece by piece from the bottom left to the top right by a translation. This translation must also act on the second hand, which must additionally be rotated. Figure 5.8 shows several intermediate positions of the clock at the same time. The clock could move two units to the right and one unit upwards in each step, which could be achieved by a translation T_clock,step = T(2, 1). Correspondingly, the second hand would have to perform a rotation of the form T_hand,step = R(−π/30) if the hand is to continue turning clockwise by −π/30, i.e., by 6°, in each step. The hand starts at the centre of the clock, so the hand must be rotated around this point. One could position the clock at the beginning in such a way that the hand starts at the coordinate origin. However, at the latest after one movement step of the clock, the hand leaves this position and the centre of rotation would have to be changed accordingly. There are two strategies for describing such compound movements. In the first strategy, one keeps a record of where the object—in our example the second hand—is located and moves the centre of rotation accordingly. In general, it is not enough to store only the displacement of the object under consideration. For example, if the object is to expand along an axis using a scaling, the orientation of the object must be known. If, for instance, the second hand is to become longer or shorter in the course of one revolution without changing its width, it is not sufficient to scale with respect to a reference point moved along with the object, as the hand would then also become thicker or thinner. In principle, this strategy can be used to model movements, but the following second strategy is usually easier to implement. The principle is to always start from the initial position of the object, accumulate the geometric transformations to be applied and apply them to the initial object before drawing it. In the example of the clock, one could use the two transformations mentioned above and three others:
    T_clock,total^(new) = T_clock,step ◦ T_clock,total^(old)
    T_hand,total rotations^(new) = T_hand,step ◦ T_hand,total rotations^(old)
    T_hand,total = T_clock,total ◦ T_hand,total rotations.
T_clock,total and T_hand,total rotations are initialised with the identity at the beginning and then updated according to these equations in each step. T_clock,total describes the (total) translation that must be performed to move the clock from the starting position to the current position. T_clock,total is applied to the frame of the clock centred in the coordinate origin. T_hand,total rotations indicates the (total) rotation around the coordinate origin that the hand has performed up to the current time. In addition, the hand must perform the shift together with the frame of the clock. The transformation T_hand,total is therefore applied to the hand positioned in the coordinate origin. It is important that first the (total) rotation of the hand and then the (total) displacement is executed. An alternative to this relatively complex modelling of movements is the scene graph presented in Sect. 5.4.
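The second strategy can be sketched as follows in Java. The helper methods and the loop over 60 frames are made up for illustration; in a real program, the resulting matrices would be handed over to the rendering code.

public class ClockAnimation {

    static double[][] identity() {
        return new double[][] { { 1, 0, 0 }, { 0, 1, 0 }, { 0, 0, 1 } };
    }

    static double[][] translation(double dx, double dy) {
        return new double[][] { { 1, 0, dx }, { 0, 1, dy }, { 0, 0, 1 } };
    }

    static double[][] rotation(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] { { c, -s, 0 }, { s, c, 0 }, { 0, 0, 1 } };
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    public static void main(String[] args) {
        double[][] clockStep = translation(2, 1);        // T_clock,step
        double[][] handStep  = rotation(-Math.PI / 30);  // T_hand,step: -6 degrees
        double[][] clockTotal = identity();              // T_clock,total
        double[][] handRotations = identity();           // T_hand,total rotations

        for (int frame = 1; frame <= 60; frame++) {
            clockTotal = multiply(clockStep, clockTotal);
            handRotations = multiply(handStep, handRotations);
            double[][] handTotal = multiply(clockTotal, handRotations); // T_hand,total
            // In a real program, clockTotal would now be applied to the clock frame
            // and handTotal to the hand, both modelled in the coordinate origin.
        }
    }
}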
5.1.4 Interpolators for Continuous Changes Another way of describing movements or changes is to define an interpolation from an initial state to an end state. An object should change continuously from the initial to the final state. In the example of the clock from the previous two sections, one would not specify the transformation T_clock,step = T(2, 1), which is to be executed from one image to the next in a sequence of, e.g., 100 images after the original image, but instead the start and end positions, for instance p_0 = (0, 0)^T and p_1 = (200, 100)^T. The points p_α on the line connecting the points p_0 and p_1 result from the convex combination of these two points with

    p_α = (1 − α) · p_0 + α · p_1,
α ∈ [0, 1].
For α = 0, one gets the starting point p0 , for α = 1, the endpoint p1 and for α = 0.5, the point in the middle of the connecting line between p0 and p1 . The principle of convex combination can be applied not only to points or vectors but also to matrices, i.e., transformations. Later it will be shown how continuous colour changes can also be created in this way (further explanations of colours can be found in Chap. 6). If two affine transformations are given by the matrices M0 and M1 in homogeneous coordinates, their convex combinations Mα are defined accordingly by Mα = (1 − α) · M0 + α · M1 ,
α ∈ [0, 1].
In this way, two objects created from the same basic object by applying two different transformations can be continuously transformed into each other. Figure 5.9 illustrates this process by means of two ellipses, both of which originated from a basic ellipse by using different scales and translations. In the upper left corner of the figure,
Fig. 5.9 Transformation of one ellipse into another using a convex combination of transformations
the initial ellipse is shown, which was created by the first transformation from the base ellipse. In the lower right corner, the end ellipse is shown, which was created with the second transformation from the base ellipse. The ellipses in between are created by applying convex combinations of the two transformations to the base ellipse. One has to be careful with rotations. Of course, two rotation matrices can be transformed into each other in the same way as in the ellipse transformation just discussed. However, if a linear interpolation of the rotation in terms of the angle is desired, it makes more sense to interpolate between the angles of the two rotation matrices and then insert the interpolated angle into the rotation matrix. Another technique of continuous interpolation between two objects S and S′ assumes that the two objects are defined by n control points P_1 = (x_1, y_1), . . . , P_n = (x_n, y_n) and P_1′ = (x_1′, y_1′), . . . , P_n′ = (x_n′, y_n′), respectively, and connecting elements (straight lines, quadratic or cubic curves) defined by these points. Corresponding connecting elements appear in both objects, i.e., if object S contains the quadratic curve (see Chap. 4) defined by the points P_1, P_3 and P_8, then object S′ contains the quadratic curve defined by the points P_1′, P_3′ and P_8′. Figure 5.10 shows two simple examples of two objects in the form of the letters D and C, for whose definition five control points P1, . . . , P5 and P1′, . . . , P5′, respectively, are used. Both letters are described by two quadratic curves:
• A curve that starts at the first point, ends at the second and uses the third as a control point. For the letter D, these are the points P1, P2 and P3, and for C, the points P1′, P2′ and P3′.
• The second quadratic curve uses the first point as the start point, the fourth as the endpoint and the fifth as the control point.
If the two objects, in this case the letters D and C, are to be transformed into each other by a continuous movement, convex combinations can be used for this purpose. Instead of the convex combination of transformations, here the convex combinations of the corresponding pairs of points P_i and P_i′ are calculated, that is,
Pi
= (1 − α) · Pi + α · Pi .
Fig. 5.10 Two letters, each defined by five control points and curves of the same type
Fig. 5.11 Stepwise transformation of two letters into each other
To display the intermediate image for α ∈ [0, 1], the corresponding curve segments are drawn using the points P_i^(α). In the example of the transformation of the letter D into the letter C, the two quadratic curves are drawn that are defined by the points P_1^(α), P_2^(α) and P_3^(α) and by P_1^(α), P_4^(α) and P_5^(α), respectively. Figure 5.11 shows the intermediate results for the convex combinations with α = 0, 0.2, 0.4, 0.6, 0.8, 1, using the points from Fig. 5.10 and drawing the corresponding quadratic curves. In Sect. 6.4, further application possibilities of interpolators in connection with colours and raster graphics are presented.
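The convex combination of control points can be implemented in a few lines. The following Java sketch is illustrative only; the two point sets are made up and are not the exact coordinates behind Figs. 5.10 and 5.11.

public class ShapeInterpolation {

    // Convex combination P_i^(alpha) = (1 - alpha) * P_i + alpha * P_i' of two
    // point sets of equal size; each point is given as {x, y}.
    static double[][] interpolate(double[][] p, double[][] pPrime, double alpha) {
        double[][] result = new double[p.length][2];
        for (int i = 0; i < p.length; i++) {
            result[i][0] = (1 - alpha) * p[i][0] + alpha * pPrime[i][0];
            result[i][1] = (1 - alpha) * p[i][1] + alpha * pPrime[i][1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Two sets of five control points (made up for illustration).
        double[][] p      = { { 0, 0 }, { 0, 4 }, { 3, 2 }, { 4, 0 }, { 2, -1 } };
        double[][] pPrime = { { 3, 0 }, { 3, 4 }, { 0, 2 }, { 1, 0 }, { 2, 1 } };
        for (int k = 0; k <= 5; k++) {
            double alpha = k / 5.0;
            double[][] q = interpolate(p, pPrime, alpha);
            // In a real program, the curve segments defined by q would now be drawn.
            System.out.printf("alpha = %.1f: first point = (%.2f, %.2f)%n",
                              alpha, q[0][0], q[0][1]);
        }
    }
}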
5.2 Geometrical Transformations in 3D As in two-dimensional computer graphics, geometric transformations also play an important role in three dimensions. All three-dimensional coordinates in this book refer to a right-handed coordinate system. Using the thumb of the right hand as the x-axis, the index finger as the
Fig. 5.12 A right-handed coordinate system
y-axis and the middle finger as the z-axis results in the corresponding orientation of the coordinate system. In a right-handed coordinate system, the x-axis is transformed into the y-axis by a 90° rotation, i.e., counter-clockwise, around the z-axis, the y-axis is transformed into the z-axis by a 90° rotation around the x-axis, and the z-axis is transformed into the x-axis by a 90° rotation around the y-axis. Figure 5.12 shows a right-handed coordinate system. In Sect. 5.1.1, homogeneous coordinates were introduced to be able to represent all affine transformations of the plane by matrix multiplication. The same principle of extension by one dimension is used for affine transformations of three-dimensional space. A point of three-dimensional space is represented in homogeneous coordinates by four coordinates (x̃, ỹ, z̃, w) with w ≠ 0. Thereby (x̃, ỹ, z̃, w) encodes the point (x̃/w, ỹ/w, z̃/w) ∈ IR³. The point (x, y, z) ∈ IR³ can be represented in homogeneous coordinates in the form (x, y, z, 1). However, this is not the only possibility. Every representation of the form (x · w, y · w, z · w, w) with w ≠ 0 also represents this point.
5.2.1 Translations In homogeneous coordinates, a translation by the vector (d_x, d_y, d_z)^T can be written as a matrix multiplication in the form

    | x′ |   | 1  0  0  d_x |   | x |   | x + d_x |
    | y′ | = | 0  1  0  d_y | · | y | = | y + d_y |
    | z′ |   | 0  0  1  d_z |   | z |   | z + d_z |
    | 1  |   | 0  0  0   1  |   | 1 |   |    1    |

with the translation matrix

    T(d_x, d_y, d_z) = | 1  0  0  d_x |
                       | 0  1  0  d_y |
                       | 0  0  1  d_z |.
                       | 0  0  0   1  |
5.2.2 Scalings A scaling by the factors s_x, s_y, s_z is given by

    | x′ |   | s_x  0   0   0 |   | x |   | s_x · x |
    | y′ | = | 0   s_y  0   0 | · | y | = | s_y · y |
    | z′ |   | 0    0  s_z  0 |   | z |   | s_z · z |
    | 1  |   | 0    0   0   1 |   | 1 |   |    1    |

with the scaling matrix

    S(s_x, s_y, s_z) = | s_x  0   0   0 |
                       | 0   s_y  0   0 |
                       | 0    0  s_z  0 |.
                       | 0    0   0   1 |
5.2.3 Rotations Around x-, y- and z-Axis In two dimensions, it was sufficient to look at rotations around the origin of the coordinate system. In combination with translations, any rotation around any point can be represented. In three dimensions, a rotation axis must be specified instead of a point around which to rotate. The three elementary rotations in three dimensions are the rotations around the coordinate axes. A rotation by a positive angle around a directed axis in three dimensions means that the rotation is counter-clockwise when the axis points towards the viewer. In this context, the right-hand rule comes into play: if the thumb of the right hand points in the direction of the oriented rotation axis and the remaining fingers are clenched into a fist, the fingers indicate the direction of positive rotation. A rotation around the z-axis by the angle θ can be described in homogeneous coordinates as follows:

    | x′ |            | x |
    | y′ | = R_z(θ) · | y |
    | z′ |            | z |
    | 1  |            | 1 |

with the rotation matrix

    R_z(θ) = | cos θ  −sin θ  0  0 |
             | sin θ   cos θ  0  0 |
             |   0       0    1  0 |.
             |   0       0    0  1 |
This rotation matrix corresponds to the rotation matrix around the coordinate origin already known from the two dimensions, which was only extended by the z-dimension. With a rotation around the z-axis, the z-coordinate does not change. The matrices for rotations around the x- and y-axis are obtained from the above
matrix by swapping the roles of the axes accordingly, so that a rotation around the x-axis by the angle θ is given by the matrix

    R_x(θ) = | 1    0       0    0 |
             | 0  cos θ  −sin θ  0 |
             | 0  sin θ   cos θ  0 |
             | 0    0       0    1 |

and a rotation around the y-axis by the angle θ is realised by the matrix

    R_y(θ) = |  cos θ  0  sin θ  0 |
             |    0    1    0    0 |
             | −sin θ  0  cos θ  0 |.
             |    0    0    0    1 |

Note that the rotation matrices are orthonormal, i.e., their column vectors have length one and are pairwise orthogonal. Thus, the inverse of a rotation matrix is equal to its transposed matrix, as for all orthonormal matrices. In this context, the question remains how to handle a rotation around an arbitrary axis. It would be desirable to derive it purely from the geometric transformations treated so far. This is possible and will be discussed in Sect. 5.5. However, a further fundamental consideration of coordinate system transformations and scene graphs is required to better understand the concepts and methods behind it. These are explained in Sects. 5.3 to 5.4.3.
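The three rotation matrices can be written down directly in code. The following Java sketch (not from the book's example programs) also shows that a 90° rotation around the z-axis maps the x-axis onto the y-axis, as expected in a right-handed coordinate system.

public class Rotations3D {

    static double[][] rotationX(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] {
            { 1, 0,  0, 0 },
            { 0, c, -s, 0 },
            { 0, s,  c, 0 },
            { 0, 0,  0, 1 } };
    }

    static double[][] rotationY(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] {
            {  c, 0, s, 0 },
            {  0, 1, 0, 0 },
            { -s, 0, c, 0 },
            {  0, 0, 0, 1 } };
    }

    static double[][] rotationZ(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] {
            { c, -s, 0, 0 },
            { s,  c, 0, 0 },
            { 0,  0, 1, 0 },
            { 0,  0, 0, 1 } };
    }

    // Applies a 4x4 matrix to a point in homogeneous coordinates.
    static double[] apply(double[][] m, double[] p) {
        double[] r = new double[4];
        for (int i = 0; i < 4; i++)
            for (int k = 0; k < 4; k++)
                r[i] += m[i][k] * p[k];
        return r;
    }

    public static void main(String[] args) {
        double[] p = apply(rotationZ(Math.toRadians(90)), new double[] { 1, 0, 0, 1 });
        System.out.printf("(%.1f, %.1f, %.1f)%n", p[0], p[1], p[2]);  // (0.0, 1.0, 0.0)
    }
}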
5.2.4 Calculation of a Transformation Matrix with a Linear System of Equations For all the transformation matrices considered so far, the last row is (0, 0, 0, 1). This property is also retained under matrix multiplication. The following properties hold for such transformation matrices: In the two-dimensional case, there is exactly one transformation matrix that maps three given non-collinear points to three other given non-collinear points. Correspondingly, in the three-dimensional case, there is exactly one transformation matrix that maps four given non-coplanar points to four other given non-coplanar points. If four points p_1, p_2, p_3, p_4 that do not lie in a plane are given in IR³ together with their new coordinates p_1′, p_2′, p_3′, p_4′, the transformation matrix can be calculated by solving a linear system of equations:

    p_i′ = M · p_i    (i = 1, 2, 3, 4).    (5.5)

The matrix

    M = | a  b  c  d |
        | e  f  g  h |
        | i  j  k  l |
        | 0  0  0  1 |
in homogeneous coordinates must therefore be determined from the four vector equations (5.5), each consisting of three equations,5 for the total of twelve parameters of the matrix M. In this sense, a transformation can also be understood as a change of the coordinate system. This property is used later in the viewing pipeline, for example, to view scenes from different perspectives (model, world and camera) (see Sect. 5.10).
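One way to solve this system is to write the four homogeneous points as columns of 4 × 4 matrices P and P′, so that the equations become P′ = M · P and thus M = P′ · P⁻¹. The following Java sketch (illustrative only; the example points are made up) follows this approach with a simple Gauss–Jordan inversion.

public class TransformFromPoints {

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    // Gauss-Jordan inversion with partial pivoting; assumes m is invertible,
    // which holds if the four points are not coplanar.
    static double[][] invert(double[][] m) {
        int n = m.length;
        double[][] a = new double[n][];
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) {
            a[i] = m[i].clone();
            inv[i][i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) pivot = row;
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            tmp = inv[col]; inv[col] = inv[pivot]; inv[pivot] = tmp;
            double p = a[col][col];
            for (int j = 0; j < n; j++) { a[col][j] /= p; inv[col][j] /= p; }
            for (int row = 0; row < n; row++) {
                if (row == col) continue;
                double factor = a[row][col];
                for (int j = 0; j < n; j++) {
                    a[row][j] -= factor * a[col][j];
                    inv[row][j] -= factor * inv[col][j];
                }
            }
        }
        return inv;
    }

    public static void main(String[] args) {
        // Columns: the origin and the three unit points, in homogeneous coordinates.
        double[][] p = {
            { 0, 1, 0, 0 },
            { 0, 0, 1, 0 },
            { 0, 0, 0, 1 },
            { 1, 1, 1, 1 } };
        // Columns: their images, here simply translated by (2, 3, 4).
        double[][] pPrime = {
            { 2, 3, 2, 2 },
            { 3, 3, 4, 3 },
            { 4, 4, 4, 5 },
            { 1, 1, 1, 1 } };
        double[][] m = multiply(pPrime, invert(p));  // recovers T(2, 3, 4)
        for (double[] row : m) System.out.println(java.util.Arrays.toString(row));
    }
}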
5.3 Switch Between Two Coordinate Systems Each coordinate system is described by a basis with the corresponding basis vectors and a coordinate origin. The common three-dimensional right-handed (Cartesian) coordinate system K has the three basis vectors e_1 = (1, 0, 0)^T, e_2 = (0, 1, 0)^T and e_3 = (0, 0, 1)^T. The coordinate origin is located at (0, 0, 0)^T. This results in the following matrix, which spans this three-dimensional space:

    M = | 1  0  0 |
        | 0  1  0 |,
        | 0  0  1 |

the so-called unit matrix or identity matrix. Interestingly, this matrix results from the following matrix:

    M = | 1  0  0  0 |
        | 0  1  0  0 |,
        | 0  0  1  0 |

which contains the basis vectors as columns, followed by the coordinate origin as a location vector. Since the last column is redundant, it can be omitted. This results in the following matrix in homogeneous coordinates:

    M = | 1  0  0  0 |
        | 0  1  0  0 |
        | 0  0  1  0 |.
        | 0  0  0  1 |

Let there be another Cartesian coordinate system A. Let b_x, b_y and b_z be the associated basis vectors and b the coordinate origin of A, represented with coordinates in relation to K. Then points can be described with respect to both K and A. Different coordinates result for the description of a point P, depending on whether it is viewed from K or from A. The transition for coordinates given with respect to A (i.e., their representation with respect to K) can be realised by
5 Due to the homogeneous coordinates, each of the vector equations actually consists of four equations. However, the last line or equation always has the form 0 · px + 0 · p y + 0 · pz + 1 · 1 = 1.
Fig. 5.13 Example of the change between two coordinate systems
multiplying the vectors given in A with the matrix whose columns are formed by the vectors b_x, b_y, b_z and b. The inverse of this matrix represents the transition from K to A. This connection is illustrated by the following example. For the sake of simplicity, the two-dimensional case is considered. Figure 5.13 shows the Cartesian coordinate system K and another Cartesian coordinate system A. In relation to the Cartesian coordinate system K, the basis vectors of the coordinate system A are (3, 4)^T, in normalised homogeneous coordinates (3/5, 4/5, 0)^T, and the corresponding orthogonal vector (−4, 3)^T, in normalised homogeneous coordinates (−4/5, 3/5, 0)^T. Since these vectors are not location vectors (with a fixed starting point), they have a 0 as their third component. The origin of the coordinate system A is (6, 3)^T, in homogeneous coordinates (6, 3, 1)^T. In Fig. 5.13, it is easy to see that the Cartesian coordinate system K can be transformed into the coordinate system A by a rotation through the angle α (the angle between the vectors (3/5, 4/5, 0)^T and (1, 0, 0)^T, the basis vector of the x-axis of K in homogeneous coordinates) and a shift by the homogeneous vector (6, 3, 1)^T. For the calculation of the angle α, the scalar product is used; this yields

    cos α = ( (3/5, 4/5, 0) · (1, 0, 0)^T ) / ( ||(3/5, 4/5, 0)^T|| · ||(1, 0, 0)^T|| ) = 3/5
and according to Pythagoras

$$\sin\alpha = \sqrt{1 - \cos^2\alpha} = \frac{4}{5}.$$

For the series-connected execution of the transformations for the transfer, the following matrix multiplication is performed:

$$M = \begin{pmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 6 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} \tfrac{3}{5} & -\tfrac{4}{5} & 0 \\ \tfrac{4}{5} & \tfrac{3}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \tfrac{3}{5} & -\tfrac{4}{5} & 6 \\ \tfrac{4}{5} & \tfrac{3}{5} & 3 \\ 0 & 0 & 1 \end{pmatrix},$$
where d_x describes the displacement in the x-direction and d_y the displacement in the y-direction. On closer examination of the result matrix M, it becomes clear that its column vectors are identical to the basis vectors of the coordinate system A, including the corresponding coordinate origin. The inverse of the result matrix is

$$M^{-1} = \begin{pmatrix} \tfrac{3}{5} & \tfrac{4}{5} & -6 \\ -\tfrac{4}{5} & \tfrac{3}{5} & 3 \\ 0 & 0 & 1 \end{pmatrix},$$

which reverses the transition. The question arises with which matrix, i.e., M or M⁻¹, the homogeneous vectors of the Cartesian points have to be multiplied in order to calculate them from the point of view of the coordinate system A. At first, it is astonishing that the multiplication with M⁻¹ must be used. The reason becomes obvious on further consideration: the rotation of a point by the angle β in a fixed coordinate system has the same effect as the rotation of the coordinate system by the angle −β with a fixed point. The same is true for displacements. Since the points are treated as fixed objects and the coordinate systems are transformed into each other, the matrix M⁻¹ must be applied to the Cartesian points in order to obtain their coordinates in the coordinate system A. For the transfer of, e.g., the point (4, 8) from A to K, this results in

$$\begin{pmatrix} \tfrac{3}{5} & -\tfrac{4}{5} & 6 \\ \tfrac{4}{5} & \tfrac{3}{5} & 3 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 4 \\ 8 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 11 \\ 1 \end{pmatrix}$$

and thus in Cartesian coordinates (2, 11). The way back over the inverse matrix results in

$$\begin{pmatrix} \tfrac{3}{5} & \tfrac{4}{5} & -6 \\ -\tfrac{4}{5} & \tfrac{3}{5} & 3 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 11 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 8 \\ 1 \end{pmatrix}$$

and thus (4, 8) from the point of view of the coordinate system A. These regularities are described in the so-called viewing pipeline (see Sect. 5.10), which in particular describes the geometric transformations of the vertices of an object from model coordinates via world and camera coordinates to device coordinates.
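The example above can be verified directly in code. The following sketch uses plain 3×3 arrays for the homogeneous two-dimensional matrices; all names are chosen for illustration.

```java
public class CoordinateChange {

    /** Multiplies a 3x3 matrix with a homogeneous 2D point (x, y, 1). */
    static double[] apply(double[][] m, double[] p) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++)
            r[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2];
        return r;
    }

    public static void main(String[] args) {
        // M maps coordinates given in A to coordinates in K (columns: basis vectors and origin of A)
        double[][] m = {
            { 3.0 / 5, -4.0 / 5, 6 },
            { 4.0 / 5,  3.0 / 5, 3 },
            { 0,        0,       1 }
        };
        // M^{-1} maps coordinates in K back to coordinates in A
        double[][] mInv = {
            {  3.0 / 5, 4.0 / 5, -6 },
            { -4.0 / 5, 3.0 / 5,  3 },
            {  0,       0,        1 }
        };
        double[] pInA = { 4, 8, 1 };
        double[] pInK = apply(m, pInA);        // expected (2, 11, 1)
        double[] back = apply(mInv, pInK);     // expected (4, 8, 1)
        System.out.printf("K: (%.1f, %.1f), A: (%.1f, %.1f)%n",
                pInK[0], pInK[1], back[0], back[1]);
    }
}
```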
5.4 Scene Graphs

5.4.1 Modelling

To model a three-dimensional scene, geometric objects must be defined and positioned in the scene. Possibilities for modelling individual geometric objects are presented in Chap. 4. Besides elementary basic objects like cuboids, spheres, cylinders or cones, usually more complex techniques for object modelling are available. As a rule, complex objects are composed of individual subobjects. Figure 5.14 shows a chair that was constructed from elementary geometric objects. The legs and the seat are cuboids, and the backrest consists of a cylinder. In order to model the chair, the elementary geometric objects must be created with the appropriate dimensions and positioned correctly. The positioning of the individual objects, i.e., the four legs, the seat and the backrest, is carried out by means of suitable geometric transformations, which are applied individually to each object. If one wants to place the chair at a different position in the scene, e.g., move it further back, one would have to define an appropriate translation and apply it additionally to all partial objects of the chair. This would be very complex, especially
Fig. 5.14 A chair composed of elementary geometric objects
Fig. 5.15 A scene composed of several elementary objects
for objects that are much more complicated than the chair, if these moves had to be applied explicitly to each object component. When modelling three-dimensional scenes, it is therefore common to use a scene graph, in which objects can be hierarchically combined into transformation groups. In the case of the chair, the chair forms its own transformation group to which the legs, seat and backrest are assigned. A transformation that is to be applied to the transformation group chair automatically affects all elements that belong to this transformation group. In this way, the entire chair can be positioned in the scene without having to explicitly apply the transformation (required for positioning) to all sub-objects. The algorithm that traverses this scene graph data structure visits the transformation groups implicitly as soon as their ancestors (nodes that are closer to the root on the path to the root node of the tree) have been visited. This means that actions do not only affect the visited node but all its descendants. This is used in the animation. In the following, the concept of the scene graph is illustrated in detail by means of an example, which also includes animations. In the scene, there is a very simplified helicopter standing on a cube-shaped platform. In addition, a tree, which is also represented in a very abstract way, belongs to the scene shown in Fig. 5.15. Figure 5.16 shows a possible scene graph for this scene. The root node of the overall scene at the top has two child nodes. Both are transformation groups. Elementary geometric objects, further transformation groups and geometric transformations can be assigned to a transformation group as child nodes. The upper two transformation groups, tgHeliPlat and tgTree, represent the helicopter with the platform and the tree in the scene, respectively. Each of these two transformation groups is assigned, as a direct child node, a transformation tfHeliPlat and tfTree, respectively, which is used to position the helicopter together with the platform or the whole tree in the scene. The transformation group tgTree also has two transformation groups as child nodes, tgTrunk for the trunk and tgLeaves for the crown. The transformation group tgTrunk contains only an elementary geometric object in the form of a cylinder, which, like all elementary objects, is generated in the coordinate origin. The
Fig. 5.16 The scene graph for the scene from Fig. 5.15
transformation group tgLeaves consists of the elementary geometric object leaves in the shape of a cone and a transformation tfLeaves. This transformation is used to move the tree crown created in the coordinate origin to the tree trunk. The same applies to the transformation group tgHeliPlat, which consists of the helicopter and the platform. Besides a transformation tfHeliPlat for positioning in the scene, two transformation groups are assigned to it: tgHelicopter and tgPlatform. The helicopter consists of a transformation tfHelicopter for positioning the helicopter on the platform and three further transformation groups to which elementary geometric objects are assigned. The cabin of the helicopter consists of a sphere; the tail and the rotor were each created as a cuboid. The transformations tfTail and tfRotor are used to position the tail at the end of the cabin and the rotor on top of the cabin, respectively. Let MtfHeliPlat be the transformation matrix that places the platform into the scene, MtfHelicopter the transformation matrix that places the helicopter from the position of the platform onto it and MtfRotor the transformation matrix that places the rotor from the position of the helicopter onto it. Then, for example, TRotor is the overall transformation matrix that positions the rotor from its model coordinates to the world coordinates (on the helicopter):

TRotor = MtfHeliPlat · MtfHelicopter · MtfRotor.

Note that the order of this matrix multiplication corresponds to the order of the corresponding transformations in the scene graph on the path from node tgHeliPlat to node
tgRotor. The transformations of this matrix multiplication do not influence the transformations of other objects if the corresponding matrices of the current object are processed individually using a matrix stack (matrices stored in a stack). Afterwards, the matrix stack is emptied. The transformations of other objects can be calculated analogously in the matrix stack. The following section explains how to add animations to these scene graphs.
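Before turning to animation, the accumulation of the matrices along the path from tgHeliPlat to tgRotor can be sketched with a simple stack of 4×4 matrices; the helper names and the use of java.util.ArrayDeque are illustrative assumptions, not a prescribed implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ScenePathComposition {

    /** Multiplies two 4x4 matrices: result = a * b. */
    static double[][] mul(double[][] a, double[][] b) {
        double[][] r = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    static double[][] identity() {
        double[][] id = new double[4][4];
        for (int i = 0; i < 4; i++) id[i][i] = 1;
        return id;
    }

    /** Pops the stack and multiplies the matrices in this (reversed) order. */
    static double[][] totalMatrix(Deque<double[][]> stack) {
        double[][] total = identity();
        while (!stack.isEmpty())
            total = mul(total, stack.pop());
        return total;
    }

    public static void main(String[] args) {
        // Placeholder matrices; in a real scene graph they come from the transformations.
        double[][] mTfHeliPlat = identity(), mTfHelicopter = identity(), mTfRotor = identity();
        Deque<double[][]> stack = new ArrayDeque<>();
        stack.push(mTfRotor);       // pushed while walking from the object ...
        stack.push(mTfHelicopter);
        stack.push(mTfHeliPlat);    // ... up to the root
        double[][] tRotor = totalMatrix(stack);   // = M_tfHeliPlat * M_tfHelicopter * M_tfRotor
        System.out.println(tRotor.length + "x" + tRotor[0].length + " total matrix computed");
    }
}
```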
5.4.2 Animation and Movement Only static three-dimensional worlds have been considered so far. To describe dynamically changing scenes, similar techniques are used as they were considered for the two-dimensional case in Sect. 5.1.3. Movements can be realised as stepwise interpolation or convex combination between positions or states. The same applies to many other dynamic changes, e.g., a slow change of colour or brightness. In the two-dimensional case, composite or complex movements, as in the example of the second hand of a linearly moving clock, are generated by the explicit series connection of the individual transformations. The use of scene graphs allows a much easier handling of the modelling of movements. Each transformation group can be assigned a motion that is applied to all objects assigned to it. For a better illustration, the scene with the helicopter shown in Fig. 5.15 shall be used again. The helicopter should start its rotor and then lift off the platform diagonally upwards. Due to the linear motion of the helicopter together with its own rotation, the rotor performs a more complex spiral motion. In the scene graph, however, this can be realised very easily. The rotor is created at the coordinate origin and there its rotation around the y-axis is assigned to the corresponding transformation group. Only then is the rotor positioned at the top of the cabin together with its own movement. If the entire helicopter is placed on the platform and the platform is placed together with the helicopter in the scene, the rotation is also transformed and still takes place in the right place. A linear movement is assigned to the transformation group of the helicopter, which allows the take-off of the platform. A transformation group can now be assigned objects, other transformation groups, transformations for positioning or interpolators. The interpolators are used to describe movements. Figure 5.17 shows a section of the extended scene graph for the helicopter, with which the rotation and lift-off of the helicopter can be described. The names of transformation groups containing interpolators, i.e., movements, begin with the letters tgm (for TransformationGroup with movement). The transformation group tgmRotor contains the rotor as a geometrical object centred in the coordinate origin and two rotational movements br and brStart around the y-axis, which are executed one after the other. The first rotational movement lets the rotor rotate slowly at first. The second rotational movement describes a faster rotation, which should finally lead to the take-off of the helicopter. The take-off of the helicopter is realised in the transformation group tgmHelicopter. The movements and the positioning should be cleanly separated in the transformation groups. The rotor remains in the transformation group tgmRotor in the
Fig. 5.17 Section of the scene graph with dynamic transformations
coordinate origin, where its rotation is described. This transformation group is assigned to another transformation group tgRotor, which positions the rotor on the helicopter cabin. This translation is also applied to the rotational movement, since the movement is located in a child node of tgRotor. The same applies to the movement of the helicopter. The ascent of the helicopter is described in the transformation group tgmHelicopter relative to the coordinate origin. Only then the helicopter is transformed together with its flight movement in the parent transformation group tgHelicopter that is placed on the platform so that it starts from there. It is also conceivable to swap the order of the two transformation groups. The representation chosen here corresponds to the modelling of the following movement: The helicopter should take off from a platform until it finally reaches a given height h, measured from the platform. The height of the platform is not important for the helicopter in this movement. It must always cover a distance of h units. However, if the helicopter is to take off from the platform and fly up to a height h above the ground, the distance to be covered depends on the height of the platform. In this case, the two transformation groups would be swapped. However, in the transformation group tgmHelicopter, one would have to define another movement starting from a different point. Thus, the movement is calculated as the interpolation of transformations in the scene graph.
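Such movements ultimately come down to interpolating transformation parameters over time. The following minimal sketch shows a linear interpolation (convex combination) for the rotor angle and the lift-off height; the method names and the concrete values are hypothetical.

```java
/** Linear interpolation (convex combination) between a start and an end value; t runs from 0 to 1. */
static double lerp(double start, double end, double t) {
    return (1 - t) * start + t * end;
}

/** Rotor angle (degrees) and lift-off height for a normalised point in time t in [0, 1]. */
static double[] helicopterState(double t) {
    double rotorAngle = lerp(0, 10 * 360, t);   // ten full turns around the y-axis
    double height = lerp(0, 5, t);              // ascend by five units relative to the platform
    return new double[] { rotorAngle, height };
}
```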
5.4.3 Matrix Stacks and Their Application in the OpenGL

Sections 5.4.1 and 5.4.2 describe how individual objects in a scene can be realised using a scene graph. In this context, it becomes clear that the transformations acting on a particular object are those located on the path from the object (including its own transformations) to the root of the scene graph. Since the corresponding total matrix is created by matrix multiplication of these matrices in reverse order, this total matrix should only affect this one object in the scene and not influence the other objects. This is realised with the help of matrix stacks. A stack is a data structure that stores data on top of each other, as in a stack of books, and releases it again in the reverse order. In a matrix stack, matrices are stored as data. The element placed on the stack last is removed first. For example, to place two objects in a scene, the corresponding transformations of the objects must be performed one after the other. This means, for example, that after placing the first object in the scene, the corresponding transformations must be undone (by applying the opposite transformations, in reverse order) so that they do not affect the second object. Afterwards, the same procedure is carried out with the second object. This procedure would be too cumbersome. Another solution is to use matrix stacks. With the first object, the matrices of the transformations that are present on the path from the object to the root of the scene graph are pushed onto the matrix stack one after the other. Note that the removal of these matrices from the matrix stack is done in reverse order, which is identical to the order of matrix multiplication of these matrices. With this matrix multiplication, the correct total matrix is obtained, corresponding to the successive execution of the transformations of this first object. Then, before processing the second object, the matrix stack is emptied. The procedure is repeated using the matrix stack with the second object and its corresponding matrices. Thus, the second object is placed in the scene independently of the first one. The same procedure can be used to display all objects of a scene graph in the scene. Movements and all transformations of the viewing pipeline (see Sect. 5.10) are implemented similarly. This can be studied in depth in [5]. In the OpenGL fixed-function pipeline, matrix stacks are available via the gl.glPushMatrix() and gl.glPopMatrix() functions. For example, for
Fig. 5.18 Matrix stack in the fixed-function pipeline in display method (Java)
two objects, the matrix stack is filled and emptied again as shown in Fig. 5.18 (in the display method). These commands are not present in the programmable pipeline in the core profile of OpenGL. They either must be implemented, or a library must be imported to implement this functionality. For example, the class PMVMatrix (JOGL) works well with the core profile as shown in Fig. 5.19.
Fig. 5.19 Matrix stack in the core profile of the programmable pipeline with class PMVMatrix in the display method (Java)
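For orientation, a display method that renders two objects with the fixed-function matrix stack typically follows the pattern sketched below; the object-drawing methods and the transformation values are placeholders. In the core profile, the JOGL class PMVMatrix offers analogous glPushMatrix() and glPopMatrix() calls on a CPU-side stack, whose resulting matrix is then passed to the shader as a uniform.

```java
import com.jogamp.opengl.GL2;
import com.jogamp.opengl.GLAutoDrawable;

// Fragment of a GLEventListener implementation; drawChair and drawTable
// stand for arbitrary object-drawing code of the application.
public void display(GLAutoDrawable drawable) {
    GL2 gl = drawable.getGL().getGL2();
    gl.glMatrixMode(GL2.GL_MODELVIEW);

    gl.glPushMatrix();                 // save the current model-view matrix
    gl.glTranslatef(-2f, 0f, -10f);    // transformations for the first object only
    gl.glRotatef(45f, 0f, 1f, 0f);
    drawChair(gl);
    gl.glPopMatrix();                  // restore: the second object is not affected

    gl.glPushMatrix();
    gl.glTranslatef(2f, 0f, -10f);     // transformations for the second object only
    drawTable(gl);
    gl.glPopMatrix();
}
```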
5.5 Arbitrary Rotations in 3D: Euler Angles, Gimbal Lock, and Quaternions

From the point of view of translations and rotations for transferring an already scaled object from model coordinates to world coordinates, the following six degrees of freedom exist in three-dimensional space:

• the position (x, y, z) (and thus three degrees of freedom) and
• the orientation (with one degree of freedom each):
  – rotation around the x-axis,
  – rotation around the y-axis and
  – rotation around the z-axis.

With these six degrees of freedom, any (already scaled) object, including its model coordinate system, can be brought into any position and orientation of the world coordinate system. Each scaled object is modelled in the model coordinate system. The model coordinate system is congruent at the beginning with the world coordinate
system, the coordinate system of the 3D scene. Applying these six degrees of freedom also performs a coordinate system transformation of the model coordinate system (together with the associated object), which results in the final position and orientation (of the model coordinate system together with the object) in the world coordinate system. The orientation about the x-, y- and z-axes is established by applying the rotations about the x-axis (with angle θx), the y-axis (with angle θy) and the z-axis (with angle θz). The angles θx, θy and θz are the so-called Eulerian angles.
5.5.1 Rotation Around Any Axis

By a suitable series connection of these three rotations around the coordinate axes x, y and z (for the orientation) in combination with suitable translations (for the final position), a rotation around any axis by any angle θ can be described. This arbitrary rotation axis is mathematically represented by a vector v = (x, y, z)T. After translating the starting point of the vector v into the coordinate origin, suitable standard rotations around the x-, y- and z-axes bring the vector v collinearly onto, for example, the z-axis. Afterwards, a rotation around the z-axis by the angle θ takes place. The transformations carried out must then be reversed in the reverse order; the corresponding inverse matrices are multiplied in this order. The resulting total matrix then describes the rotation around this rotation axis by the angle θ. In the following, this procedure is applied to an example. First, a translation T(dx, dy, dz) must be applied, which shifts the rotation axis so that it passes through the coordinate origin. Then a rotation around the z-axis is performed so that the rotation axis lies in the yz-plane. A subsequent rotation around the x-axis maps the rotation axis onto the z-axis. Now the actual rotation by the angle θ is executed as a rotation around the z-axis. Afterwards, all auxiliary transformations must be undone, so that a transformation of the following form results:

T(−dx, −dy, −dz) · Rz(−θz) · Rx(−θx) · Rz(θ) · Rx(θx) · Rz(θz) · T(dx, dy, dz).

Note that the transformations are carried out from right to left, as with matrix multiplication. Between the translation T(dx, dy, dz) = (−x, −y, −z)T and the back translation at the end, the selected rotations do not necessarily have to be carried out to bring the rotation axis congruently onto the z-axis. Moreover, not all rotations about the three main axes x, y and z must necessarily be present. In the present example, an initial rotation around the z-axis would also have brought the rotation axis into the yz-plane. Other rotations around the x-, y- and z-axes are possible. Furthermore, a collinear mapping of the rotation axis onto the x- or y-axis with the corresponding matrices and a rotation by the angle θ around that axis instead of the z-axis is possible. Therefore, depending on the procedure, other rotation sequences arise within this sequence.
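A sketch of this construction for an axis through the coordinate origin is given below (the translations T(dx, dy, dz) are omitted; for an axis through an arbitrary point, they can be appended as described above). The axis is first rotated about the z-axis into the yz-plane, then about the x-axis onto the z-axis, the rotation by θ is applied, and the auxiliary rotations are undone. All names are chosen for illustration.

```java
public class AxisRotation {

    /** Multiplies two 3x3 matrices: result = a * b. */
    static double[][] mul(double[][] a, double[][] b) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    static double[][] rotX(double t) {
        return new double[][] { { 1, 0, 0 },
                                { 0, Math.cos(t), -Math.sin(t) },
                                { 0, Math.sin(t), Math.cos(t) } };
    }

    static double[][] rotZ(double t) {
        return new double[][] { { Math.cos(t), -Math.sin(t), 0 },
                                { Math.sin(t), Math.cos(t), 0 },
                                { 0, 0, 1 } };
    }

    /** Rotation by angle theta around the axis (vx, vy, vz) through the origin.
     *  The axis does not need to be normalised. */
    static double[][] rotationAroundAxis(double vx, double vy, double vz, double theta) {
        double thetaZ = Math.atan2(vx, vy);                  // R_z(thetaZ) brings the axis into the yz-plane
        double thetaX = Math.atan2(Math.hypot(vx, vy), vz);  // R_x(thetaX) then maps it onto the z-axis
        double[][] m = rotZ(-thetaZ);                        // undo the auxiliary rotations ...
        m = mul(m, rotX(-thetaX));
        m = mul(m, rotZ(theta));                             // ... around the actual rotation by theta
        m = mul(m, rotX(thetaX));
        m = mul(m, rotZ(thetaZ));                            // auxiliary rotations are applied first
        return m;
    }
}
```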
5.6 Eulerian Angles and Gimbal Lock

Assume that two copies of an object exist at different positions and with different orientations. There is always an axis of rotation and an angle with which these object variants can be transferred into each other. This means that for each orientation of an object in space, there is an axis and a rotation angle relating it to the original placement of the object, assuming that its size is not changed. This can also be interpolated. Therefore, a rotation requires only the specification of a vector describing the axis of rotation and the corresponding angle, which specifies the rotation around this axis of rotation. The challenge is the efficient calculation of this axis and the corresponding angle. As explained at the end of Sect. 5.5.1, suitable geometric transformations can execute a rotation around any rotation axis in three-dimensional space. An interpolation is considered between the two positions and orientations, e.g., for an animation. The Eulerian angles are used for this interpolation in the following. Eulerian angles indicate the orientation of an object with respect to an arbitrary but fixed orthonormal basis in three-dimensional space. Rotations with Euler angles are defined by three rotations around the three main Cartesian axes: x with basis vector (1, 0, 0), y with basis vector (0, 1, 0) and z with basis vector (0, 0, 1). This corresponds to the approach of the series-connected execution of the corresponding homogeneous rotation matrices of the x-, y- and z-axis. In this way, any orientation and position of the object can be realised by the series connection of the rotations around the x-, y- and z-axis. This is also the case if the order of the x-, y- and z-axes is reversed. Interestingly, this property holds as long as the first axis is different from the second, and the second from the third. This means that a rotation, e.g., first around the x-, then around the y-axis and then around the x-axis can create any orientation and position of the object. The application of the Eulerian angles poses some challenges, which is why today's computer graphics systems use quaternions, which are introduced later in this chapter. In order to better understand these challenges, the concatenation of rotations is investigated further. The following is an analysis of how the orientation of the object behaves when, after performing a rotation around the x-, y- and z-axis, e.g., a further x-axis rotation is applied. A rotation sequence is not commutative; this means that the rotation sequence is essential, and the previous application influences the rotation of the following one. If the sequence is reversed, it usually results in a different orientation or position of the object. Let θx, θy and θz be angles. For example, first rotate about the y-axis (Ry(θy)), then about the x-axis (Rx(θx)) and then about the z-axis (Rz(θz)). In the following, this rotation sequence yxz with the orthonormal total matrix Rz(θz) · Rx(θx) · Ry(θy) is considered in more detail. Strictly speaking, this means that only the rotation around the y-axis with angle θy does not affect the other rotations of the object. After applying the angle θy, the object is rotated around the x-axis with angle θx. After that, the rotation around the z-axis with angle θz is performed on the resulting orientation. These rotations are themselves interpreted as actions on objects. Then the order of the rotations to be performed forms a hierarchy in the form of a scene graph, as shown in Fig. 5.20.
Fig. 5.20 Hierarchy and thus mutual influence of rotations around the axes x, y and z, here rotation order yxz with total rotation matrix Rz(θz) · Rx(θx) · Ry(θy)
Since the successively executed transformations require a change of coordinate system each time, the corresponding coordinate system transformation starts from the model coordinate system of the scaled object. In the beginning, the model coordinate system is congruent with the axes of the world coordinate system. After performing the transformations, the model coordinate system, including the object, is in the correct position with respect to the world coordinate system. This results in two views:

• The intrinsic view describes the transformations from the perspective of an observer coupled to the object, for example, a pilot in an aircraft. Here, the model coordinate system changes as a reference through coordinate system transformations.
• The extrinsic view represents the transformations from the point of view of an observer in world coordinates. This point of view is independent and outside of the object. The world coordinate system remains fixed as a reference.

These two views differ only in the reverse order of their transformations. The view of the example in the chosen rotation order yxz is thus intrinsic. In this context, it makes sense to consider the three coordinate system changes from the model to the world coordinate system intrinsically with the rotation order yxz. Initially, both coordinate systems are identical. The model coordinate system remains fixed to the object throughout. If a rotation around the y-axis takes place, the object rotates around the y-axis of the model coordinate system. Thus, the x- and z-axes of the model coordinate system rotate with it, while the x- and z-axes of the world coordinate system remain in their original position. However, the x- and z-axes continue to span the same xz-plane. If the object is subsequently rotated around the x-axis of the model coordinate system, both the object and the y- and z-axes of the model coordinate system rotate with it. Thus, the result of the first rotation Ry(θy) also rotates. When rotating the object around the z-axis, the x- and y-axes of the model coordinate system rotate along with the object. Therefore, this rotation also affects the result of the first two previous rotations. For this reason, Eulerian angles are often thought of as rotating x-, y- and z-axes, with a hierarchy like in a scene graph, as shown in Fig. 5.20.
The first rotation influences only itself, while each subsequent rotation influences all previous ones, according to the rotation order yx z with the total matrix Rz (θz ) · Rx (θx ) · R y (θ y ). At the top of the hierarchy is the z-axis, then the x-axis, and at the lowest level is the y-axis. If, after rotation around these three axes, the y-axis (of the model coordinate system) is rotated again, the result of interpolation between these steps is no longer intuitively apparent at the latest. This is the first disadvantage of the Eulerian angles. Another disadvantage of Eulerian angles in interpolation is the risk of a so-called gimbal lock. As long as all three degrees of freedom of the orientation exist, the interpolation occurs in a straight line. However, as soon as one of the degrees of freedom disappears, at least two axes are involved in an interpolation since they must compensate for the lost degree of freedom. In the animation of this interpolation, it is performed curve-like, which can lead to challenges, especially if this behaviour is not intuitively expected. A degree of freedom can disappear if a rotation around the middle axis in the hierarchy is carried out by 90◦ , and thus the lower hierarchy level coincides with the upper one. Rotations around the upper and lower axes then give the same result. Therefore, one degree of freedom is missing, which must be compensated for by the middle and upper or lower axes. The rotation around the middle axis with 90◦ thus creates a gimbal lock. A gimbal lock can be created if, in the above example, the y-axis of the model coordinate system maps onto the z-axis of the model coordinate system by rotating around the x-axis (which is on the middle hierarchical level) and thus restricts the freedom of movement by one. The first rotation dimension of the y-axis disappears as it becomes identical to rotations around the z-axis. Strictly speaking, in this constellation, a further rotation around the y-axis does the same as a rotation around the z-axis. Only the x-axis extends the degree of freedom. If the z-axis is rotated, this rotation influences all previous rotations since the z-axis is on the highest level of the hierarchy. Of course, this lost degree of freedom can still be realised by rotating the axes, but one needs all three axes to compensate for it (and not just one without gimbal lock). However, if these three rotations are executed simultaneously as animation, the model is not animated in a straight line in the direction of the missing degree of freedom, but experiences a curvilinear movement to the final position, since the interaction of at least two axes is involved. To prevent this, the axis on the middle level of the hierarchy is assigned the frequently occurring rotations, while no rotations of 90◦ and more on the x-axis or the z-axis may be allowed. This means that a suitable change in the hierarchy already prevents a gimbal lock for the application example in question. As a result, the animation, in this case, will be running straight. Nevertheless, a gimbal lock cannot be avoided completely. It always occurs when the axis in the middle hierarchy level is rotated by 90◦ degrees and coincides with the axis of the outer hierarchy level. Additionally, to make things worse, the corresponding Euler angles cannot be calculated unambiguously from a given rotation matrix since a rotation sequence that results in orientation can also be realised by different Euler angles. 
In summary, it can be said that the Euler angles can challenge the animator. In their pure form,
they are intuitively understandable, but the interaction increases the complexity to such an extent that predicting the actual result is anything but trivial. This fact is demonstrated in the following by the example of an animation of the camera movement. It is assumed in the following that the hierarchy order x yz runs from low to high hierarchy level. In this case, the rotation around the y-axis should not include a 90◦ degree rotation; otherwise, a gimbal lock would result. Assuming that the camera is oriented in the direction of the z-axis and the y-axis runs vertically upwards from the camera. Since the camera has to pan sideways to the left or right very often, i.e., a yaw motion as a rotation around the y-axis, it could achieve the 90◦ degree rotation and thus the gimbal lock. If a new rotation is then applied, for example, around the z-axis, then the other axes will inevitably rotate as well since the z-axis is at the highest level of the hierarchy. An animation from the initial position of the camera to the current one will cause the camera to pan unfavourably because in this animation the values of two axes are actually changed. Therefore, a different hierarchical order is chosen. A camera rarely pans up or down, so it is best to place the x-axis in the middle of the hierarchy. In addition, the camera rarely rolls around, so that the rotation around the z-axis should be placed in the lowest hierarchy level. This results in the rotation order zx y with which the Euler angles for the animated camera movement can be worked with more easily. In aviation, automotive engineering and shipping, the hierarchical order is zyx and axis selection with the calculation of Euler angles is precisely controlled (see DIN 70000, DIN 9300). When designing in CAD, a car points in the direction of the x-axis, i.e., a roll movement causes the rotation around the x-axis. This is simulated when, for example, one drives over a pothole and the car has to sway a little. The axis that protrudes vertically upwards from the vehicle is the z-axis, which triggers a yaw movement as a rotation, if, for example, the vehicle is steered in a different direction. The y-axis causes a pitching movement as rotation, which is used in the animation to create a braking action. With a car, or more clearly with an aeroplane, a gimbal lock is created when the vehicle flies steeply up or down, which would be very unrealistic. In computer games, a nodding motion of at least 90◦ is not allowed. At the beginning of the section, it is explained that an arbitrary rotation only requires the specification of a vector (which describes the axis of rotation) and the corresponding angle (which specifies the rotation around this axis of rotation). This arbitrary rotation has thereafter been realised with a rotation sequence of Eulerian angles, for example, with the rotation sequence yxz and the total matrix M = Rz (θz ) · Rx (θx ) · R y (θ y ). When M is further examined, it turns out that this orthonormal total matrix has an eigenvalue of one. Consequently, there is guaranteed to be an associated eigenvector that maps to itself when the matrix is multiplied by M. M describes the arbitrary rotation. Therefore, a vector exists; this arbitrary rotation does not influence that. However, since only the rotation axis remains in its position, the eigenvector must be the rotation axis. 
The rotation angle results from trigonometric considerations, for example, by comparing the x-axis before and after the total rotation in relation to this rotation axis (see, for example, [2] or the Rodrigues rotation formula for vectors). Thus, only one rotation axis and the corresponding rotation angle are needed to obtain the same result without a gimbal lock. Another advantage is that
less memory is required because, instead of a total rotation matrix, only one vector for the rotation axis and one angle must be stored. In large scene graphs with many rotations, this is noticeable.
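For a given orthonormal rotation matrix, the axis and angle can be extracted without an explicit eigenvector computation, using the trace and the antisymmetric part of the matrix. The following sketch assumes a rotation angle strictly between 0° and 180°; the method name is illustrative.

```java
/** Extracts the rotation axis and angle from a 3x3 rotation matrix r.
 *  Returns {ax, ay, az, angle}; valid for angles strictly between 0 and 180 degrees. */
static double[] axisAngle(double[][] r) {
    double trace = r[0][0] + r[1][1] + r[2][2];
    double c = Math.max(-1.0, Math.min(1.0, (trace - 1.0) / 2.0));  // clamp against rounding errors
    double angle = Math.acos(c);
    double s = 2.0 * Math.sin(angle);
    // the axis (eigenvector for eigenvalue 1) can be read off the antisymmetric part of r
    return new double[] {
            (r[2][1] - r[1][2]) / s,
            (r[0][2] - r[2][0]) / s,
            (r[1][0] - r[0][1]) / s,
            angle
    };
}
```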
5.6.1 Quaternions

Quaternions work indirectly with the axis of rotation and the associated angle. The application of quaternions, therefore, does not pose the same challenges as the application of Eulerian angles. The theory behind them is not trivial, but the application is intuitive. In addition, they require less memory for the necessary data. Consequently, quaternions are used in today's computer graphics for rotations. Quaternions are an extension of the theory of complex numbers in the plane and are associated with geometric algebra. In order to understand them, first a shift and a rotation of points in the complex plane are considered. It is easy to see that a multiplication of two complex numbers of length one causes a rotation in the complex plane, while an addition causes a shift in the complex plane. Consider the following example. Two complex numbers 2 + i and 3 + 4i are given. The multiplication of the two numbers gives (2 + i)(3 + 4i) = 2(3 + 4i) + i(3 + 4i) = 2(3 + 4i) + (3i − 4) = 2 + 11i. The resulting vector (2, 11) of the complex plane results from the addition of the doubled vector (3, 4) and its orthogonal vector (−4, 3) in the complex plane. In Fig. 5.21, the resulting vector is visualised by this addition. It can be seen that the angle which the product of the two complex numbers 2 + i and 3 + 4i makes with the x-axis is the sum of the angles α and β that the corresponding vectors (2, 1) and (3, 4) make with the x-axis. The length of the resulting vector (2, 11) is identical to the product of the two lengths or magnitudes of the corresponding vectors (2, 1) and (3, 4).
Fig. 5.21 Multiplication of two complex numbers results in a rotation
Fig. 5.22 Addition of two complex numbers results in a shift
If we consider the addition of the two complex numbers 3 + 4i and 2 + i, the complex number 5 + 5i, or (5, 5) as a vector in the complex plane, is obtained, i.e., from the point of view of vector notation, one obtains an ordinary vector addition (3, 4) + (2, 1) = (5, 5), which means that a displacement in the form of the second vector (2, 1) is applied to the first (3, 4). This is shown in Fig. 5.22. These two properties, which become apparent in multiplication and addition, are generally valid for complex numbers. William Rowan Hamilton extended these properties to quaternions in 1843, with the only difference being that commutativity is lost for quaternions. This means that although a · b = b · a is true for two complex numbers a and b, this is not true for two quaternions a and b. The quaternions are defined below. While the complex numbers are defined as a = x0 + i x1 with x0 and x1 real and i² = −1, quaternions are defined as a = x0 + i x1 + j x2 + k x3 with x0, …, x3 real and i² = j² = k² = ijk = −1. It can be proved that ij = k, jk = i, ki = j and that i(−i) = 1 (analogously for j and k), and thus −i represents the reciprocal of i (analogously for j and k). From this, it follows directly that i, j and k are not commutative but anti-commutative, i.e., ij = −ji (analogously for all distinct pairs). If the respective number of real values is compared as dimensions, then it is noticeable that the complex numbers are two-dimensional (x0, x1 ∈ R) and the quaternions are four-dimensional (x0, x1, x2, x3 ∈ R; in geometric algebra: scalar, vector, bivector and trivector). While the space of complex numbers is called C, the space of quaternions is called H. With the above rules, any multiplication and addition of quaternions again yields a quaternion, whereby the properties regarding translation and rotation carry over analogously to the complex numbers. The calculations are performed similarly to the complex numbers, but this time based on the corresponding rules. For
clarification, an example of use is considered below: the multiplication of the two quaternions 1 + 2i − j + 3k and −1 − 2i + j + k. By simple multiplication and observance of the order of i, j and k (not commutative),

$$\begin{aligned}
(1 + 2i - j + 3k)(-1 - 2i + j + k)
&= -1 - 2i + j + k - 2i - 4i^2 + 2ij + 2ik + j + 2ji - j^2 - jk - 3k - 6ki + 3kj + 3k^2 \\
&= -1 - 2i + j + k - 2i + 4 + 2k - 2j + j - 2k + 1 - i - 3k - 6j - 3i - 3 \\
&= 1 - 8i - 6j - 2k.
\end{aligned}$$

Now the question arises what exactly these quaternions mean geometrically. Based on the representations in geometric algebra, the quaternions are represented as vectors of four components. The example therefore contains the two initial vectors (1, 2, −1, 3)T and (−1, −2, 1, 1)T and the result vector (1, −8, −6, −2)T. Note the analogy to the complex numbers, which can also be represented as vectors of two components. The length of a quaternion (x0, x1, x2, x3)T is defined as

$$\sqrt{x_0^2 + x_1^2 + x_2^2 + x_3^2}.$$

For the vectors in the example, this means that (1, 2, −1, 3)T has the length √15, (−1, −2, 1, 1)T has the length √7 and (1, −8, −6, −2)T has the length √105. Multiplication of two quaternions of length one causes a rotation. An addition results in a translation. It is astonishing that this rotation can be read directly from the result vector, as explained in the following. The vector (x0, x1, x2, x3)T can be thought of as a two-part vector. Under the constraint that it has length one, roughly speaking, x0 represents the cosine of the rotation angle and (x1, x2, x3)T the corresponding rotation axis. This means that what initially proved very difficult to calculate with the usual analytical tools—an axis of rotation together with the angle of rotation—so that the detour via the Euler angles was accepted, now turns out to be very elegant when using quaternions. If we assume that the length of the quaternion q is one, then the following relationships apply to rotations about an axis. The rotation angle is δ = 2 arccos(x0), and the rotation axis a of length one is given by

$$\begin{pmatrix} a_x \\ a_y \\ a_z \end{pmatrix} = \begin{pmatrix} x_1 / \sin(\tfrac{\delta}{2}) \\ x_2 / \sin(\tfrac{\delta}{2}) \\ x_3 / \sin(\tfrac{\delta}{2}) \end{pmatrix}.$$

Thus, for the corresponding quaternion q,

$$\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \cos(\tfrac{\delta}{2}) \\ a_x \cdot \sin(\tfrac{\delta}{2}) \\ a_y \cdot \sin(\tfrac{\delta}{2}) \\ a_z \cdot \sin(\tfrac{\delta}{2}) \end{pmatrix}.$$
In the example, the normalised result vector is (1/√105) · (1, −8, −6, −2)T. Therefore, x0 = 1/√105 = cos(δ/2), which results in approximately 84.4° for the half angle δ/2. The axis of rotation follows from

$$\frac{1}{\sqrt{105}}(-8, -6, -2)^T = -\frac{2}{\sqrt{105}}(4, 3, 1)^T = \big(a_x \cdot \sin(84.4°),\; a_y \cdot \sin(84.4°),\; a_z \cdot \sin(84.4°)\big)^T.$$

The axis of rotation is thus (ax, ay, az)T ≈ (−0.79, −0.59, −0.2)T. The rotation matrix R resulting from a quaternion q looks like this:

$$R = \begin{pmatrix}
x_0^2 + x_1^2 - x_2^2 - x_3^2 & 2(x_1 x_2 - x_0 x_3) & 2(x_1 x_3 + x_0 x_2) \\
2(x_1 x_2 + x_0 x_3) & x_0^2 - x_1^2 + x_2^2 - x_3^2 & 2(x_2 x_3 - x_0 x_1) \\
2(x_1 x_3 - x_0 x_2) & 2(x_2 x_3 + x_0 x_1) & x_0^2 - x_1^2 - x_2^2 + x_3^2
\end{pmatrix}.$$

If we assume that the quaternion q has length one and thus x0² + x1² + x2² + x3² = 1, this matrix can be simplified to

$$R = \begin{pmatrix}
1 - 2x_2^2 - 2x_3^2 & 2(x_1 x_2 - x_0 x_3) & 2(x_1 x_3 + x_0 x_2) \\
2(x_1 x_2 + x_0 x_3) & 1 - 2x_1^2 - 2x_3^2 & 2(x_2 x_3 - x_0 x_1) \\
2(x_1 x_3 - x_0 x_2) & 2(x_2 x_3 + x_0 x_1) & 1 - 2x_1^2 - 2x_2^2
\end{pmatrix}.$$

If we look at the multiplication of two quaternions q = (x0, x1, x2, x3)T and q' = (x0', x1', x2', x3')T from this point of view, then

$$q \cdot q' = \begin{pmatrix}
x_0 x_0' - x_1 x_1' - x_2 x_2' - x_3 x_3' \\
x_0 x_1' + x_1 x_0' + x_2 x_3' - x_3 x_2' \\
x_0 x_2' + x_2 x_0' + x_3 x_1' - x_1 x_3' \\
x_0 x_3' + x_3 x_0' + x_1 x_2' - x_2 x_1'
\end{pmatrix}.$$

Obviously,

$$q \cdot q' = \begin{pmatrix}
x_0 x_0' - (x_1, x_2, x_3) \begin{pmatrix} x_1' \\ x_2' \\ x_3' \end{pmatrix} \\[1ex]
x_0 \begin{pmatrix} x_1' \\ x_2' \\ x_3' \end{pmatrix} + x_0' \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \times \begin{pmatrix} x_1' \\ x_2' \\ x_3' \end{pmatrix}
\end{pmatrix},$$

where again a quaternion results, i.e., the first line represents the angle part and the second line the (three-dimensional) axis part of the total rotation. Note the cross product in the calculation. Quaternions can be used in the same way as homogeneous rotation matrices to concatenate rotations connected in series. For the inverse of a quaternion q of length one,

$$q^{-1} = (x_0, -x_1, -x_2, -x_3)^T,$$

where the quaternion 1 = (1, 0, 0, 0)T forms the neutral element. If one wants to rotate a point P = (x, y, z) with a given quaternion q, this point must first be transformed into a quaternion (0, x, y, z)T. Then the rotation is performed in the following way, where the resulting point P' = (x', y', z') is obtained from

$$(0, x', y', z')^T = q \, (0, x, y, z)^T \, q^{-1}.$$
Therefore, if the rotation axis is a three-dimensional vector a of length one and the rotation angle δ is given, then

$$q = \begin{pmatrix} \cos(\tfrac{\delta}{2}) \\ \sin(\tfrac{\delta}{2}) \, a \end{pmatrix}.$$

The last three components of the resulting quaternion correspond to the point P' in Cartesian coordinates. If, on the other hand, no axis of rotation is given, but one object must be transferred into another, the axis of rotation and the angle must be calculated. Assume that a vector v is to be transferred into another vector v'. Then the axis of rotation must be perpendicular to both vectors, which is calculated using the cross product of the two vectors: n = v × v'. If applicable, the resulting axis must then be normalised if its length is not equal to one. The angle of rotation corresponds to the angle between the two vectors and can therefore be determined by the scalar product of the two vectors, since vT v' = cos(δ) applies for normalised vectors. Again, the quaternion

$$q = \begin{pmatrix} \cos(\tfrac{\delta}{2}) \\ \sin(\tfrac{\delta}{2}) \, n \end{pmatrix}$$

performs the rotation. To sum up, it can be said that quaternions contain the same information as homogeneous matrices. With quaternions, there is no gimbal lock. In addition, memory space is saved because a quaternion takes up only four memory locations, whereas 4 · 4 = 16 memory locations are required for each homogeneous matrix. The following rules relate the Euler angles α, β and γ and a quaternion q = (x0, x1, x2, x3)T, where q again has length one, i.e., x0² + x1² + x2² + x3² = 1. The Euler angles α, β and γ can be calculated from a quaternion as follows:

$$\alpha = \arctan\frac{2(x_1 x_2 + x_0 x_3)}{x_0^2 + x_1^2 - x_2^2 - x_3^2}$$
$$\beta = \arcsin\big(2(x_0 x_2 - x_1 x_3)\big)$$
$$\gamma = -\arctan\frac{2(x_2 x_3 + x_0 x_1)}{-(x_0^2 - x_1^2 - x_2^2 + x_3^2)}.$$

Note that it is also possible to calculate the quaternion from the Eulerian angles. For this purpose, we refer, for example, to [2].
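The operations described above fit into a small helper class. The following sketch (no external library; names chosen for illustration) provides construction from axis and angle, the Hamilton product and the rotation of a point according to (0, x', y', z')T = q (0, x, y, z)T q⁻¹.

```java
public class Quat {
    double x0, x1, x2, x3;                       // scalar part and i, j, k components

    Quat(double x0, double x1, double x2, double x3) {
        this.x0 = x0; this.x1 = x1; this.x2 = x2; this.x3 = x3;
    }

    /** Unit quaternion for a rotation by angle delta around the normalised axis (ax, ay, az). */
    static Quat fromAxisAngle(double ax, double ay, double az, double delta) {
        double s = Math.sin(delta / 2);
        return new Quat(Math.cos(delta / 2), ax * s, ay * s, az * s);
    }

    /** Hamilton product this * q (not commutative). */
    Quat mul(Quat q) {
        return new Quat(
                x0 * q.x0 - x1 * q.x1 - x2 * q.x2 - x3 * q.x3,
                x0 * q.x1 + x1 * q.x0 + x2 * q.x3 - x3 * q.x2,
                x0 * q.x2 + x2 * q.x0 + x3 * q.x1 - x1 * q.x3,
                x0 * q.x3 + x3 * q.x0 + x1 * q.x2 - x2 * q.x1);
    }

    /** Inverse of a unit quaternion: the conjugate. */
    Quat inverse() {
        return new Quat(x0, -x1, -x2, -x3);
    }

    /** Rotates the point p = (x, y, z) according to (0, p') = q (0, p) q^{-1}. */
    double[] rotate(double[] p) {
        Quat r = this.mul(new Quat(0, p[0], p[1], p[2])).mul(inverse());
        return new double[] { r.x1, r.x2, r.x3 };
    }

    public static void main(String[] args) {
        // 90-degree rotation around the z-axis maps (1, 0, 0) approximately to (0, 1, 0)
        Quat q = Quat.fromAxisAngle(0, 0, 1, Math.toRadians(90));
        double[] p = q.rotate(new double[] { 1, 0, 0 });
        System.out.printf("(%.2f, %.2f, %.2f)%n", p[0], p[1], p[2]);
    }
}
```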
5.7 Clipping Volume

The representation of the section of a three-dimensional model world that a viewer can see requires a number of details about how the viewer's view into the model world is to be represented. The coordinates of the point at which the observer is located and the direction in which he looks must be specified. However, this information is not yet sufficient. The projection plane must be specified. It corresponds to the plane of the display medium, usually the plane of the screen, through which the viewer sees the scene as if through a window. The screen or the display window on the screen can only show a finite section of the projection plane. Usually, this is a rectangular
Fig. 5.23 The field of view angle determines the area on the projection plane that corresponds to the window width
cutout. Instead of specifying a corresponding rectangle on the projection plane, an angle is often specified that defines the viewer's field of view. This angle defines how far the observer's field of view opens to the left and right. This results in a width on the projection plane that corresponds to the width of the display window on the screen. The height of the area on the projection plane is selected in proportion to the height of the display window. Figure 5.23 shows a top view of the observer's field of view. In principle, this information is sufficient for the clipping calculation. The three-dimensional clipping area—the clipping volume—corresponds to a pyramid of infinite height in the case of perspective projection or to a cuboid extending infinitely in one direction in the case of parallel projection. The range of vision of a human being is theoretically almost unlimited. One can see stars light-years away as well as the frame of a pair of glasses right in front of the eyes. However, when seeing, the eyes are focussed at a certain distance, so that it is not possible to see a very close and a very distant object in focus at the same time. If, for example, one fixates on a distant object and then holds a finger relatively close in front of one eye, one hardly sees this finger. Conversely, when reading a book, one does not notice how a bird flies past in the distance. Therefore, the range of vision that can be seen in focus usually extends from a certain minimum to a certain maximum distance. This property is modelled in computer graphics by defining a front (near) and a rear (far) clipping plane. In a perspective projection, the clipping volume thus takes the shape of a truncated pyramid, while a parallel projection provides a cuboid as clipping volume. The projection plane is at the distance at which the eyes of the viewer are optimally focussed. The near and far clipping planes correspond to the smallest and largest distance at which the viewer can still perceive objects when focussing on the projection plane. The projection plane usually lies between the near and far clipping planes. For the sake of simplicity, the projection plane is assumed to be identical to the near clipping plane in the considerations of this book. Objects that lie in front of the projection plane should give the viewer the impression that they are in front of the screen. However, this effect can only be achieved with techniques that support stereoscopic vision, which is discussed in Sect. 11.12. The relationship between clipping volume, near and far clipping plane, projection plane and projection type is shown in Fig. 5.24.
Fig. 5.24 The clipping volume for the parallel projection (top) and perspective projection (bottom)
In Sect. 5.8, it is explained how each projection can be split into a transformation and a subsequent projection onto an image plane parallel to the xy-plane. Therefore, the three-dimensional clipping can be implemented easily and efficiently. First, the transformation T is applied to all objects. If the objects transformed in this way are mapped to the xy-plane by means of parallel projection, the transformed clipping volume corresponds to a cuboid whose edges are parallel to the coordinate axes. This cuboid can be defined by two diagonally opposite corners (xleft, ybottom, −znear) and (xright, ytop, −zfar). To determine whether an object lies within this clipping volume, it is only necessary to check whether at least one point (x', y', z') of the object lies within the box. This is exactly the case if

$$x_{\text{left}} \le x' \le x_{\text{right}} \quad\text{and}\quad y_{\text{bottom}} \le y' \le y_{\text{top}} \quad\text{and}\quad -z_{\text{far}} \le z' \le -z_{\text{near}}$$

applies. Therefore, for a plane polygon, these comparisons only need to be performed for all corners to determine whether the polygon lies within the clipping volume. If the objects transformed in this way are mapped by means of perspective projection onto the xy-plane, the transformed clipping volume corresponds to the truncated pyramid. Again, there is a near clipping plane and a far clipping plane, and the frustum in between.
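As a sketch, the containment test against the cuboid-shaped clipping volume of the parallel projection can be written as follows; the variable names mirror the corner coordinates used above and are otherwise illustrative.

```java
/** True if a transformed point (x, y, z) lies inside the cuboid-shaped clipping volume.
 *  zNear and zFar are positive distances, so the cuboid extends from -zFar to -zNear. */
static boolean insideClippingVolume(double x, double y, double z,
                                    double xLeft, double xRight,
                                    double yBottom, double yTop,
                                    double zNear, double zFar) {
    return xLeft <= x && x <= xRight
            && yBottom <= y && y <= yTop
            && -zFar <= z && z <= -zNear;
}

/** Checks the corners of a plane polygon; vertices[i] = {x, y, z}. */
static boolean polygonIntersectsVolume(double[][] vertices,
                                       double xLeft, double xRight,
                                       double yBottom, double yTop,
                                       double zNear, double zFar) {
    for (double[] v : vertices)
        if (insideClippingVolume(v[0], v[1], v[2], xLeft, xRight, yBottom, yTop, zNear, zFar))
            return true;                // at least one corner lies inside the volume
    return false;
}
```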
5.8 Orthogonal and Perspective Projections

In Sect. 5.2, transformations have been used to transform geometric objects into position or to move a scene. To display a three-dimensional scene, a projection onto a two-dimensional plane is required, which corresponds, for example, to the computer monitor. These projections can also be realised by means of geometric transformations. When displaying a three-dimensional scene on a common output device, such as a 2D screen, a viewer's point of view and a projection plane must be defined. The viewer looks in the direction of the projection plane, which contains a window with a view of the three-dimensional world. The projection of an object onto this plane is obtained by connecting rays emitted from a projection centre, corresponding to the observer's point of view, to the points of the object to be projected and calculating the points where these rays hit the projection plane. Only points are projected (no surfaces). This procedure, illustrated in Fig. 5.25 on the left, is called perspective projection. If the centre of projection is moved further and further away, perpendicular to the plane of projection in the opposite direction, the projection rays become parallel in the limit. If such a projection direction is given instead of a projection centre, it is called parallel projection. In this case, the projection of an object is obtained by letting rays parallel to the direction of projection emanate from the points of the object and calculating the points of intersection with the projection plane. Usually, the projection direction is perpendicular to the projection plane. The parallel projection is shown on the right in Fig. 5.25. At first, the parallel projection with the plane z = z0 as projection plane is considered, that is, the projection plane is parallel to the xy-plane. With this parallel projection, the
Fig. 5.25 Perspective (left) and parallel projection (right)
Fig. 5.26 Mapping of any plane to a plane parallel to the x y-plane
point (x, y, z) is mapped to the point (x, y, z0). In homogeneous coordinates, this mapping can be represented as a matrix multiplication as follows:

$$\begin{pmatrix} x \\ y \\ z_0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & z_0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}. \tag{5.6}$$

This allows any parallel projection to be described in the form of a matrix multiplication in homogeneous coordinates. If the projection is to be made onto a plane that is not parallel to the xy-plane, only a corresponding transformation must be connected in front of the projection matrix in Eq. (5.6), which maps the given projection plane onto a plane parallel to the xy-plane. This can always be achieved by a rotation around the y-axis and a subsequent rotation around the x-axis, as shown in Fig. 5.26. To understand parallel projection, it is therefore sufficient to consider only projection planes parallel to the xy-plane. In the case of another projection plane, one can instead imagine a correspondingly transformed scene, which in turn is then projected onto a plane parallel to the xy-plane. The parallel projection in the OpenGL maps the visual volume onto a unit cube whose length, width and height are in the interval [−1, 1] (see Fig. 5.27). In this mapping, there is a change from the right-handed to the left-handed coordinate system, i.e., the z-values are negated. To achieve this, the values a1, a2, a3, b1, b2 and b3 must be calculated in the following matrix:

$$\begin{pmatrix} a_1 & 0 & 0 & b_1 \\ 0 & a_2 & 0 & b_2 \\ 0 & 0 & -a_3 & b_3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} a_1 \cdot x + b_1 \\ a_2 \cdot y + b_2 \\ -a_3 \cdot z + b_3 \\ 1 \end{pmatrix} = \begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix} \tag{5.7}$$

Each component is calculated individually in the following.
Fig. 5.27 Calculation of the orthogonal projection in the OpenGL
The following dependencies apply to the x-coordinate:

1. x → xleft, xc → −1: (a1 · x + b1 = xc) → (a1 · xleft + b1 = −1),
2. x → xright, xc → 1: (a1 · x + b1 = xc) → (a1 · xright + b1 = 1).

The first equation is solved for b1, resulting in b1 = −1 − a1 · xleft, and inserted into the second equation as follows:

$$a_1 \cdot x_{\text{right}} + (-1 - a_1 \cdot x_{\text{left}}) = 1 \;\Leftrightarrow\; a_1 \cdot x_{\text{right}} - a_1 \cdot x_{\text{left}} = 2 \;\Leftrightarrow\; a_1 \cdot (x_{\text{right}} - x_{\text{left}}) = 2 \;\Leftrightarrow\; a_1 = \frac{2}{x_{\text{right}} - x_{\text{left}}}.$$

For b1, this results in

$$b_1 = -1 - \frac{2}{x_{\text{right}} - x_{\text{left}}} \cdot x_{\text{left}} = \frac{x_{\text{left}} - x_{\text{right}} - 2 x_{\text{left}}}{x_{\text{right}} - x_{\text{left}}} = -\frac{x_{\text{right}} + x_{\text{left}}}{x_{\text{right}} - x_{\text{left}}}.$$
For the y-component, analogously,

$$a_2 = \frac{2}{y_{\text{top}} - y_{\text{bottom}}} \quad\text{and}\quad b_2 = -\frac{y_{\text{top}} + y_{\text{bottom}}}{y_{\text{top}} - y_{\text{bottom}}}.$$
The following dependencies apply to the z-component:

1. z → −znear, zc → −1: (−a3 · z + b3 = zc) → (a3 · znear + b3 = −1),
2. z → −zfar, zc → 1: (−a3 · z + b3 = zc) → (a3 · zfar + b3 = 1).
Fig. 5.28 Calculation of the perspective projection
It should be noted that znear and zfar are both positive. Analogously, it follows that

$$a_3 = \frac{2}{z_{\text{far}} - z_{\text{near}}} \quad\text{and}\quad b_3 = -\frac{z_{\text{far}} + z_{\text{near}}}{z_{\text{far}} - z_{\text{near}}}.$$
The final result for the orthogonal projection in the OpenGL is as follows:

$$\begin{pmatrix}
\frac{2}{x_{\text{right}} - x_{\text{left}}} & 0 & 0 & -\frac{x_{\text{right}} + x_{\text{left}}}{x_{\text{right}} - x_{\text{left}}} \\
0 & \frac{2}{y_{\text{top}} - y_{\text{bottom}}} & 0 & -\frac{y_{\text{top}} + y_{\text{bottom}}}{y_{\text{top}} - y_{\text{bottom}}} \\
0 & 0 & -\frac{2}{z_{\text{far}} - z_{\text{near}}} & -\frac{z_{\text{far}} + z_{\text{near}}}{z_{\text{far}} - z_{\text{near}}} \\
0 & 0 & 0 & 1
\end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}. \tag{5.8}$$

Perspective projection can also be considered in homogeneous coordinates as a matrix–vector multiplication. For this purpose, a perspective projection with a projection centre at the coordinate origin and a projection plane parallel to the (x, y)-plane at z = z0 is considered, as shown in Fig. 5.28. The point (x, y, z) is projected onto a point (x', y', z0) on this projection plane as shown below. The ray theorem results in

$$\frac{x'}{x} = \frac{z_0}{z} \quad\text{and}\quad \frac{y'}{y} = \frac{z_0}{z},$$

respectively

$$x' = \frac{z_0}{z} \cdot x \quad\text{and}\quad y' = \frac{z_0}{z} \cdot y.$$

This perspective projection thus maps the point (x, y, z) to the point

$$(x', y', z_0) = \left(\frac{z_0}{z} \cdot x,\; \frac{z_0}{z} \cdot y,\; z_0\right). \tag{5.9}$$
In homogeneous coordinates, this mapping can be written as follows:

$$\begin{pmatrix} x' \cdot z \\ y' \cdot z \\ z_0 \cdot z \\ z \end{pmatrix} = \begin{pmatrix} x \cdot z_0 \\ y \cdot z_0 \\ z \cdot z_0 \\ z \end{pmatrix} = \begin{pmatrix} z_0 & 0 & 0 & 0 \\ 0 & z_0 & 0 & 0 \\ 0 & 0 & z_0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}. \tag{5.10}$$
If the resulting point is represented in Cartesian coordinates, i.e., if the first three components are divided by the fourth component, the searched point (5.9) is obtained. The matrix for the perspective projection in Eq. (5.10) does not have as last row (0, 0, 0, 1), as all other matrices discussed so far. Therefore, the result, in this case, is not available in normalised homogeneous coordinates. Analogous to parallel projection, the special choice of the coordinate origin as the projection centre with a projection plane parallel to the x y-plane does not represent a real limitation in the case of perspective projection. Every perspective projection can be assigned to this special perspective projection. To do this, one moves the projection centre to the coordinate origin with the help of a transformation. Then, just like in parallel projection, the projection plane can be mapped to a plane parallel to the x y-plane by two rotations. If the perspective projection is mirrored at the x y-plane, i.e., the eyepoint remains at the coordinate origin and the projection centre changes from positive z 0 - to negative −z 0 -coordinate, this case can easily be derived from the previous one and the corresponding matrix can be set up ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ x · (−z) z0 0 0 0 x · z0 x ⎜ y · (−z) ⎟ ⎟ ⎜ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ = ⎜ y · z0 ⎟ = ⎜ 0 z0 0 0 ⎟ · ⎜ y ⎟ . (5.11) ⎝ z0 · z ⎠ ⎝ 0 0 z0 0 ⎠ ⎝ z ⎠ ⎝ z · z0 ⎠ −z −z 0 0 −1 0 1 It has already been mentioned that every perspective projection can be traced back to a perspective projection with a projection centre at the coordinate origin with a projection plane parallel to the x y-plane. To understand perspective transformations, it is therefore sufficient to examine the perspective projection in Eq. (5.11). All other perspective projections can be understood as this perspective projection with an additional preceding transformation of the object world. A point of the form 0, 0, w1 with w ∈ IR, w = 0 is considered. In the homo geneous coordinates, this point can be written in the form 0, 0, w1 , 1 . This point is written by matrix multiplication (5.11) to the point 0, 0, − zw0 , w1 in homogeneous coordinates. In Cartesian coordinates, this means
\[
\left(0, 0, \tfrac{1}{w}\right) \;\mapsto\; (0, 0, -z_0).
\]

If one lets the parameter w go towards zero, the starting point (0, 0, 1/w) moves towards infinity on the z-axis, while its image converges towards the finite point (0, 0, −z_0). The infinitely distant object point on the negative z-axis is thus mapped to a concrete, non-infinite pixel that can be displayed. If one considers all lines that run through (0, 0, 1/w), the lines transformed by the matrix multiplication (5.11) intersect in the image in the homogeneous point (0, 0, −z_0/w, 1/w). If one again lets w go towards zero, the set of lines turns into the set of lines parallel to the z-axis, which do not intersect. However, the transformed lines in the image all intersect at the point (0, 0, −z_0). This point is also called the vanishing point. These theoretical calculations prove the well-known effect that, in a perspective projection, parallel lines running away from the viewer intersect at one point, the vanishing point.
Fig. 5.29 Vanishing point for perspective projection
Figure 5.29 shows a typical example of a vanishing point: parallel iron rails that appear to converge. Parallelism is not maintained under this perspective projection only for the lines running towards the rear. Lines that are parallel to the x- or y-axis remain parallel under the projection. If a projection plane is selected that intersects more than one coordinate axis, multiple vanishing points are obtained. If the projection plane intersects two or even all three coordinate axes, two or three vanishing points are obtained. These are then referred to as two-point or three-point perspectives. Figure 5.30 illustrates the effects of displaying a cube with one, two or three vanishing points. It has been shown that any projection of a scene can always be represented by the standard transformations with a subsequent projection according to Eq. (5.11). A change of the observer's point of view corresponds to a change of the transformations which are carried out before this projection. In this sense, a movement of the observer can also be interpreted in such a way that the observer itself remains unchanged (infinitely far away in the case of a parallel projection). Instead of the observer's movement, the entire scene is subjected to a transformation that corresponds exactly to the observer's inverse movement. To model the viewer's movements, therefore, only the transformation group of the entire scene needs to be extended by a corresponding transformation. For example, a rotation of the observer to the left is realised by a rotation of the entire scene to the right. Again, the camera is placed as an eyepoint at the coordinate origin with the direction of view along the negative z-axis. In the following, a further special case of a perspective projection is used in the OpenGL fixed-function pipeline. This projection additionally delimits in one step the visible volume, the so-called clipping volume, which is explained in more detail in Sect. 5.7. This is the area in which the visible objects to be displayed are located. Rendering must only be carried out for these objects. In the following Sect. 5.9, this case is explained in more detail for the OpenGL.
Fig. 5.30 One-, two- and three-point perspective
5.9 Perspective Projection and Clipping Volume in the OpenGL

In OpenGL, the clipping volume of the perspective projection is bounded by the near and the far clipping plane. For simplicity, the near clipping plane corresponds to the image plane in the following. Figures 5.31 and 5.32 visualise this situation.

Fig. 5.31 Calculating the perspective projection in the OpenGL

Fig. 5.32 Calculating the perspective projection: Camera coordinate system in the OpenGL with U, V and N base vectors

In [3], the perspective projection is treated in more detail with an image plane that is not identical to the near clipping plane; this reference is recommended for deepening. Note the similarity of the procedures for determining the corresponding transformation matrices. The lower-left corner of the near clipping plane has the coordinates (x_left, y_bottom, −z_near) and the upper-right corner (x_right, y_top, −z_near). The far clipping plane has the z-coordinate −z_far. All objects that are inside the frustum should be mapped into the visible area of the screen. OpenGL uses an intermediate step by mapping the vertices of the objects from the frustum, the visible volume of the pyramid, into a unit cube. This unit cube extends over the interval [−1, 1] in the x-, y- and z-direction. This means that the frustum has to be transferred into this unit cube by a suitable matrix. For object points (x, y, z) and their image points (x', y', −z_near), the following applies:

\[
x_{left} \le x' \le x_{right}
\;\Leftrightarrow\;
0 \le x' - x_{left} \le x_{right} - x_{left}
\;\Leftrightarrow\;
0 \le \frac{x' - x_{left}}{x_{right} - x_{left}} \le 1
\]
\[
\Leftrightarrow\;
0 \le 2\,\frac{x' - x_{left}}{x_{right} - x_{left}} \le 2
\;\Leftrightarrow\;
-1 \le 2\,\frac{x' - x_{left}}{x_{right} - x_{left}} - 1 \le 1
\;\Leftrightarrow\;
-1 \le \frac{2x'}{x_{right} - x_{left}} - \frac{x_{right} + x_{left}}{x_{right} - x_{left}} \le 1
\]
\[
\Leftrightarrow\;
-1 \le \frac{2\, z_{near}\, x}{-z\,(x_{right} - x_{left})} - \frac{x_{right} + x_{left}}{x_{right} - x_{left}} \le 1
\quad\text{with}\quad
\frac{x'}{x} = \frac{-z_{near}}{z}
\;\text{from the ray theorem}
\]
\[
\Leftrightarrow\;
-1 \le \left(\frac{2\, z_{near}}{x_{right} - x_{left}}\cdot x + \frac{x_{right} + x_{left}}{x_{right} - x_{left}}\cdot z\right)\cdot\frac{1}{-z} \le 1.
\]

Analogously, one can conclude from y_bottom ≤ y' ≤ y_top that

\[
-1 \le \left(\frac{2\, z_{near}}{y_{top} - y_{bottom}}\cdot y + \frac{y_{top} + y_{bottom}}{y_{top} - y_{bottom}}\cdot z\right)\cdot\frac{1}{-z} \le 1.
\]

Accordingly, the following applies to the normalised clip coordinates in the Cartesian representation:

\[
x_c = \left(\frac{2\, z_{near}}{x_{right} - x_{left}}\cdot x + \frac{x_{right} + x_{left}}{x_{right} - x_{left}}\cdot z\right)\cdot\frac{1}{-z}
\]

and

\[
y_c = \left(\frac{2\, z_{near}}{y_{top} - y_{bottom}}\cdot y + \frac{y_{top} + y_{bottom}}{y_{top} - y_{bottom}}\cdot z\right)\cdot\frac{1}{-z}.
\]
What remains is the calculation of the normalised clip coordinates in the z-direction and thus the entries a_1 and a_2 in the following matrix in homogeneous coordinates:

\[
\begin{pmatrix} x_c\cdot(-z) \\ y_c\cdot(-z) \\ z_c\cdot(-z) \\ -z \end{pmatrix}
=
\begin{pmatrix}
\frac{2 z_{near}}{x_{right}-x_{left}} & 0 & \frac{x_{right}+x_{left}}{x_{right}-x_{left}} & 0 \\
0 & \frac{2 z_{near}}{y_{top}-y_{bottom}} & \frac{y_{top}+y_{bottom}}{y_{top}-y_{bottom}} & 0 \\
0 & 0 & a_1 & a_2 \\
0 & 0 & -1 & 0
\end{pmatrix}
\cdot
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{5.12}
\]

For the z-coordinate, z_c · (−z) = a_1 · z + a_2 must apply. There are two cases: first, the mapping of the points on the near clipping plane to −1, and second, the mapping of the points on the far clipping plane to 1. From these considerations, two equations in the two unknowns a_1 and a_2 result in the following manner:

1. For (z, z_c) = (−z_near, −1), the following applies: −a_1 · z_near + a_2 = −z_near, and
2. for (z, z_c) = (−z_far, 1): −a_1 · z_far + a_2 = z_far.
Note the nonlinear relation between z and z_c. For z_c, there is high precision near the near clipping plane, while near the far clipping plane the precision is very low. This means that small changes of the object corner points near the camera are clearly reflected in the clip coordinates, while in the background they have little or no influence on them. If the near and far clipping planes are far apart, precision errors can therefore occur when determining, for example, the visibility of objects near the far clipping plane. Objects that occlude each other near the far clipping plane receive the same z-coordinate due to these precision errors and compete with each other, and therefore become alternately visible and invisible. This leads to an unwanted flickering, which is called z-fighting. It is therefore important that the scene to be displayed is enclosed as closely as possible between the near and far clipping planes to keep the distance between these two planes as small as possible. Alternatively, using a 64-bit instead of a 32-bit floating-point representation doubles the number of bits available for the depth values. Substituting the two equations into each other yields the following:

1. From the first equation, a_2 = a_1 · z_near − z_near.
2. Substituting this into the second equation gives −a_1 · z_far + (a_1 · z_near − z_near) = z_far, and thus

\[
a_1 = -\frac{z_{far} + z_{near}}{z_{far} - z_{near}}.
\]

Substituting a_1 back into the first equation, −(z_far + z_near)/(z_far − z_near) · z_near + a_2 = −z_near, and solving for a_2 gives

\[
a_2 = -z_{near} - \frac{z_{far} + z_{near}}{z_{far} - z_{near}}\cdot z_{near}
= -\left(1 + \frac{z_{far} + z_{near}}{z_{far} - z_{near}}\right)\cdot z_{near}
= -\left(\frac{z_{far} - z_{near}}{z_{far} - z_{near}} + \frac{z_{far} + z_{near}}{z_{far} - z_{near}}\right)\cdot z_{near}
= -\frac{2\, z_{far}\, z_{near}}{z_{far} - z_{near}}.
\]
This results in the projection matrix for a general frustum for the conversion into normalised device coordinates (NDC), which is the projection matrix (GL_PROJECTION) of the fixed-function pipeline in the OpenGL:

\[
\begin{pmatrix} x_c\cdot(-z) \\ y_c\cdot(-z) \\ z_c\cdot(-z) \\ -z \end{pmatrix}
=
\begin{pmatrix}
\frac{2 z_{near}}{x_{right}-x_{left}} & 0 & \frac{x_{right}+x_{left}}{x_{right}-x_{left}} & 0 \\
0 & \frac{2 z_{near}}{y_{top}-y_{bottom}} & \frac{y_{top}+y_{bottom}}{y_{top}-y_{bottom}} & 0 \\
0 & 0 & -\frac{z_{far}+z_{near}}{z_{far}-z_{near}} & -\frac{2\, z_{far}\, z_{near}}{z_{far}-z_{near}} \\
0 & 0 & -1 & 0
\end{pmatrix}
\cdot
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{5.13}
\]
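For readers who want to experiment with Eq. (5.13) directly, the following sketch builds this frustum matrix in plain Java. It is an illustration rather than code from the book, and the method name is chosen freely; the result is stored column by column (column-major order), as OpenGL expects for its 16-element matrix arrays.

    /**
     * Builds the perspective projection matrix of Eq. (5.13) for a general
     * frustum, stored in column-major order.
     */
    public static float[] frustumMatrix(float left, float right,
                                        float bottom, float top,
                                        float zNear, float zFar) {
        float[] m = new float[16];                      // initialised to 0
        m[0]  = 2f * zNear / (right - left);            // column 0, row 0
        m[5]  = 2f * zNear / (top - bottom);            // column 1, row 1
        m[8]  = (right + left) / (right - left);        // column 2, row 0
        m[9]  = (top + bottom) / (top - bottom);        // column 2, row 1
        m[10] = -(zFar + zNear) / (zFar - zNear);       // column 2, row 2
        m[11] = -1f;                                    // column 2, row 3
        m[14] = -2f * zFar * zNear / (zFar - zNear);    // column 3, row 2
        return m;
    }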
The frustum can take on special forms that simplify the projection matrix further. If the visible volume of the frustum is symmetrical, i.e., x_left = −x_right and y_bottom = −y_top, then the projection matrix reduces to

\[
\begin{pmatrix} x_c\cdot(-z) \\ y_c\cdot(-z) \\ z_c\cdot(-z) \\ -z \end{pmatrix}
=
\begin{pmatrix}
\frac{z_{near}}{x_{right}} & 0 & 0 & 0 \\
0 & \frac{z_{near}}{y_{top}} & 0 & 0 \\
0 & 0 & -\frac{z_{far}+z_{near}}{z_{far}-z_{near}} & -\frac{2\, z_{far}\, z_{near}}{z_{far}-z_{near}} \\
0 & 0 & -1 & 0
\end{pmatrix}
\cdot
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{5.14}
\]
Fig. 5.33 Calculation of the perspective projection depending on the angle of view θx in the OpenGL
Fig. 5.34 Calculation of the perspective projection depending on the angle of view θ y in the OpenGL
Interestingly, this matrix can also be derived if only the angle of view, the aspect ratio of the image area, z_near and z_far are available as parameters. The question arises how the coordinates x_left, x_right, y_bottom and y_top can be derived from these. First, the simple case is considered in which the aspect ratio of the image area is 1:1, i.e., the image area is square. For this purpose, the angles of view θ_x in the x- and θ_y in the y-direction are introduced (see Figs. 5.33 and 5.34). In most cases, the angle of view θ_y is given, from which θ_x can be calculated. Trigonometric considerations yield

\[
\tan\!\left(\frac{\theta_x}{2}\right) = \frac{\tfrac{1}{2}\,(x_{right} - x_{left})}{z_{near}} = \frac{x_{right} - x_{left}}{2\, z_{near}}
\]

and

\[
\tan\!\left(\frac{\theta_y}{2}\right) = \frac{\tfrac{1}{2}\,(y_{top} - y_{bottom})}{z_{near}} = \frac{y_{top} - y_{bottom}}{2\, z_{near}}.
\]
Accordingly, with the reciprocal value and due to the symmetry,

\[
\cot\!\left(\frac{\theta_x}{2}\right) = \frac{2\, z_{near}}{x_{right} - x_{left}} = \frac{z_{near}}{x_{right}}
\]

and

\[
\cot\!\left(\frac{\theta_y}{2}\right) = \frac{2\, z_{near}}{y_{top} - y_{bottom}} = \frac{z_{near}}{y_{top}}.
\]
The distance between the eyepoint of the camera and the image area is called the focal length f. With an image area normalised to the interval [−1, 1] in the x- and y-direction, the symmetrical case results in a zoom out when the angle of view θ_x or θ_y is enlarged and a zoom in when the angle of view is reduced. By the ray theorem, the ratios

\[
\frac{f}{1} = \frac{2\, z_{near}}{x_{right} - x_{left}} = \frac{z_{near}}{x_{right}}
\quad\text{and}\quad
\frac{f}{1} = \frac{2\, z_{near}}{y_{top} - y_{bottom}} = \frac{z_{near}}{y_{top}}
\]

continue to hold for varying f. Hence, f = cot(θ_x/2) = cot(θ_y/2), and thus the following matrix results:

\[
\begin{pmatrix} x_c\cdot(-z) \\ y_c\cdot(-z) \\ z_c\cdot(-z) \\ -z \end{pmatrix}
=
\begin{pmatrix}
f & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & -\frac{z_{far}+z_{near}}{z_{far}-z_{near}} & -\frac{2\, z_{far}\, z_{near}}{z_{far}-z_{near}} \\
0 & 0 & -1 & 0
\end{pmatrix}
\cdot
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{5.15}
\]

Additionally, x_right = y_top = −x_left = −y_bottom = tan(θ_x/2) · z_near = tan(θ_y/2) · z_near. In this connection, the nonlinear relation between z and z_c can now be specified explicitly:

\[
z_c = \frac{z_{far} + z_{near}}{z_{far} - z_{near}} + 2\cdot\frac{z_{far}\cdot z_{near}}{z_{far} - z_{near}}\cdot\frac{1}{z}.
\]
For example, if z_far = 30 and z_near = 10, the formula becomes z_c = 2 + 30/z. This results in a mapping between the z-values −10 and −30 as shown in Fig. 5.35. For z_c, high precision exists near the near clipping plane, while near the far clipping plane the precision is very low. This means that small changes in the object corner points near the camera are clearly reflected, while in the background they are hardly or not at all perceptible. Therefore, if the near and far clipping planes are far apart, precision errors may occur when determining, for example, the visibility of objects near the far clipping plane. Objects that obscure each other near the far clipping plane receive the same z-coordinate due to these precision errors and compete with each other.
Fig. 5.35 Nonlinear relation between z and z c
As a result, they become alternately visible and invisible. This leads to unwanted flickering, which is called z-fighting. It is therefore essential to enclose the scene to be displayed as closely as possible between the near and far clipping planes to keep the distance between these two planes as small as possible. Alternatively, using a 64-bit instead of a 32-bit floating-point representation doubles the number of bits available for the depth values. The further away the objects are, the more they "crowd" towards the far clipping plane due to the nonlinear mapping. However, since the order from the near clipping plane to the far clipping plane is not changed, this distortion is not relevant for the depth ordering, and depth calculations can still be performed. It should be noted, though, that relatively more bits are devoted to values further ahead than to values further back. In OpenGL, the image area is internally oriented in the x- and y-direction in the interval [−1, 1], i.e., symmetrically with the aspect ratio 1:1. What remains is to consider the case where the aspect ratio is not equal to 1:1. Assume that the ratio is w/h with unequal image width w and image height h, i.e., the image area is rectangular. Then the properties of the symmetrical case remain, i.e., x_left = −x_right and y_bottom = −y_top. However, the aspect ratio w/h influences the relation between x_right and y_top and thus also the relation between x_left and y_bottom: x_right = y_top · (w/h) applies. In addition, trigonometric considerations give y_top = tan(θ_y/2) · z_near. Therefore, the matrix changes to

\[
\begin{pmatrix} x_c\cdot(-z) \\ y_c\cdot(-z) \\ z_c\cdot(-z) \\ -z \end{pmatrix}
=
\begin{pmatrix}
\frac{f}{w/h} & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & -\frac{z_{far}+z_{near}}{z_{far}-z_{near}} & -\frac{2\, z_{far}\, z_{near}}{z_{far}-z_{near}} \\
0 & 0 & -1 & 0
\end{pmatrix}
\cdot
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{5.16}
\]
An asymmetric frustum is used in VR. After performing the perspective division (from homogeneous to Cartesian coordinates by division by the fourth component −z), one obtains the coordinates of the vertices (x_c, y_c, z_c) in the unit cube, the so-called normalised device coordinates (NDC). The separation between projection and perspective division is essential in the OpenGL, since clipping can be performed before the perspective division. Without the perspective division, the clipping limits are [−|z|, |z|] for each of the dimensions x, y and z.
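As with the general frustum above, the symmetric matrix of Eq. (5.16) can be written down directly in code. The following Java sketch (an illustration with freely chosen names, not code from the book) corresponds to what gluPerspective computes from the angle of view, the aspect ratio and the two clipping planes.

    /**
     * Builds the symmetric perspective projection matrix of Eq. (5.16)
     * from the vertical angle of view (in degrees), the aspect ratio w/h
     * and the near and far clipping planes. Column-major order.
     */
    public static float[] perspectiveMatrix(float fovyDegrees, float aspect,
                                            float zNear, float zFar) {
        // f = cot(theta_y / 2)
        float f = (float) (1.0 / Math.tan(Math.toRadians(fovyDegrees) / 2.0));
        float[] m = new float[16];
        m[0]  = f / aspect;                             // x is scaled by the aspect ratio
        m[5]  = f;
        m[10] = -(zFar + zNear) / (zFar - zNear);
        m[11] = -1f;
        m[14] = -2f * zFar * zNear / (zFar - zNear);
        return m;
    }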
5.10 Viewing Pipeline: Coordinate System Change of the Graphical Pipeline

This section summarises the geometric transformations discussed in the previous sections in the correct order in which they pass through the graphical pipeline. This part of the pipeline is also called the viewing pipeline and is part of the vertex processing stage (see Fig. 5.36).
Fig. 5.36 Transformations of the viewing pipeline with the associated coordinate systems
Initially, the objects are created in the model coordinate system. Then they are arranged in the scene by means of scaling, rotation, translation or shearing and thus placed in the world coordinate system. Afterwards, they are displayed from the camera's point of view. The previous calculations of Sect. 5.8 assumed that the camera is located at the coordinate origin of the world coordinate system and looks in the direction of the negative z-axis of the world coordinate system. Since this is usually not the case, the transformation from the world coordinate system to the camera coordinate system must be considered in the following. First of all, a consideration of the transformation
Fig. 5.37 Transformations from Model to World to Camera Coordinate System
from model coordinates to world coordinates of an object is necessary, because the model view matrix of OpenGL contains the overall transformation from the model coordinate system to the camera coordinate system. The camera coordinate system is also called the view reference coordinate system (VRC). Following the results of Sect. 5.3, the following matrices can be set up very easily (see Figs. 5.32 and 5.37):

• the matrix M_{MC→WC} from the model coordinate system MC to the world coordinate system WC and
• the matrix M_{VRC→WC} from the camera coordinate system VRC to the world coordinate system WC.

Assuming that the axes of the model coordinate system are spanned by the three-dimensional column vectors m_x, m_y and m_z, and that its coordinate origin is P_model with reference to the world coordinate system, then in homogeneous coordinates

\[
M_{MC\to WC} =
\begin{pmatrix}
m_x & m_y & m_z & P_{model} \\
0 & 0 & 0 & 1
\end{pmatrix}.
\tag{5.17}
\]

Analogously, if the axes of the camera coordinate system are spanned by the three-dimensional column vectors U, V and N, and its coordinate origin is P_eye from the point of view of the world coordinate system, then in homogeneous coordinates

\[
M_{VRC\to WC} =
\begin{pmatrix}
U & V & N & P_{eye} \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{5.18}
\]
applies. However, since the world coordinate system is to be mapped into the camera coordinate system, the inverse M_{VRC→WC}^{-1} must be applied. The overall matrix GL_MODELVIEW in the OpenGL thus results from the matrix multiplication M_{VRC→WC}^{-1} · M_{MC→WC}. The input parameters of the OpenGL function gluLookAt for the definition of the camera coordinate system are not directly the axes U, V and N and its coordinate origin, but the eyepoint P_eye, the up-vector V_up and the reference point P_obj. The up-vector V_up is a vector from the point of view of the world coordinate system that points upwards; mostly (0, 1, 0)^T, the direction of the y-axis, is used. In order to calculate the axes of the camera coordinate system, and thus the matrix M_{VRC→WC}, the following calculations are performed. Let P_obj be a vertex of the object. First, the normalised connection vector F = (P_obj − P_eye)/|P_obj − P_eye| is calculated, which specifies the direction of view of the camera. The normalised vector U is perpendicular to the direction of view of the camera and to the up-vector and thus results from the cross product U = F × V_up. The normalised vector V is, of course, orthogonal to the two vectors U and F and is calculated by V = U × F. Finally, the normalised vector N is identical to −F. Then a projection, which is usually the perspective projection, maps into the clip coordinates. Then clipping (see Sect. 8.1) is applied. Let w be the fourth component of the homogeneous coordinates (see Sect. 5.1.1), the one that must be divided by in the perspective division to obtain the corresponding Cartesian coordinates. The advantage of performing the clipping calculations already at this point, and not only in the next step after the perspective division, is that clipping can be applied to the interval [−w, w]. The efficiency is the same, and the special case w = 0 does not need to be treated. In addition, the perspective division does not have to be carried out for all transformed vertices, but only for those that survive clipping, which leads to an increase in efficiency. Then, by applying the perspective division, the frustum is mapped into the normalised device coordinate system (NDC), which is a unit cube with the range [−1, 1] in all three components. The corresponding matrix in the OpenGL is the GL_PROJECTION matrix. Finally, the mapping is done to the window of the respective device, which is represented in the window space coordinate system (DC). In this last step, the clip coordinates are mapped to the respective window of the output device. Assume that for the DC coordinate system of the window, the x-axis is directed to the right and the y-axis to the top, and that the coordinate origin is at (x, y). The mapping of the unit cube from the normalised device coordinate system (NDC) into this two-dimensional coordinate system then has to be carried out using the following rules:

1. −1 → x, 1 → (x + width);
2. −1 → y, 1 → (y + height);
3. −1 → (z + near), 1 → (z + far).

A visualisation of these relationships is shown in Fig. 5.38.
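To make the gluLookAt construction described above concrete, the following Java sketch computes F, U, V and N from the eyepoint, the reference point and the up-vector. It is a simplified illustration with freely chosen names, not the OpenGL implementation itself.

    /** Computes the camera axes U, V, N from eyepoint, reference point and up-vector. */
    public static double[][] cameraAxes(double[] pEye, double[] pObj, double[] vUp) {
        // F = (pObj - pEye) / |pObj - pEye|: normalised viewing direction
        double[] f = normalize(new double[] {
                pObj[0] - pEye[0], pObj[1] - pEye[1], pObj[2] - pEye[2] });
        // U = F x Vup, V = U x F, N = -F
        double[] u = normalize(cross(f, vUp));
        double[] v = cross(u, f);
        double[] n = { -f[0], -f[1], -f[2] };
        return new double[][] { u, v, n };
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    static double[] normalize(double[] a) {
        double len = Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
        return new double[] { a[0] / len, a[1] / len, a[2] / len };
    }

The columns U, V and N together with P_eye then form the matrix M_{VRC→WC} of Eq. (5.18), whose inverse is multiplied onto the model transformation to obtain GL_MODELVIEW.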
Fig. 5.38 Transformations from the normalised device coordinate system to the window space coordinate system (z-component neglected)
The mapping from the NDC to the DC is a linear mapping

1. from the interval [−1, 1] to the interval [x, x + width],
2. from the interval [−1, 1] to the interval [y, y + height] and
3. from the interval [−1, 1] to the interval [z + near, z + far]

with coordinate origin (x, y), window width and window height, and is determined from

1. a_1 · (−1) + b_1 = x, a_1 · 1 + b_1 = x + width;
2. a_2 · (−1) + b_2 = y, a_2 · 1 + b_2 = y + height;
3. a_3 · (−1) + b_3 = z + near, a_3 · 1 + b_3 = z + far.

For the first two equations,

\[
a_1 \cdot (-1) + b_1 = x \;\Leftrightarrow\; b_1 = x + a_1
\]

and

\[
a_1 + b_1 = x + width \;\Leftrightarrow\; a_1 + x + a_1 = x + width \;\Leftrightarrow\; a_1 = \frac{width}{2},
\]

and thus

\[
-\frac{width}{2} + b_1 = x \;\Leftrightarrow\; b_1 = x + \frac{width}{2}.
\]

For the next two equations, it follows analogously that

\[
a_2 = \frac{height}{2} \quad\text{and}\quad b_2 = y + \frac{height}{2}.
\]

The last two equations remain to be examined. It is necessary that

\[
a_3 \cdot (-1) + b_3 = z + near \;\Leftrightarrow\; b_3 = z + near + a_3
\]

and therefore

\[
a_3 \cdot 1 + b_3 = z + far \;\Leftrightarrow\; a_3 = z + far - z - near - a_3 \;\Leftrightarrow\; 2\cdot a_3 = far - near \;\Leftrightarrow\; a_3 = \frac{far - near}{2}.
\]
In addition,

\[
b_3 = z + near + \frac{far - near}{2} = z + \frac{far + near}{2}.
\]

This results in the total mapping (in Cartesian coordinates)

\[
\begin{pmatrix} x_{window} \\ y_{window} \\ z_{window} \end{pmatrix}
=
\begin{pmatrix}
\frac{width}{2}\cdot x_c + \left(x + \frac{width}{2}\right) \\
\frac{height}{2}\cdot y_c + \left(y + \frac{height}{2}\right) \\
\frac{far - near}{2}\cdot z_c + \left(z + \frac{far + near}{2}\right)
\end{pmatrix}
\tag{5.19}
\]

and, in the homogeneous representation by means of matrix multiplication,

\[
\begin{pmatrix} x_{window} \\ y_{window} \\ z_{window} \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\frac{width}{2} & 0 & 0 & x + \frac{width}{2} \\
0 & \frac{height}{2} & 0 & y + \frac{height}{2} \\
0 & 0 & \frac{far - near}{2} & z + \frac{far + near}{2} \\
0 & 0 & 0 & 1
\end{pmatrix}
\cdot
\begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix}.
\tag{5.20}
\]
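The viewport mapping of Eq. (5.19) translates directly into a few lines of code. The following Java sketch (illustrative names, not code from the book) converts normalised device coordinates into window coordinates, including the depth-range mapping; the z-origin offset from Eq. (5.19) is taken as zero here, the common case of a plain [near, far] depth range.

    /**
     * Maps normalised device coordinates (xc, yc, zc) in [-1, 1] to window
     * coordinates according to Eq. (5.19). (x, y) is the lower-left corner of
     * the viewport; near and far define the depth range (e.g. 0 and 1).
     */
    public static double[] viewportTransform(double xc, double yc, double zc,
                                             double x, double y,
                                             double width, double height,
                                             double near, double far) {
        double xWindow = width  / 2.0 * xc + (x + width  / 2.0);
        double yWindow = height / 2.0 * yc + (y + height / 2.0);
        double zWindow = (far - near) / 2.0 * zc + (near + far) / 2.0;
        return new double[] { xWindow, yWindow, zWindow };
    }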
Calculating window coordinates with an arbitrary coordinate origin and positive x- and y-coordinates seems obvious, but the question arises why the z-coordinates are transformed again. Of course, these remain between −1 and 1, but due to the z-fighting phenomenon, they are transferred to a larger range, especially if the available storage
Fig. 5.39 Transformations from the window space coordinate system with y-axis up to the coordinate system with y-axis down [4]
only allows a certain degree of accuracy. The depth buffer in typical OpenGL implementations (depending on the graphics processor), for example, only allows certain accuracies, typically integer values of 16, 24 or 32 bits. This means that the numerical interval [−1, 1] is first mapped to [0, 1] and then, for example with 16-bit accuracy, to the interval [0, 2^16 − 1] = [0, 65535]. In this case, z_window is calculated with the above matrix rule by setting near = 0 and far = 65535. For many image formats, however, the coordinate origin is expected at the top left, for example, so that further transformations are necessary as a post-processing step. To transfer this output to a two-dimensional coordinate system in which the y-axis points down, the following steps must be carried out (see Fig. 5.39):

1. Translate by −height in the y-direction.
2. Mirror at the x-axis (i.e., multiply the y-values by −1).
5.11 Transformations of the Normal Vectors

For the illumination calculations (see Chap. 9), the normals of the corner points are needed. Since the corner points are transformed in the viewing pipeline, these transformations must also be applied to the normal vectors. In the fixed-function pipeline, this adjustment is automatic, but in the programmable pipeline, it is left to the programmer. One could assume that it is sufficient to multiply each normal vector by the total matrix of the transformations of the viewing pipeline. The following consideration refutes this hypothesis. A normal vector is, strictly speaking, a (directional) vector that only needs to be rotated during the transformation. If the normal vectors are given and are orthogonal to the surface at the corresponding vertices, it is desirable that this property is preserved after a transformation. In this case, the transformed normal vectors must again be orthogonal to the transformed surface. Therefore, for a normal vector n^T and the total matrix M of the transformation in the viewing pipeline, the following applies:

\[
n^T \cdot ((x, y, z)^T - v) = 0
\;\Leftrightarrow\; n^T \cdot r = 0
\;\Leftrightarrow\; n^T \cdot M^{-1} \cdot M \cdot r = 0
\;\Leftrightarrow\; ((M^{-1})^T \cdot n)^T \cdot M \cdot r = 0
\;\Leftrightarrow\; \bar{n}^T \cdot \bar{r} = 0.
\]

Thus, the transformed normal vector is \bar{n}^T = ((M^{-1})^T \cdot n)^T, and the transformed tangential vector (direction vector of the plane) is \bar{r} = M \cdot r = M \cdot ((x, y, z)^T - v), with v a point of the spanned plane. Obviously, tangents are transformed with the total matrix M of the viewing pipeline, like the vertices. The normal vectors, in contrast, must be transformed with the transposed inverse (M^{-1})^T in order to remain orthogonal to the object. Note the special case where M is orthonormal: then (M^{-1})^T is equal to M, and a matrix inversion is not necessary.
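As a sketch of how this can look in code (illustrative Java, not from the book's code base), the following method computes the normal matrix as the transposed inverse of the upper-left 3×3 block of the model view matrix. Since the inverse of a 3×3 matrix is its transposed cofactor matrix divided by the determinant, the transposed inverse is simply the cofactor matrix divided by the determinant.

    /**
     * Computes the 3x3 normal matrix (M^-1)^T from the upper-left 3x3 block
     * of a 4x4 model view matrix given in row-major order.
     */
    public static double[][] normalMatrix(double[][] modelView) {
        // Upper-left 3x3 block (rotation, scaling, shearing).
        double a = modelView[0][0], b = modelView[0][1], c = modelView[0][2];
        double d = modelView[1][0], e = modelView[1][1], f = modelView[1][2];
        double g = modelView[2][0], h = modelView[2][1], i = modelView[2][2];

        double det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);

        // (M^-1)^T = cofactor matrix of M divided by det(M).
        double[][] n = new double[3][3];
        n[0][0] =  (e * i - f * h) / det;
        n[0][1] = -(d * i - f * g) / det;
        n[0][2] =  (d * h - e * g) / det;
        n[1][0] = -(b * i - c * h) / det;
        n[1][1] =  (a * i - c * g) / det;
        n[1][2] = -(a * h - b * g) / det;
        n[2][0] =  (b * f - c * e) / det;
        n[2][1] = -(a * f - c * d) / det;
        n[2][2] =  (a * e - b * d) / det;
        return n;
    }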
5.12 Transformations of the Viewing Pipeline in the OpenGL

The 4 × 4 transformation matrices for homogeneous coordinates are represented in the OpenGL by a 16-element array of type float. The matrix elements are stored column by column. In the following, the fixed-function pipeline is considered first (see Figs. 5.40 and 5.41). The GL_MODELVIEW matrix accumulates the transformations of objects: the corresponding transformation matrices are multiplied in reverse order and stored as an overall matrix in GL_MODELVIEW. The GL_MODELVIEW matrix is activated via the command glMatrixMode(GL_MODELVIEW) and is modified, among other things, by the functions glLoadIdentity, glLoadMatrix, glMultMatrix, glPushMatrix, glPopMatrix, glRotate, glScale and glTranslate. Since OpenGL works like a state machine, these transformations of an object only take effect when drawing. First, the GL_MODELVIEW matrix is overwritten with the 4 × 4 identity matrix; the function glLoadIdentity does this. Then the necessary transformations are applied by multiplying their matrices in reverse order. Assume that first a rotation and then a translation is to take place; then the functions glRotate and glTranslate are called with the corresponding parameters in reverse order. The functions glMultMatrix, glRotate, glScale
Fig. 5.40 The transformations in the context of vertex processing in the OpenGL
Fig. 5.41 Transformations of the viewing pipeline in the fixed-function pipeline of OpenGL
and glTranslate each generate a 4 × 4 matrix, which is multiplied from the right onto the GL_MODELVIEW matrix, also a 4 × 4 matrix. Afterwards, the transformation into the camera coordinate system takes place, which is considered in Sect. 5.8. The corresponding 4 × 4 matrix is then also multiplied from the right onto the GL_MODELVIEW matrix. Thus, the resulting GL_MODELVIEW matrix consists of the multiplication of the total matrix of geometric transformations, M_model, with the matrix resulting from the transformation into the camera coordinate system, M_view: GL_MODELVIEW = M_view · M_model. The resulting total matrix is applied to each vertex in the form of parallel processing. In OpenGL, all transformations from the model coordinate system to the camera coordinate system are thus combined in the GL_MODELVIEW result matrix. For the projection from the camera coordinate system into the normalised device coordinate system, the GL_PROJECTION matrix is activated with glMatrixMode(GL_PROJECTION). All known matrix operations and additionally gluPerspective can be executed on the GL_PROJECTION matrix. Finally, glViewport is used to map the result to the window of the output device. With the programmable pipeline (core profiles), the programmer has to implement most of the functions previously explained for the fixed-function pipeline in the vertex
shader himself (except for the perspective division and the viewport transformation), which in turn offers complete design freedom. It should be mentioned that the class PMVTool provides the corresponding functions and can be used well with the core profile. However, all its calculations are performed on the CPU. This programmable pipeline is described in Chap. 2. Since OpenGL is a state machine, the current states of the matrices only influence the corresponding transformations of the objects when drawing. Corresponding source code snippets can be found in Sect. 2.5 in the areas marked with the comment “display code”.
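The order of calls described above can be illustrated with a small fragment for the fixed-function pipeline. This is a hedged sketch assuming the JOGL (Java OpenGL) binding with its GL2 and GLU classes, as used for the OpenGL examples in this book; the concrete parameter values are arbitrary examples.

    // Inside the display() method of a GLEventListener, with GL2 gl and GLU glu available.
    gl.glMatrixMode(GL2.GL_PROJECTION);
    gl.glLoadIdentity();
    // Symmetric perspective projection as in Eq. (5.16): fovy, aspect, zNear, zFar.
    glu.gluPerspective(45.0, 640.0 / 480.0, 0.1, 100.0);

    gl.glMatrixMode(GL2.GL_MODELVIEW);
    gl.glLoadIdentity();
    // View transformation: eyepoint, reference point and up-vector as in gluLookAt.
    glu.gluLookAt(0.0, 0.0, 5.0,   // eye
                  0.0, 0.0, 0.0,   // reference point
                  0.0, 1.0, 0.0);  // up-vector
    // Model transformations: to first rotate the object and then translate it,
    // the calls are issued in reverse order, i.e. translate before rotate.
    gl.glTranslatef(1.0f, 0.0f, 0.0f);
    gl.glRotatef(30.0f, 0.0f, 0.0f, 1.0f);
    // ... draw the object here ...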
5.13 Exercises

Exercise 5.1 Let there be a two-dimensional sun–earth orbit. The sun is at the centre (i.e., its centre is at the coordinate origin). The earth is visualised as a sphere with a radius of 10 length units. The earth (i.e., its centre) moves evenly on its circular orbit counter-clockwise around the sun. The radius of the orbit is 200 units of length. The initial position of the earth (i.e., its centre) is (200, 0). During one orbit of the sun, the earth rotates 365 times evenly around itself (counter-clockwise). Let P be the point on the earth's surface that has the least distance from the sun in the initial position. Using geometric transformations, find out where P is located after the earth has completed one-third of its orbit. Proceed as follows.

(a) Consider which elementary transformations are necessary to solve the task and arrange them in the correct order.
(b) Specify the calculation rule to transform the point P using the transformations from (a). Make sure that the multiplications are in the correct order.
(c) Calculate the matrix for the total transformation in homogeneous coordinates and determine the coordinates of the transformed point P.

Exercise 5.2 Select the constant c in the matrix

\[
\begin{pmatrix}
c & 0 & 6 \\
0 & c & 4 \\
0 & 0 & c
\end{pmatrix}
\]

so that the matrix in homogeneous coordinates represents a translation by the vector (3, 2)^T.

Exercise 5.3 Program a small animation showing the movement from Exercise 5.1.
Exercise 5.4 Program a small animation in which a beating heart moves across a screen window.

Exercise 5.5 Apply the technique for converting two letters shown in Fig. 5.10 for D and C to two other letters, e.g., your initials.

Exercise 5.6 The following chair is to be positioned on the xz-plane centrally above the coordinate origin so that it stands on the xz-plane. The backrest should be oriented to the rear (direction of the negative z-axis). The chair has the following dimensions:

• The four chair legs have a height of 1.0 with a square base of side length 0.1.
• The square seat has the side length 0.8 and the thickness 0.2.
• The cylindrical backrest has the radius 0.4 and the thickness 0.2.

Construct the chair using the elementary objects box(x, y, z), which creates a box of width 2x, height 2y and depth 2z with the centre at the coordinate origin, and cylinder(r, h), which creates a cylinder of radius r and height h with the centre at the origin.

(a) Draw a scene graph for the chair in which each node is assigned a suitable transformation.
(b) Specify the geometric elements and geometric transformations of the scene graph by selecting appropriate types of geometric transformations and specifying precise parameter values for the geometric elements and transformations.
(c) Write an OpenGL program to visualise this chair.

Exercise 5.7 Write a program in which the individual parts of the chair shown in Fig. 5.14 are first positioned next to each other and the chair is then assembled as an animation.

Exercise 5.8 The perspective projection onto a plane is to be reduced to a parallel projection onto the xy-plane. For this purpose, consider the projection plane that is defined by the normal vector (1/√3)·(1, 1, 1)^T and the point P = (2, 4, 3) lying in the plane. The projection centre is located at the coordinate origin. Calculate suitable geometric transformations for the reduction to a parallel projection. Specify the projection as a sequence of the determined transformations. Proceed as follows:

(a) Specify the plane in the Hessian normal form. What is the distance of the plane from the origin?
(b) Specify the projection reference point (PRP) and the view reference point (VRP). When calculating the VRP, keep in mind that the connection between the PRP and VRP is parallel to the normal vector of the plane.
(c) Specify the transformation to move the VRP to the origin.
(d) Specify the transformations in order to rotate the plane into the xy-plane, based on the transformation from part (c).
(e) Specify the transformation in order to build on the transformations from parts (c) and (d) and, if necessary, move the transformed projection centre to the z-axis (to the point (0, 0, −z_0)). Determine z_0.
(f) Specify the transformations that are necessary to transform the result of (e) into the standard projection, with the PRP at the coordinate origin and the VRP on the negative z-axis.
(g) Give the calculation rule for determining the total transformation, which consists of the individual transformation steps from the previous parts.

Exercise 5.9 The following perspective projection is given from the camera's point of view in the viewing pipeline. The projection reference point is at (0, 0, 24). The view reference point is at the coordinate origin. The projection surface is the xy-plane with the limits −x_min = x_max = 3 and −y_min = y_max = 2. The near plane has the distance d_min = 10 and the far plane the distance d_max = 50 to the eyepoint. Explain the necessary transformations up to shortly before the mapping to the device coordinate system and specify the corresponding matrix multiplication. Calculate the total matrix.

Exercise 5.10 In this task, the individual steps of the viewing pipeline from “modelling” to “device mapping” are to be traced. For the sake of simplicity, it is assumed that the scene, which consists of individual objects, has already been modelled. The following information is available in the coordinate system after the modelling is complete. The eyepoint (synthetic camera, projection reference point) is at (4, 10, 10)^T, and the view reference point is at (0, 0, 0)^T. The camera has a coordinate system which is spanned by the vectors U^T = (0.928, 0.0, −0.371)^T, V^T = (−0.253, 0.733, −0.632)^T and N^T = (0.272, 0.68, 0.68)^T. The projection surface has the dimensions x_min = −x_max = −2.923 and y_min = −y_max = −2.193. Furthermore, let d_min = 0.1 and d_max = 100. The output device has a resolution of 640 × 480 pixels.

(a) Name the involved steps of the viewing pipeline and the coordinate systems needed to move the objects from the model coordinate system to the coordinates of the output device.
(b) Specify the matrices that describe the steps of the viewing pipeline mentioned under part (a). Assume here that the modelling of the scene has already been completed.
(c) How is the matrix of the overall transformation calculated from the results of part (b) in order to transform the scene into the coordinates of the output device?
Exercise 5.11 In the fixed-function pipeline of OpenGL, the matrix of the transformation into the camera coordinate system VRC is created with the call

    void gluLookAt(GLdouble eyeX, GLdouble eyeY, GLdouble eyeZ,
                   GLdouble centerX, GLdouble centerY, GLdouble centerZ,
                   GLdouble upX, GLdouble upY, GLdouble upZ);

The subsequent perspective projection is created with the call

    void gluPerspective(GLdouble fovy, GLdouble aspect,
                        GLdouble zNEAR, GLdouble zFAR);

In this task, it shall be understood how the steps of the viewing pipeline can be calculated from these specifications alone. For the sake of simplicity, it is assumed that the scene, which consists of individual objects, is already modelled. The following information is available in the coordinate system after the modelling is completed. The projection reference point (eyeX, eyeY, eyeZ)^T = (0, 0, 0)^T and the view reference point (centerX, centerY, centerZ)^T = (−4, −10, −10)^T are given. The up-vector V_up is given as (upX, upY, upZ)^T = (4, 11, 10)^T. The angle of view (field of view, fovy) of the camera is 22.5°. Furthermore, z_far = 100 (zFAR), and z_near is identical to the distance of the image area from the eyepoint in the z-direction (zNEAR). The output device has a resolution of 640 × 480 pixels (aspect). Assume the symmetrical case.

(a) Calculate the base vectors U, V and N of the camera coordinate system VRC.
(b) Calculate z_near, x_left, x_right, y_bottom and y_top of the image area of the camera coordinate system PRC.

The results can be used to determine the steps of the viewing pipeline as in Exercise 5.10.
References

1. E. Lengyel. Mathematics. Vol. 1. Foundations of Game Engine Development. Lincoln: Terathon Software LLC, 2016.
2. E. Lengyel. Mathematics for 3D Game Programming and Computer Graphics. Boston: Course Technology, 2012.
3. E. Lengyel. Rendering. Vol. 2. Foundations of Game Engine Development. Lincoln: Terathon Software LLC, 2019.
4. NASA. “Taken Under the ‘Wing’ of the Small Magellanic Cloud”. Retrieved 8 Feb 2021, 19:21h. URL: https://www.nasa.gov/image-feature/taken-under-the-wing-of-the-small-magellanic-cloud
5. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 4th edition. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Vieweg, 2019.
6
Greyscale and Colour Representation
This chapter contains the basics for the representation of computer graphics objects using grey values and colours. As an example of how grey values can be represented even though the output device can only produce completely black or completely white pixels, the halftone method is described. Furthermore, this chapter contains an overview of colour models that are used in computer graphics and related areas such as image processing and digital television. For professional applications, the exact reproducibility of colours on different output devices is required. For example, exactly the same colour should be the output on a printer as it appears on the monitor. For this purpose, the so-called calibrated colour spaces, also called colorimetric colour spaces, can be used. Furthermore, the basic colour models used in the OpenGL are presented and the basic principles of colour interpolation are explained.
6.1 Greyscale Representation and Intensities

In black-and-white images, the individual pixels of an image can not only be black or white, but can also take on grey values of different intensity. Such images are colloquially called black-and-white images. In the context of computer graphics, the term greyscale image is more appropriate for such images in order to distinguish them from binary images, which only contain black or white pixels and no grey values. In colour images, analogously, the colour components of a coloured pixel are not simply either set or not set, but can take on graded values. Therefore, the following explanations of this section on greyscale representation can also be applied to colour images. When indicating the brightness of an LED light source, the equivalent power in watts is often given, which is the power a light bulb with a conventional filament needs in order to shine as brightly as the LED light source. Since human perception of brightness is relatively oriented, a 60-watt bulb appears much brighter relative to a 20-watt bulb than a 100-watt bulb relative to a 60-watt bulb, even though the difference is 40 watts in both cases. The 60-watt bulb appears about three times as bright as the 20-watt
bulb (60/20 = 3), while the 100-watt bulb appears less than twice as bright as the 60-watt bulb (100/60 ≈ 1.6). For the greyscale or colour representation in images, there are usually only a finite number of intensity levels available. These intensity levels should be scaled logarithmically rather than linearly according to the human perception of brightness. Starting from the lowest intensity (which is black for grey values) I_0, the intensity levels should be of the form

\[
I_0,\quad I_1 = r\, I_0,\quad I_2 = r\, I_1 = r^2 I_0,\quad \ldots,\quad I_n = r^n I_0
\]

with a constant r > 1. Humans can generally only distinguish between adjacent grey values when r > 1.01, i.e., when the difference is at least 1% (see [6]). Assuming a maximum intensity I_n = 1 and a minimum intensity I_0 depending on the output device, it follows from 1.01^n · I_0 ≤ 1 that

\[
n \le \frac{\ln\frac{1}{I_0}}{\ln(1.01)}
\]

grey levels are sufficient for the resolution on the corresponding output device. A finer gradation would not make any difference to human visual perception. Based on these considerations, Table 6.1 shows the maximum useful number of intensity levels for different output media. If an image with multiple intensity levels is to be displayed on an output medium that has only binary pixel values (pixels can really only be black or white), for example, a black-and-white laser printer, an approximation of the different intensity levels can be achieved by reducing the resolution. This technique is called halftoning; it combines binary pixels to larger pixels in a pixel matrix. For instance, if 2×2 pixels are combined into one larger pixel, five intensity levels can be displayed. For 3×3 pixels, ten, and for n×n pixels, n² + 1 intensity levels are possible. The reduction of the resolution may only be carried out to such an extent that the visual system does not perceive the visible raster as disturbing from the corresponding viewing distance. The individual pixels to be set in the n×n large pixel should be chosen as adjacent as possible and should not lie on a common straight line. Otherwise, striped patterns may be visible instead of a grey area. Figure 6.1 shows in the first line a representation of the five possible grey levels by a matrix of 2×2 pixels and below it the representation of ten possible grey levels based on a 3×3 pixel matrix.
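As a quick check of this bound, using the newspaper value from Table 6.1 below, I_0 ≈ 0.1 gives

\[
n \le \frac{\ln(10)}{\ln(1.01)} \approx \frac{2.3026}{0.00995} \approx 231.4,
\]

i.e., about 232 distinguishable grey levels.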
Table 6.1 Intensity levels for different output media (according to [1])

Medium      I_0 (ca.)      Max. number of grey levels
Monitor     0.005–0.025    372–533
Newspaper   0.1            232
Photo       0.01           464
Slide       0.001          695
Fig. 6.1 Grey level representation based on halftoning for a 2×2 pixel matrix (top line) and a 3×3 pixel matrix
For the definition of these pixel matrices, dither matrices are suitable. These matrices define which pixels must be set to represent an intensity value by the n×n large pixel. The five pixel matrices of the first row in Fig. 6.1 are encoded by the dither matrix D_2. The ten pixel matrices for the 3×3 grey level representation are encoded by the dither matrix D_3 as follows:

\[
D_2 = \begin{pmatrix} 0 & 2 \\ 3 & 1 \end{pmatrix},
\qquad
D_3 = \begin{pmatrix} 6 & 8 & 4 \\ 1 & 0 & 3 \\ 5 & 2 & 7 \end{pmatrix}
\]

Halftoning can also be applied to non-binary intensity levels in order to refine the intensity levels further. For example, using 2×2 matrices with four different intensity levels for each pixel results in 13 intensity levels. This can be represented by the following dither matrices:
\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\]
\[
\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\begin{pmatrix} 2 & 2 \\ 1 & 2 \end{pmatrix},
\begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix},
\begin{pmatrix} 3 & 2 \\ 2 & 2 \end{pmatrix},
\]
\[
\begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix},
\begin{pmatrix} 3 & 3 \\ 2 & 3 \end{pmatrix},
\begin{pmatrix} 3 & 3 \\ 3 & 3 \end{pmatrix}.
\]
In this encoding, for example, the first matrix in the second row represents the fifth intensity level (counting starts at 0), which is obtained by colouring one of the four finer pixels with grey level 2 and the other three with grey level 1. As mentioned at the beginning of this section, the halftoning method can also be applied to colour images.
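To illustrate how a dither matrix such as D_3 is used in practice, the following Java sketch (an illustration, not code from the book) renders one intensity level between 0 and 9 as a binary 3×3 pixel block: a sub-pixel is set exactly when its entry in D_3 is smaller than the desired level.

    /** Ordered dithering with the 3x3 dither matrix D3 from this section. */
    public class HalftoneDemo {

        static final int[][] D3 = {
            { 6, 8, 4 },
            { 1, 0, 3 },
            { 5, 2, 7 }
        };

        /**
         * Returns the 3x3 binary pixel block for an intensity level between
         * 0 (no sub-pixel set) and 9 (all sub-pixels set). A sub-pixel is set
         * exactly when its dither matrix entry is smaller than the level.
         */
        static boolean[][] block(int level) {
            boolean[][] pixels = new boolean[3][3];
            for (int row = 0; row < 3; row++) {
                for (int col = 0; col < 3; col++) {
                    pixels[row][col] = D3[row][col] < level;
                }
            }
            return pixels;
        }

        public static void main(String[] args) {
            for (int level = 0; level <= 9; level++) {
                boolean[][] b = block(level);
                for (boolean[] row : b) {
                    StringBuilder line = new StringBuilder();
                    for (boolean set : row) {
                        line.append(set ? '#' : '.');
                    }
                    System.out.println(line);
                }
                System.out.println();
            }
        }
    }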
6.2 Colour Models and Colour Spaces

The human visual system can perceive wavelengths in a range with a lower limit of approximately 300–400 nm (violet) and an upper limit of approximately 700–800 nm (red). Theoretically, a colour is described by the distribution of the intensities of the wavelengths in the frequency spectrum. The human eye has three types of receptors called cones for colour perception. The sensitivity of the cones is specialised to different wavelength ranges. The L-type receptor is sensitive to long wavelengths, covering the red portion of the visible spectrum. The M-type and S-type receptors detect light in the middle and short wavelength ranges of the visible light spectrum, respectively. This makes them sensitive to the colours green and blue. We usually speak of red, green and blue receptors. Blue receptors have a much lower sensitivity than the other two receptors. The three receptor types mentioned could be detected in the human eye through physiological research, thus confirming the previously established trichromatic theory according to Young and Helmholtz (see [4, p. 207]). The following characteristics are essential for the human intuitive perception of colour:

• The hue, determined by the dominant wavelength.
• The saturation or colour purity, which is very high when the frequency spectrum is very focused on the dominant wavelength. When there is a wide spread of frequencies, the saturation is low.
• The intensity or lightness, which depends on the energy with which the frequency components are present.

Figure 6.2 shows the distribution of the energies for the wavelengths at high and low saturation. Since the wavelength λ is inversely related to the frequency f via
Fig. 6.2 Distribution of the energies over the wavelength range at high (top) and low (bottom) saturation
the speed of light c1 and the relationship c = λ · f , the frequency spectrum can be derived from the wavelength spectrum. The perceived intensity is approximately determined by the mean height of the spectrum. Based on physiological and psychophysical findings, various colour models have been developed. A distinction is made between additive and subtractive colour models. In additive colour models, the colour is determined in the same way as when different coloured light is superimposed. On a black or dark background, light of different colours is mixed additively. The addition of all colours results in a white colour. Examples of this are computer monitors or projectors. Subtractive colour models require a white background onto which pigment colours are applied. The mixture of all colours results in a black colour. Such subtractive colour models are used for colour printers for example. The most commonly used colour model in computer graphics is the RGB colour model. Most computer monitors and projectors work with this model. Each colour is composed additively of the three primary colours red, green and blue. Thus, a colour can be uniquely defined by three values r, g, b ∈ [0, 1]. If 0 is the minimum and 1 the maximum intensity, the intensity vector (0, 0, 0) corresponds to black and (1, 1, 1) to white. The intensity vector (x, x, x) with x ∈ [0, 1] corresponds to shades of grey, depending on the choice of x. The intensity vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) correspond to red, green and blue, respectively. In computer graphics, one byte is usually used to encode one of the three colours, resulting in 256 possible intensity levels for each of the three primary colours. Therefore, instead of using three floating point values between 0 and 1, three integer values between 0 and 255
1 The speed of light in vacuum is c_light = 299,792,458 m/s.
Fig. 6.3 RGB and the CMY colour model
are often used to specify a colour. This encoding of a colour by three bytes results in the so-called 24-bit colour depth, which is referred to as true colour and is widely used for computer and smartphone displays. Higher colour depths of 30, 36, 42 or 48 bits per pixel are common for high-quality output or recording devices. The subtractive CMY colour model is the complementary model to the RGB colour model and is used for printers and plotters. The primary colours are cyan, magenta and yellow. The conversion of an RGB colour to its CMY representation is given by the following simple formula:

\[
\begin{pmatrix} C \\ M \\ Y \end{pmatrix} = 1 - \begin{pmatrix} R \\ G \\ B \end{pmatrix}
\]

Figure 6.3 shows a colour cube for the RGB and the CMY colour model. The corners of the cube are labelled with their corresponding colours. In the RGB colour model, the origin of the coordinate system is in the back, lower left corner representing the colour black. In the CMY model, the origin is located in the front, upper right corner representing the colour white. The grey tones are located on the diagonal line between the black and white corners. In practice, printers do not exclusively use the colours cyan, magenta and yellow from the CMY model, but instead use a four-colour printing process corresponding to the CMYK colour model with the fourth colour black² (K). In this way, the colour black can be reproduced better than by mixing the three other colours. The colour values of the CMY colour model can be converted to the CMYK colour model according to the following equations:

\[
K := \min\{C, M, Y\}, \quad
C := C - K, \quad
M := M - K, \quad
Y := Y - K.
\]
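A direct translation of these two conversion formulas into Java might look as follows; this is a small sketch with a freely chosen method name, assuming colour components in the range [0, 1].

    /** Converts an RGB colour (components in [0, 1]) to CMYK via the CMY model. */
    public static double[] rgbToCmyk(double r, double g, double b) {
        // CMY is the complement of RGB.
        double c = 1.0 - r;
        double m = 1.0 - g;
        double y = 1.0 - b;
        // Extract the black component K and subtract it from C, M and Y.
        double k = Math.min(c, Math.min(m, y));
        return new double[] { c - k, m - k, y - k, k };
    }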
2 Since the letter B is already assigned to blue in the RGB colour model, K (key) is used for black as an abbreviation for key colour.
Based on these equations, at least one of the four values C, M, Y, K is always equal to zero. The YIQ colour model, the YUV colour model and the YC_bC_r colour model originate from the field of television signal processing and transmission. In these three colour models, the luminance Y, which is a measure of lightness, is separated from two measures of chrominance (colourfulness), which are used to represent colour. This provides backward compatibility with the older black-and-white (greyscale) television systems. Furthermore, the colour signals can be transmitted with a lower bandwidth than the luminance signal, as the human visual system is less tolerant of blurring in the luminance signal than of blurring in the colour signals. A lower bandwidth means that a data type with a reduced number of bits can be used for the digital encoding of such signals, so that less data needs to be transmitted. This property of the human visual system is also used for the compression of digital images, for example, in the JPEG compression method, which provides for the conversion of RGB images into the YC_bC_r colour model. The YIQ colour model is a variant of the YUV colour model and was originally intended for the American NTSC television standard for analogue television, but was replaced early on by the YUV colour model. The YUV colour model is thus the basis for colour coding in the PAL and NTSC television standards for analogue television. The YC_bC_r colour model is a variant of the YUV colour model and is part of the international standard for digital television [2, p. 317–320]. The conversion from the RGB colour model to the YC_bC_r colour model is given by the following equation (see [2, p. 319]):

\[
\begin{pmatrix} Y \\ C_b \\ C_r \end{pmatrix}
=
\begin{pmatrix}
0.299 & 0.587 & 0.114 \\
-0.169 & -0.331 & 0.500 \\
0.500 & -0.419 & -0.081
\end{pmatrix}
\cdot
\begin{pmatrix} R \\ G \\ B \end{pmatrix}.
\]
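The matrix above translates into code in the obvious way. The following Java sketch (illustrative, with components assumed to lie in [0, 1]) computes Y, C_b and C_r from an RGB triple.

    /** Converts an RGB colour (components in [0, 1]) to YCbCr using the matrix above. */
    public static double[] rgbToYCbCr(double r, double g, double b) {
        double y  =  0.299 * r + 0.587 * g + 0.114 * b;   // luminance
        double cb = -0.169 * r - 0.331 * g + 0.500 * b;   // blue chrominance
        double cr =  0.500 * r - 0.419 * g - 0.081 * b;   // red chrominance
        return new double[] { y, cb, cr };
    }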
3 International Telecommunication Union. ITU Recommendation BT.601-7 (03/2011). https://www.itu.int/rec/R-REC-BT.601-7-201103-I/en, retrieved 22.4.2019, 21:50h. 4 International Telecommunication Union. ITU recommendation BT.709-6 (06/2015). https://www.itu.int/rec/R-REC-BT.709-6-201506-I/en, retrieved 22.4.2019, 21:45h.
Fig. 6.4 The HSV colour model
individually adjust the RGB values to change the brightness easily leads to colour distortions. Similar to the colour models from the television sector, the HSV colour model is also not based on primary colours, but instead on the three parameters hue, saturation and value (intensity). Since these parameters correspond to the intuitive perception of colour (see above), they can be used to make a simple selection of a colour, for example, in a raster graphics editing programme. The HSV model can be represented in the form of an upside-down pyramid (see Fig. 6.4). The tip of the pyramid corresponds to the colour black. The hue H is specified by the angle around the vertical axis. The saturation S of a colour is zero on the central axis of the pyramid (V -axis) and is one at the sides. The value V encodes lightness and increases from the bottom to the top. The higher V is, the lighter the colour. Strictly speaking, the HSV colour model leads to a cylindrical geometry. Since the colour black is intuitively always considered to be completely unsaturated, the cylindrical surface representing black is often unified into one point. This results in the pyramid shown in the figure. The HLS colour model is based on a similar principle to the HSV colour model. The hue is defined by an angle between 0◦ and 360◦ around the vertical axis with red (0◦ ), yellow (60◦ ), green (120◦ ), blue (240◦ ) and purple (300◦ ). A value between zero and one defines the lightness. Saturation results from the distance of a colour from the central axis (grey value axis) and is also characterised by a value between zero and one. Figure 6.5 shows the HLS colour model. Sometimes an HLS colour model is also used in the form of a double cone, as shown on the right in the figure, instead of a cylinder. The double cone reflects the fact that it does not make sense to speak of saturation for the colours black and white, when grey values are already characterised by the lightness value. The equations for converting from the HSV and HLS colour model to the RGB colour model and vice versa can be found in [3] or [2, p. 306–317]. Setting a desired colour or interpreting a combination of values is very difficult with colour models such as RGB or CMY, at least when all three colour components interact. Therefore, colour tables or colour calculators exist from which assignments of a colour to the
Fig. 6.5 The HLS colour model
RGB values can be taken. Colour adjustment using the HSV and HLS colour models is much easier, as these colour models are relatively close to an intuitive colour perception. For this reason, HSV and HLS are referred to as perception-oriented colour models. A further simplification of the colour selection can be achieved by using colour palettes. This involves first selecting a set of colours using an appropriate colour model and then only using colours from this palette. The colour naming system (CNS) is another perception-oriented colour system that, like the HSV and HLS colour models, specifies a colour based on hue, saturation and lightness. For this purpose, the CNS uses linguistic expressions (words) instead of numbers. For the colour type, the values purple, red, orange, brown, yellow, green, blue with further subdivisions yellowish green, green-yellow or greenish yellow are available. The lightness can be defined by the values very dark, dark, medium, light and very light. The saturation is defined by greyish, moderate, strong and vivid. Even though this greatly limits the number of possible colours, the description of a colour is possible in a very intuitive way. The reproduction of colours depend on the characteristics of output devices, such as the LCD display of monitors or printing process of colour printers. Since the colours used in these devices have a very large variance, it takes some effort, for example, to reproduce the exact same colour on a printout that was previously displayed on the monitor. Colour deviations exist not only across device boundaries, but also within device classes (for example, for different monitors). These deviations are caused, for example, by the use of different reproduction processes (e.g., laser printing or inkjet printing) and different components (e.g., printer ink of different chemical composition).
Colour models, as presented in this chapter so far, can be seen as only a definition of how colours can be represented by numbers, mainly in the form of triples or quadruples of components. These components could, in principle, take arbitrary values. In order to accurately reproduce colours on a range of output devices, a defined colour space is needed. The so-called colorimetric colour spaces or calibrated colour spaces are required for such an exactly reproducible and device-independent representation of colours. The basis for almost all colorimetric colour spaces used today is the CIEXYZ colour space standardised by the Commission Internationale de l'Éclairage (CIE, engl. International Commission on Illumination). This colour space was developed by measurements with defined test subjects (standard observers) under strictly defined conditions. It is based on the three artificial colours X, Y and Z, with which all perceptible colours can be represented additively. Values from common colour models, such as the RGB colour model, can be mapped to the CIEXYZ colour space by a linear transformation to calibrate the colour space [2, p. 341–342]. The illumination is important for the precise measurement and reproduction of colour in physical reality. Therefore, the CIEXYZ colour space uses specified standard illuminants [2, p. 344–345]. Since the CIEXYZ colour space encompasses all perceptible colours, it can be used to represent all colours encompassed by another colour space. It can also be used to describe which colours certain output devices, such as monitors or printers, can reproduce. The set of reproducible colours of a device or particular colour space is called gamut or colour gamut. The gamut is often plotted on the CIE chromaticity diagram, which can be derived from the CIEXYZ colour space. In this two-dimensional diagram, the range of perceptible colours has a horseshoe-shaped form [2, p. 345]. The main disadvantage of the CIEXYZ colour space is the nonlinear mapping between human perception and the colour distances in the model. Due to this disadvantage, the CIELab colour space was developed, which is also referred to as the CIE L*a*b* colour space. In this colour space, the distances between colours correspond to the colour spacing according to human perception [2, p. 346–348]. For computer graphics, the direct use of the CIEXYZ colour space is too unwieldy and inefficient, especially for real-time representations. Therefore, the standard RGB colour space (sRGB colour space) is used, which was developed for computer-based and display-oriented applications. The sRGB colour space is a precisely defined colour space, with a standardised mapping to the CIEXYZ colour space. The three primary colours, the white reference point, the ambient lighting conditions and the gamma values are specified for this purpose. The exact values for these parameters can be found, for example, in [2, p. 350–352]. For the calculation of the colour values in the sRGB colour space, a linear transformation of the values from the CIEXYZ colour space into the linear RGB colour space takes place first. This is followed by the application of a modified gamma correction, which introduces a nonlinearity. Let (r_l, g_l, b_l) be a colour tuple from the RGB colour model with r_l, g_l, b_l ∈ [0, 1] after the linear transformation from the CIEXYZ colour space. The conversion of each of these three colour components c_l is done by the function f_s according to the
following equation (see [2, p. 352]):

f_s(c_l) =
\begin{cases}
0 & \text{if } c_l \le 0 \\
12.92 \cdot c_l & \text{if } 0 < c_l < 0.0031308 \\
1.055 \cdot c_l^{0.41666} - 0.055 & \text{if } 0.0031308 \le c_l < 1 \\
1 & \text{if } 1 \le c_l
\end{cases}
\qquad (6.1)
Let the result of the application of the function f_s be the colour tuple (r_s, g_s, b_s), whose colour components are restricted to the value range [0, 1] due to this equation. For the exponent (gamma value), the approximate value γ = 1/2.4 ≈ 0.4166 is used. For display on an output device and for storing in memory or in a file, these colour components are scaled to the interval [0, 255] and discretised to an eight-bit representation each. The inverse transformation of the colour tuple (r_s, g_s, b_s) with the nonlinear colour components r_s, g_s, b_s ∈ [0, 1] into linear RGB colour components is done for each of the three colour components c_s by the function f_sr according to the following equation (see [2, p. 353]):

f_{sr}(c_s) =
\begin{cases}
0 & \text{if } c_s \le 0 \\
c_s / 12.92 & \text{if } 0 < c_s < 0.04045 \\
\left( (c_s + 0.055) / 1.055 \right)^{2.4} & \text{if } 0.04045 \le c_s < 1 \\
1 & \text{if } 1 \le c_s
\end{cases}
\qquad (6.2)

The values of the resulting colour tuple (r_l, g_l, b_l) (with the three linear colour components) are again limited to the value range [0, 1] and can be converted back into the CIEXYZ colour space by applying the inverse linear transformation. The gamma value for the function f_s according to Eq. (6.1) is γ ≈ 1/2.4. For the inverse function f_sr, according to Eq. (6.2), it is γ = 2.4. Since these are modified gamma corrections due to the linear ranges for small colour component values (see the second case in each of the above formulae), the effective gamma values under the hypothetical assumption of non-modified (pure) gamma correction are γ ≈ 1/2.2 and γ ≈ 2.2. The sRGB colour space was developed for CRT monitors so that the 8-bit values of the nonlinear colour components (r_s, g_s, b_s) can be output on these devices without further processing or measures. Even though these monitors no longer play a role today, many current devices on the mass market, such as LCD monitors, simulate this behaviour so that sRGB values can still be displayed without modifications. Since the sRGB colour space is very widespread in many areas, it can be assumed that practically every image file (for example, textures) with 8-bit values per colour channel contains colour values according to the sRGB colour space. The sRGB colour space is also the standard colour space in Java. Strictly speaking, for any calculations with these colour values, the individual colour components must be converted into linear RGB values using the function f_sr according to Eq. (6.2). After the desired calculations, the resulting linear RGB values must be converted back into nonlinear sRGB colour values using the function f_s according to Eq. (6.1). However, these steps are often not applied in practice [2, p. 353].
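To make the two transfer functions concrete, the following minimal Java sketch implements Eqs. (6.1) and (6.2) directly; the class and method names are chosen freely for illustration and are not part of any library discussed in this book, and the preceding linear transformation between CIEXYZ and linear RGB is omitted.

public class SrgbConversion {

    // Eq. (6.1): linear RGB component -> nonlinear sRGB component
    public static double toSrgb(double cl) {
        if (cl <= 0.0) {
            return 0.0;
        } else if (cl < 0.0031308) {
            return 12.92 * cl;                              // linear range for small values
        } else if (cl < 1.0) {
            return 1.055 * Math.pow(cl, 1.0 / 2.4) - 0.055; // modified gamma correction
        } else {
            return 1.0;
        }
    }

    // Eq. (6.2): nonlinear sRGB component -> linear RGB component
    public static double toLinear(double cs) {
        if (cs <= 0.0) {
            return 0.0;
        } else if (cs < 0.04045) {
            return cs / 12.92;
        } else if (cs < 1.0) {
            return Math.pow((cs + 0.055) / 1.055, 2.4);
        } else {
            return 1.0;
        }
    }

    public static void main(String[] args) {
        double linear = 0.4;
        double srgb = toSrgb(linear);
        System.out.println(srgb + " -> " + toLinear(srgb)); // round trip, approximately 0.4 again
    }
}

Scaling to the 8-bit range [0, 255] is then only a multiplication and rounding step per component.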
In contrast to the CIELab colour space, the sRGB colour space has a relatively small gamut, which is sufficient for many (computer graphics) applications, but can cause problems for printing applications. Therefore, the Adobe RGB colour space was developed, which has a much larger gamut than the sRGB colour space [2, p. 354–355]. For detailed explanations, figures and transformation formulas for colorimetric colour spaces, please refer to [2, p. 338–365].
6.3 Colours in the OpenGL

In the OpenGL, the RGB colour model (see Sect. 6.2) is used. The values of the individual components of a colour tuple are specified as floating point values from the interval from zero to one. Thus, for a colour tuple (r, g, b), r, g, b ∈ [0, 1] applies. As in many other technical applications, a fourth component, the so-called alpha value, is added in the OpenGL for the representation of transparency, or rather opacity (non-transparency). For a detailed description of the handling of transparency and translucency in connection with the illumination of scenes, see Sect. 9.7. The alpha component is also a floating point value within the interval [0, 1]. An alpha value of one represents the highest opacity and thus the least transparency. An object with such a colour is completely opaque. An alpha value of zero represents the lowest opacity and thus the highest transparency. An object with such a colour is completely transparent. Thus, the colour of a vertex, fragment or pixel including an opacity value (transparency value) can be expressed as a quadruple of the form (r, g, b, a) with r, g, b, a ∈ [0, 1], which is called an RGBA colour value. In the core profile, the RGB values are mostly used for colour representation and the alpha value for the representation of opacity. But their use is not limited to this application. For example, in shaders, the RGBA values can also be used for any purpose other than colour representation. In the compatibility profile, the colour index mode is available. This makes it possible to define a limited number of colours that serve as a colour palette for the colours to be displayed. The colours are stored in a colour index table. A colour is accessed by the number of the entry (the index) of the table. This index can be used in the framebuffer and elsewhere instead of a colour with the full colour resolution (for example, 24 bits). For a colour index table with 256 values, only an integer data type of eight bits is needed for an index. Such a value can, therefore, be stored in one byte. This type of representation allows for more efficient storage of image information than using full resolution colour values. This can be used, for example, for a smaller framebuffer, faster interchangeability of colour information and faster access to colours. However, these advantages are also countered by disadvantages, especially in the flexible use of colour information in shaders. For example, blending between colour indices is only defined meaningfully in exceptional cases (for example, with a suitable sorting of the colour values in the colour index table). Furthermore, in modern GPUs there is usually enough memory and computing power available so that it is possible to calculate with RGBA values of the full resolution per
vertex, fragment or pixel without any problems. For these reasons, the colour index mode is not available in the core profile. In some window systems and in JOGL, such a mode does not exist either. In the OpenGL, the conversion to the nonlinear sRGB colour space is provided. The conversion of the colour components from the linear RGB colour components takes place according to formula (6.1), if this conversion has been activated as follows:

gl.glEnable(GL.GL_FRAMEBUFFER_SRGB)
Deactivating the sRGB conversion can be done with the following instruction:

gl.glDisable(GL.GL_FRAMEBUFFER_SRGB)
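As a rough sketch of where such a call is typically placed in a JOGL program, the following fragment enables the conversion once in the init method of a GLEventListener; the class name is an assumption made only for this illustration, and the default framebuffer must of course be capable of sRGB conversion for the call to have an effect.

import com.jogamp.opengl.GL;
import com.jogamp.opengl.GL3;
import com.jogamp.opengl.GLAutoDrawable;
import com.jogamp.opengl.GLEventListener;

public class SrgbRenderer implements GLEventListener {

    @Override
    public void init(GLAutoDrawable drawable) {
        GL3 gl = drawable.getGL().getGL3();
        // Enable the conversion to the nonlinear sRGB colour space once at start-up
        gl.glEnable(GL.GL_FRAMEBUFFER_SRGB);
    }

    @Override
    public void display(GLAutoDrawable drawable) { /* drawing commands */ }

    @Override
    public void reshape(GLAutoDrawable drawable, int x, int y, int width, int height) { }

    @Override
    public void dispose(GLAutoDrawable drawable) { }
}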
6.4 Colour Interpolation

Most of the colour models presented in this chapter use three colour components to characterise a colour. For example, a colour in the RGB colour model corresponds to a vector (r, g, b) ∈ [0, 1]³. This interpretation can be used to calculate convex combinations of colours. Colour gradients are one possible application of such convex combinations. If a surface is not to be coloured homogeneously but with a continuous change from one colour to another colour, two colours (r_0, g_0, b_0) and (r_1, g_1, b_1) can be defined for the two points p_0 and p_1. At point p_0, colour (r_0, g_0, b_0) is used, at point p_1, colour (r_1, g_1, b_1) is used. For points on the connecting line between the two points, the corresponding convex combination of the two colours is used to determine the respective colour. For the point p = (1 − α) · p_0 + α · p_1 (with α ∈ [0, 1]), the colour (1 − α) · (r_0, g_0, b_0) + α · (r_1, g_1, b_1) is used. This corresponds to an interpolation of the colour values on the connecting line from the given colours at the end points p_0 and p_1. The same colour gradient is used parallel to the connecting line. When filling surfaces with textures (see Chap. 10), colour interpolation can also be useful in some cases. If the texture has to be drawn several times horizontally or vertically to fill the area, visible borders usually appear at the edges between the individual texture tiles. A kind of tile pattern becomes visible. In digital image processing, smoothing operators [5] are used to make the edges appear less sharp. Simple smoothing operators can be characterised by a weight matrix that is used to modify the colour values of the pixels (or fragments). For example, the weight matrix

\begin{pmatrix}
0.1 & 0.1 & 0.1 \\
0.1 & 0.2 & 0.1 \\
0.1 & 0.1 & 0.1
\end{pmatrix}

means that the smoothed colour intensity of a pixel is the weighted sum of
its own intensity and the intensities of the neighbouring pixels. The smoothed colour intensities are calculated individually for each of the colours red, green and blue. In this weight matrix, the pixel's own intensity is given a weight of 0.2 and each of its eight immediate neighbouring pixels is given a weight of 0.1. Depending on how strong the smoothing effect is to be, the weights can be changed. For example, all weights are set to a value of 1/9 to obtain a strong smoothing effect. Furthermore, the weight matrix can be extended so that not only the immediate neighbours of the pixel are considered for smoothing, but also pixels from a larger region. To smooth the transitions between texture tiles, the smoothing must be done at the edges. For this purpose, the pixels on the right edge should be considered as left neighbours of the pixels at the left edge and vice versa. The upper neighbouring pixels of a pixel on the upper edge are found at the corresponding position on the lower edge. This application of image smoothing is effectively a low-pass filtering of the image (tile). Section 7.6.5 provides more details on this process and various weighting matrices for low-pass filtering in the context of antialiasing as part of the rasterisation stage of the graphics pipeline.

In Sect. 5.1.4, some possible applications of interpolators to model continuous changes are presented. The interpolators considered there are all based on geometric transformations. One application for such animated graphics was the transformation of one geometric object into another one by convex combinations of transformations. In order to smoothly transform one arbitrary image into another, additional colour interpolators are necessary. The simplest way to transform one image into another image of the same format is the pixel-by-pixel convex combination of the intensity values for the colours red, green and blue. However, this only achieves a continuous blending of the two images. In this case, a new image appears while the old image is faded out. More realistic effects can be achieved when the geometric shapes in the two images are also properly transformed into each other. In this case, geometric transformations are required in addition to colour interpolation. A common technique that does more than just blend the two images is based on a triangulation of the two images. A triangulation is a division using triangles. For this application, the two triangulations of the two images must use the same number of triangle vertices and the triangles must correspond to each other, i.e., if in the first image the points p_i, p_j and p_k form a triangle, then the corresponding points in the second image must also represent a triangle within the triangulation. The coordinates of the corresponding points in the two images do not have to match. Each triangle of the triangulation describes a section of the image that has a counterpart in the corresponding triangle of the other image. Such corresponding sections may be different in size and shape. Figure 6.6 illustrates this situation. On the left are two faces that are to be transformed into each other. The right part of the figure shows compatible triangulations of the two images. For example, the respective upper triangle in the two triangulations stands for the forehead area, the lower triangle for the chin area. It should be noted that the number of points for the triangulations is identical for both images, but the points do not have the same positions.
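Before the triangulation-based morphing is described in more detail, the pixel-by-pixel convex combination mentioned above can be written down in a few lines. The following Java sketch blends two equally sized images channel by channel; the names are chosen freely, the standard BufferedImage class is used, and the sRGB linearisation issue from Sect. 6.2 is ignored here, as is common in practice.

import java.awt.image.BufferedImage;

public class CrossFade {

    // Pixel-by-pixel convex combination of two images of the same size:
    // result = (1 - alpha) * image0 + alpha * image1, computed per colour channel.
    public static BufferedImage blend(BufferedImage img0, BufferedImage img1, double alpha) {
        int w = img0.getWidth();
        int h = img0.getHeight();
        BufferedImage result = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int c0 = img0.getRGB(x, y);
                int c1 = img1.getRGB(x, y);
                int r = mix((c0 >> 16) & 0xFF, (c1 >> 16) & 0xFF, alpha);
                int g = mix((c0 >> 8) & 0xFF, (c1 >> 8) & 0xFF, alpha);
                int b = mix(c0 & 0xFF, c1 & 0xFF, alpha);
                result.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return result;
    }

    // Convex combination of two channel values with weight alpha in [0, 1]
    private static int mix(int v0, int v1, double alpha) {
        return (int) Math.round((1.0 - alpha) * v0 + alpha * v1);
    }
}

Calling blend repeatedly with alpha running from 0 to 1 produces exactly the continuous fade described above, without any geometric transformation.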
Fig. 6.6 Example of compatible triangulations of two images
In order to transform one image into the other, step by step, a triangulation is first determined for each intermediate image. The points of the triangulation are determined as convex combinations of the corresponding points in the two images. The triangles are formed according to the same scheme as in the other two images. If the points p_i, p_j and p_k form a triangle in the first image and the associated points p'_i, p'_j and p'_k form a triangle in the second image, then the points

(1 − α) · p_l + α · p'_l,   l ∈ {i, j, k},

form a triangle in the intermediate image. This means that if three points in the first image define a triangle, the triangle of the corresponding points belongs to the triangulation in the second image. The convex combination of the corresponding pairs of points describes a triangle of the triangulation of the intermediate image. Within each triangle, the pixel colours are determined by colour interpolation. To colour a pixel in the intermediate image, it must first be determined to which triangle it belongs. If it belongs to several triangles, i.e., if it lies on an edge or a vertex of a triangle, then any of the adjacent triangles can be chosen. It is necessary to find out whether a pixel q belongs to a triangle defined by the points p_1, p_2, p_3. Unless the triangle is degenerated into a line or a point, there is exactly one representation of q in the form

q = α_1 · p_1 + α_2 · p_2 + α_3 · p_3   (6.3)

where

α_1 + α_2 + α_3 = 1.   (6.4)
This is a system of linear equations with three equations and three variables α1 , α2 , α3 . The vector equation (6.3) gives two equations, one for the x-coordinate and one for the y-coordinate. The third equation is the constraint (6.4). The point q lies inside the triangle spanned by the points p1 , p2 , p3 if and only if 0 ≤ α1 , α2 , α3 ≤ 1 holds, that is, if q can be represented as a convex combination of p1 , p2 , p3 . After the triangle in which the pixel to be transformed lies and the corresponding values α1 , α2 , α3 have been determined, the colour of the pixel is determined as a convex combination of the colours of the corresponding pixels in the two images to be transformed into each other. The triangle in the intermediate image in which the pixel
Fig. 6.7 Computation of the interpolated colour of a pixel
is located corresponds to a triangle in each of the two images to be transformed into each other. For each of these two triangles, the convex combination of its vertices with weights α1 , α2 , α3 specifies the point corresponding to the considered pixel in the intermediate image. Rounding might be required to obtain a pixel from the point coordinates. The colour of the pixel in the intermediate image is a convex combination of the colours of the two pixels in the two images to be transformed into each other. Figure 6.7 illustrates this procedure. The triangle in the intermediate image in which the pixel to be coloured is located is shown in the centre. On the left and right are the corresponding triangles in the two images to be transformed into each other. The three pixels have the same representation as a convex combination with regard to the corner points in the respective triangle. The procedure described in this section is an example of the use of barycentric coordinates for the interpolation of colour values for points within a triangle. Values for the tuple (α1 , α2 , α3 ) represent the barycentric coordinates of these points. Colour values are a type of the so-called associated data for vertices and fragments in terms of the computer graphics pipeline. Vertices or fragments can, for example, be assigned colour values, texture coordinates, normal vectors or fog coordinates. The interpolation of the associated data for the individual fragments within a triangle, i.e., the filling of a triangle on the basis of the data at the three corner points stored in the vertices, is an important part of the rasterisation of triangles. Section 7.5.4 describes barycentric coordinates and their use for interpolating associated data for fragments within a triangle.
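The 3×3 linear system given by Eqs. (6.3) and (6.4) can be solved in closed form. The following Java sketch, with freely chosen names, computes the barycentric coordinates (α_1, α_2, α_3) of a point q with respect to a triangle (p_1, p_2, p_3) and uses them to form the convex combination of three RGB colours; it assumes a non-degenerate triangle.

public class BarycentricInterpolation {

    // Barycentric coordinates of the point q = (qx, qy) with respect to the
    // triangle with vertices p1 = (x1, y1), p2 = (x2, y2) and p3 = (x3, y3).
    public static double[] barycentric(double qx, double qy,
                                       double x1, double y1,
                                       double x2, double y2,
                                       double x3, double y3) {
        double d = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3); // non-zero for a non-degenerate triangle
        double a1 = ((y2 - y3) * (qx - x3) + (x3 - x2) * (qy - y3)) / d;
        double a2 = ((y3 - y1) * (qx - x3) + (x1 - x3) * (qy - y3)) / d;
        double a3 = 1.0 - a1 - a2;                                // constraint (6.4)
        return new double[] { a1, a2, a3 };
    }

    // q lies inside (or on the boundary of) the triangle iff all coordinates are non-negative,
    // since they already sum to one by construction.
    public static boolean insideTriangle(double[] alpha) {
        return alpha[0] >= 0 && alpha[1] >= 0 && alpha[2] >= 0;
    }

    // Convex combination of three RGB colours (each component in [0, 1]) with weights alpha.
    public static double[] interpolateColour(double[] alpha, double[] c1, double[] c2, double[] c3) {
        double[] result = new double[3];
        for (int i = 0; i < 3; i++) {
            result[i] = alpha[0] * c1[i] + alpha[1] * c2[i] + alpha[2] * c3[i];
        }
        return result;
    }
}

The same weights (α_1, α_2, α_3) can be applied to the vertices of the corresponding triangles in the two source images to locate the two pixels whose colours are finally combined for the intermediate image.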
6.5 Exercises

Exercise 6.1 Let the colour value (0.6, 0.4, 0.2) in the RGB colour model be given. Determine the colour value in the CMY colour model and in the CMYK colour model. Calculate the colour value for the same colour for the representation in the YC_bC_r colour model.
Exercise 6.2 Research the formulas for calculating the Hue, Saturation and Value components for the HSV colour model based on RGB values.

Exercise 6.3 Given are the following colour values in the RGB colour model:

• (0.8, 0.05, 0.05)
• (0.05, 0.8, 0.05)
• (0.05, 0.05, 0.8)
• (0.4, 0.4, 0.1)
• (0.1, 0.4, 0.4)
• (0.4, 0.1, 0.4)
• (0.3, 0.3, 0.3).
Display these colours in an OpenGL program, for example, by filling a triangle completely with one of the colour values. Then, for each of these colour tuples, specify a linguistic expression (name of the colour) that best describes the colour. Convert the individual colour values into the HSV colour model. What do you notice about the HSV values?

Exercise 6.4 Draw the curves from Eqs. (6.1) and (6.2) for the conversion between RGB and sRGB colour values. Note that the domain and codomain of the functions are both the interval [0, 1].

Exercise 6.5 Given are the following colour values in the RGB colour model:

• (0.8, 0.05, 0.05)
• (0.4, 0.1, 0.4)
• (0.3, 0.3, 0.3).

Assume these are linear RGB colour components. Convert these individual colour values into sRGB values using the equations from this chapter. Using an OpenGL program, draw three adjacent triangles, each filled with one of the specified linear RGB colour values. Extend your OpenGL program by three adjacent triangles, each of which is filled with one of the calculated sRGB colour values. Extend your OpenGL program by three more adjacent triangles, each of which is filled with one of the sRGB colour values that are determined by the OpenGL internal functions. Check whether these sRGB colour values calculated by the OpenGL match the sRGB colour values you calculated.

Exercise 6.6 A circular ring is to be filled with a colour gradient that is parallel to the circular arc. Write the equation of the circle in parametric form and use this representation to calculate an interpolation function of the colours depending on the angle α. Assume that two colours are given at α_s = 0° and α_e = 180°, between which interpolation is to take place.
Exercise 6.7 Let there be a circular ring whose outer arc corresponds to a circle of radius one. Let the inner arc be parallel to this circle. Fill this circular ring with a colour gradient, where at 0 degrees the colour X is given and at 270 degrees the (other) colour Y. A (linear) colour interpolation is to take place between the colours X and Y.

(a) Sketch the shape described.
(b) Give a formula to calculate the interpolated colour F_I for each pixel (x, y) within the gradient depending on the angle α.
(c) Then implement this technique of circular colour interpolation in an OpenGL program.
References

1. H.-J. Bungartz, M. Griebel and C. Zenger. Einführung in die Computergraphik. 2. Auflage. Wiesbaden: Vieweg, 2002.
2. W. Burger and M. J. Burge. Digital Image Processing: An Algorithmic Introduction Using Java. 2nd edition. London: Springer, 2016.
3. J. D. Foley, A. van Dam, S. K. Feiner and J. F. Hughes. Computer Graphics: Principles and Practice. 2nd edition. Boston: Addison Wesley, 1996.
4. E. B. Goldstein. Sensation and Perception. 8th edition. Belmont, CA: Wadsworth, 2010.
5. S. E. Umbaugh. Computer Imaging: Digital Image Analysis and Processing. Boca Raton: CRC Press, 2005.
6. G. Wyszecki and W. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. 2nd edition. New York: Wiley, 1982.
7 Rasterisation
Chapters 3 and 4 explain the computer graphical representations of basic geometric objects, and Chap. 5 explains transformations for these objects. The descriptions of these objects are based on the principles of vector graphics. This allows a lossless and efficient transformation of objects and scenes of objects. However, for display on common output devices such as computer monitors or smartphone devices, the conversion of this vector-based representation into a raster graphic is required. This conversion process is referred to as rasterisation. This chapter compares the advantages and disadvantages of these representation types. It also explains the basic problems and important solutions that arise with rasterisation. Traditionally, a line is an important geometric primitive of computer graphics. Lines can be used, for example, to represent wireframe models or the edges of polygons. Therefore, algorithms for efficient rasterisation were developed early on, some of which are presented in this chapter. Meanwhile, planar triangles are the most important geometric primitive in computer graphics. They are very commonly used to approximate surfaces of objects (with arbitrary accuracy). Not only more complex polygons but also lines and points can be constructed from triangles. Therefore, algorithms for rasterising and filling polygons, with a focus on planar triangles, are presented in this chapter. When converting a vector graphic into a raster graphic, undesirable disturbances usually occur. These disturbances are called aliasing effects, which can be reduced by antialiasing methods. A detailed section on this topic completes this chapter.
7.1 Vector Graphics and Raster Graphics

An object to be drawn, if it is not already available as a finished image, must first be described or modelled. This is usually done by means of vector graphics, also called vector-oriented graphics. In such a representation, the object to be modelled is described by the combination of basic geometric objects, such as lines, rectangles, arcs of circles and ellipses. Each of these basic objects can be uniquely defined by
Fig. 7.1 Representation of a house (a) as a vector graphic (b) and as a raster graphic (c)
specifying a few coordinates, which determine the position, and some parameters, such as the radius in the case of a circle. As explained in Sect. 3.1, these basic objects are also called geometric primitives. In OpenGL, the essential geometric primitives available are points, lines and triangles. These primitives are sometimes called base primitives in OpenGL. All other available geometric primitives for drawing, and any other three-dimensional geometric object, can be composed of base primitives or approximated by them with arbitrary accuracy. Depending on the desired quality of approximation, a large number of base primitives may be necessary. Figure 7.1b shows a very simple vector graphics description of the house from Fig. 7.1a. The house can be described as a sequence of point coordinates or vectors. In addition, it must be specified whether two consecutive points are to be connected with each other by a line or not. In the figure, two points that are not to be connected are indicated by a dashed line. In OpenGL, the description of a vector graphic is done by specifying a sequence of vertices containing, among other data, the three-dimensional position coordinates of each vertex. In addition, a parameter of the drawing command specifies the geometric primitive to be drawn and thus determines the use of the current vertex stream (see Sect. 3.3). However, this form of an object description is not directly suitable for display on a purely pixel-oriented output device such as a flat screen, projector or printer. One of the few output devices that can directly process a vector graphic is a plotter. With a pen plotter, pens are moved over the medium to be printed, for example paper, according to the vectorial description and lowered accordingly for writing.
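To connect this with the OpenGL remark above: conceptually, a vertex stream plus the primitive parameter of the drawing command is all that such a vector-graphics description amounts to. The following JOGL fragment is only a rough, incomplete sketch of this idea; buffer setup, shaders and the surrounding GLEventListener are omitted, the coordinate values are made up for illustration, and gl is assumed to be a GL3 object.

// A vertex stream for a single triangle (x, y, z per vertex)
float[] vertices = {
    -0.5f, -0.5f, 0.0f,
     0.5f, -0.5f, 0.0f,
     0.0f,  0.5f, 0.0f
};
// ... transfer of the array into a vertex buffer object omitted ...

// The primitive parameter determines how the vertex stream is interpreted:
gl.glDrawArrays(GL.GL_TRIANGLES, 0, 3);  // interpret the three vertices as one triangle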
With a cathode ray tube, commonly used in computer monitors and televisions in the past, a vector-oriented graphic could theoretically be drawn directly by guiding the cathode ray (or three cathode rays in a colour display) along the lines to be drawn. Depending on whether a line is to be drawn or not, the cathode ray is faded in or out. However, this can lead to the problem of a flickering screen, since the fluorescence of the screen dots diminishes if they are not hit again by the cathode ray. Flicker-free display requires a refresh rate of about 60 Hz. If the cathode ray has to scan the individual lines of the vector graphic over and over again, the refresh rate depends on how many lines the image contains. Therefore, for more complex images, a sufficiently high refresh rate for flicker-free operation cannot be guaranteed. To avoid this, the cathode ray tube scans the screen line by line so that the refresh rate is independent of the graphic (content) to be displayed. However, this method is based on pixel-oriented graphics. Raster graphics, also called raster-oriented graphics, are used not only for common monitors, projectors and printers but also for various image storage formats such as JPEG.¹ A raster graphic is based on a fixed matrix of image points (a raster). Each individual point of the raster is assigned a colour value. In the simplest case of a black and white image, the pixel is either coloured black or white. If a vector graphic is to be displayed as a raster graphic, all lines and filled polygons must be converted to points on a raster. This conversion is called rasterisation or, in computer graphics, scan conversion. This process requires a high computing effort. A common standard monitor has about two million pixels, for each of which a colour decision has to be made for every frame to be displayed, usually 60 times per second. Moreover, this conversion leads to undesirable effects. The frayed or stepped lines observed after rasterising are sometimes counted as aliasing effects. The term aliasing effect originates from signal processing and describes artefacts, i.e., artificial, undesirable effects, caused by the discrete sampling of a continuous signal. Rasterisation can therefore be seen as a sampling of a signal (the vector graphic).
Fig. 7.2 An arrowhead (left) shown in two resolutions (centre and right)
¹ JPEG stands for Joint Photographic Experts Group. This body developed the standard for the JPEG format, which allows the lossy, compressed storage of images.
Even though an image must ultimately be displayed as a raster graphic, it is still advantageous to model it in vector-oriented form, process it in the graphics pipeline and save it. A raster graphic is bound to a specific resolution. If a specific raster is specified, there are considerable disadvantages in the display if an output device works with a different resolution. Figure 7.2 shows the head of an arrow and its representation as a raster graphic under two different resolutions of the raster. If only the representation in the coarse resolution in the middle is known (saved), the desired finer raster on the right can no longer be reconstructed without further information. At best, the identical coarse representation in the centre could be adopted for the finer resolution by mapping one coarse pixel onto four finer pixels. If the ratio of the resolutions of the two rasters is not an integer, the conversion from one resolution to another becomes more complex. From the point of view of signal processing, the conversion of a raster graphic into another resolution basically corresponds to a resampling, whereby (new) aliasing effects can occur.
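As a simple illustration of such a resolution conversion, the following Java sketch performs nearest-neighbour resampling of a raster image to a new raster size. The names are chosen freely for illustration, and this naive method is precisely the kind of resampling that can introduce the aliasing effects mentioned above.

import java.awt.image.BufferedImage;

public class NearestNeighbourScaling {

    // Resample a raster graphic to a new resolution by nearest-neighbour lookup.
    public static BufferedImage resample(BufferedImage source, int newWidth, int newHeight) {
        BufferedImage target = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < newHeight; y++) {
            for (int x = 0; x < newWidth; x++) {
                // Map the target pixel back onto the source raster (integer division rounds down)
                int srcX = x * source.getWidth() / newWidth;
                int srcY = y * source.getHeight() / newHeight;
                target.setRGB(x, y, source.getRGB(srcX, srcY));
            }
        }
        return target;
    }
}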
7.2 Rasterisation in the Graphics Pipeline and Fragments

As shown in Sect. 7.1, vector graphics and raster graphics each have advantages and disadvantages. In order to use the advantages and at the same time avoid the disadvantages, vector graphics are used in the front stages of graphics pipelines, for example, to model objects, to build up scenes, to define the viewer's perspective and to limit the scene to the scope to be further processed by clipping. Raster graphics are processed in the rear stages of graphics pipelines, since the raster resolution can be fixed in these stages and the necessary computations can thus be carried out efficiently, such as for the illumination of surfaces or the check for the visibility of objects (see Sect. 2.5). Therefore, within a graphics pipeline, rasterisation must be used to convert a scene to be drawn into a two-dimensional image. A scene is composed of objects that consist of geometric primitives, which in turn are described by vertices (corner points of a geometric primitive). Thus, for each different geometric primitive, a procedure for converting the vertices into a set of pixels on a grid must be defined. As explained in Sect. 7.5.3, any geometric primitive commonly used in computer graphics can be traced back to a planar triangle or a set of planar triangles. This means that, in principle, it is sufficient to consider only this geometric shape. All in all, a rasterisation method for planar triangles can be used to convert a complete scene from a vector graphic to a raster graphic. If a nontrivial scene is viewed from a certain camera perspective, some objects and thus the geometric primitives of the scene usually obscure each other. Therefore, when rasterising a single geometric primitive, it cannot be decided immediately which candidate image point must be displayed on the screen for correct representation. In order to be able to make this decision later, when all necessary information is available, the generated image point candidates are assigned a depth value (z-value) in addition to the two-dimensional position coordinate (window coordinate) and the
colour information. Such a candidate image point is called a fragment in computer graphics. A fragment is distinct from a pixel, which represents an image point actually to be displayed on the screen. A pixel results from visibility considerations (see Chap. 8) of the overlapping fragments at the relevant two-dimensional position coordinate. If all fragments are opaque (i.e., not transparent), the colour value of the visible fragment is assigned to the pixel. If there are transparent fragments at the position in question, the colour value of the pixel may have to be calculated from a weighting of the colour values of the fragments concerned. In addition to the raster coordinates, the fragments are usually assigned additional data that are required for visibility considerations, special effects and ultimately for determining the final colour. These are, for example, colour values, depth values, normal vectors, texture coordinates and fog coordinates. This data, which is assigned to a vertex or fragment in addition to the position coordinates, is called associated data. Depending on the viewer's position, the rasterisation of a geometric primitive usually generates many fragments from only a few vertices. The calculation of the resulting intermediate values (additional required data) for the fragments is done by linear interpolation of the vertex data based on the two-dimensional position coordinates of the fragment within the respective primitive. For the interpolation within a triangle, barycentric coordinates are usually used (see Sect. 7.5.4). To illustrate rasterisation methods, this book mostly uses the representation as in Fig. 7.2, in which a fragment is represented by a unit square between the grid lines. In other cases, the representation as in the left part of Fig. 7.3 is more favourable, where the fragments correspond to circles on the grid intersections. In the figure, a fragment is drawn at location (3, 3). The right side of the figure shows the definition of a fragment as in the OpenGL specification (see [20,21]). In this specification, the
Fig. 7.3 Representations of a fragment with coordinate (3, 3): In the left graph, the fragment is represented as a circle on a grid intersection. The right graph shows the representation from the OpenGL specification
fragment is represented as a filled square. Its integer coordinate is located at the lower left corner of the square, which is marked by the small cross at the position (3, 3). The centre of the fragment is marked by a small circle with the exact distance (1/2, 1/2) (offset) from the fragment coordinate. According to the OpenGL specification, a fragment does not have to be square, but can have other dimensions. If non-square fragments are used, for example, lines can be created as raster graphics that are thicker in one coordinate dimension than in the other coordinate dimension. When describing methods and algorithms for rasterisation, the literature often does not distinguish between fragments and pixels, as this difference is not relevant in isolated considerations and visibility considerations only take place in later stages of the graphics pipeline. Mostly, the term pixel is used. In order to achieve a uniform presentation and to facilitate classification in the stages of the graphics pipeline, the term fragment is used uniformly in this book when this makes sense in the context of the consideration of rasterisation by the graphics pipeline.
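As an illustration of the associated data just described, a fragment produced by rasterisation could be modelled by a small data class like the one below. The choice of fields is only an assumption for illustration; a real pipeline keeps this data in GPU registers rather than in Java objects.

public class Fragment {
    // Window (raster) coordinates of the fragment
    public final int x;
    public final int y;
    // Depth value used later for the visibility tests (see Chap. 8)
    public final float depth;
    // Associated data, interpolated from the vertices of the primitive
    public final float[] colour;        // RGBA components in [0, 1]
    public final float[] textureCoords; // e.g., (s, t)

    public Fragment(int x, int y, float depth, float[] colour, float[] textureCoords) {
        this.x = x;
        this.y = y;
        this.depth = depth;
        this.colour = colour;
        this.textureCoords = textureCoords;
    }
}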
7.3 Rasterisation of Lines

This section explains procedures for rasterising lines. Section 7.5 presents methods and procedures for rasterising or filling areas and in particular for rasterising triangles. The algorithm according to Pineda for the rasterisation of triangles can be parallelised very easily and is therefore particularly suitable for use on graphics cards, since these usually work with parallel hardware.
7.3.1 Lines and Raster Graphics

In the following, the task of drawing a line from point (x_0, y_0) to point (x_1, y_1) on a grid is considered. In order to keep the problem as simple as possible at first, it shall be assumed that the two given points lie on the available grid, i.e., their coordinates are given as integer values. Without loss of generality, it is further assumed that the first point is not located to the right of the second point, i.e., that x_0 ≤ x_1 holds. A naive approach to drawing the line between these two points on the raster would step incrementally through the x-coordinates starting at x_0 up to the value x_1 and calculate the corresponding y-value for the line in each case. Since this y-value is generally not an integer, this y-value must be rounded to the nearest integer value. The fragment with the corresponding x- and the rounded y-coordinate is drawn. Figure 7.4 describes this algorithm in the form of pseudocode. First, it should be noted that for vertical lines, i.e., where x_0 = x_1 holds, this algorithm performs a division by zero when calculating the slope m and therefore fails. Even if this situation is treated as a special case, the algorithm remains ineffective, as Fig. 7.5 shows. The upper, horizontal line is drawn perfectly as can be expected for a raster graphic. The slightly sloping straight line below is also drawn correctly with the restrictions that are unavoidable due to the raster graphics. The fragments
void drawLine(int x0, int y0, int x1, int y1)
{
  int x;
  double dy = y1 - y0;
  double dx = x1 - x0;
  double m = dy/dx;                       // slope of the line
  double y = y0;
  for (x = x0; x <= x1; x++)
  {
    drawFragment(x, (int) Math.round(y)); // draw the fragment with the rounded y-coordinate
    y = y + m;                            // y-value of the line at the next x-position
  }
}

• F(x_M, y_M) > 0: If the point were on the line, the value y_M would have to be greater. This means that the point (x_M, y_M) is below the line.
• F(x_M, y_M) < 0: If the point were on the line, the value y_M would have to be smaller. This means that the point (x_M, y_M) lies above the line.

The sign of the value F(x_M, y_M) can thus be used to decide where the point (x_M, y_M) is located relative to the line under consideration. If the implicit representation of the straight line in the form of Eq. (7.4) is used for this decision, calculations in floating point arithmetic cannot be avoided. Since a line connecting two given grid
points (x_0, y_0) and (x_1, y_1) is to be drawn, Eq. (7.4) can be further transformed so that only integer arithmetic operations are required. The line given by these two points can be expressed by the following equation:

\frac{y - y_0}{x - x_0} = \frac{y_1 - y_0}{x_1 - x_0}.

By solving this equation for y and with the definition of the integer values² dx = x_1 − x_0 and dy = y_1 − y_0, the following explicit form is obtained:

y = \frac{dy}{dx} \cdot x + y_0 - \frac{dy}{dx} \cdot x_0.

From this, the implicit form can be derived as follows:

0 = \frac{dy}{dx} \cdot x - y + y_0 - \frac{dy}{dx} \cdot x_0.

After multiplying this equation by dx, we get the implicit form

F(x, y) = dy · x − dx · y + C = 0   (7.5)
with C = dx · y_0 − dy · x_0. The aim of these considerations is to enable the drawing of a line based only on integer arithmetic. From the assumption that the line segment to be drawn has a slope between zero and one, it follows that in each step it is only necessary to decide which of two possible fragments is to be drawn. Either the eastern (E) or the north-eastern (NE) fragment must be set. To make this decision between the two fragment candidates, it must be verified whether the line is above or below the midpoint M as illustrated in Fig. 7.6. For this purpose, the representation of the line in implicit form is very useful. If the midpoint is inserted into the implicit equation of the line, the sign indicates how the midpoint lies in relation to the line. The midpoint M = (x_M, y_M) lies on the grid in the x-direction and the y-direction between two grid points. The x-coordinate x_M is therefore an integer, and the y-coordinate y_M has the form

y_M = y_M^(0) + 1/2

with an integer value y_M^(0). Using the implicit form (7.5) of the line and the correct value y_M, floating point number operations are always necessary to compute its value. However, multiplying Eq. (7.5) by the factor 2 yields the implicit form

F̃(x, y) = 2 · dy · x − 2 · dx · y + 2 · C = 0   (7.6)
for the line under consideration. If the midpoint M = (x_M, y_M) is inserted into this equation, the computations can be reduced to integer arithmetic. Instead of directly inserting the floating point value y_M, which has the decimal part 0.5, the integer value 2 · y_M^(0) + 1 can be used for the term 2 · y_M.

² It should be noted that the points (x_0, y_0) and (x_1, y_1) are coordinates of fragments lying on a grid, so that x_0, y_0, x_1 and y_1 can only be integers.
Fig. 7.7 Representation of the new midpoint depending on whether the fragment E or N E was drawn in the previous step (based on [11])
In this way, the calculations for drawing lines in raster graphics can be completely reduced to integer arithmetic. However, Eq. (7.6) is not used directly for drawing lines, as incremental calculations can even avoid the somewhat computationally expensive (integer) multiplications. Instead of directly calculating the value (7.6) for the midpoint M in each step, only the first value and the change that results in each drawing step are determined. To determine the formulas for this incremental calculation, the implicit form (7.5) is used instead of the implicit form (7.6). In each step, the decision variable

d = F(x_M, y_M) = dy · x_M − dx · y_M + C

indicates whether the fragment is to be drawn above or below the midpoint M = (x_M, y_M). For d > 0, the upper fragment NE is to be set, and for d < 0 the lower fragment E. It is calculated how the value of d changes in each step. This is done by starting with the fragment (x_p, y_p) drawn after rounding to the nearest integer. After drawing the fragment (x_{p+1}, y_{p+1}), it is determined how d changes by considering the midpoint to determine the fragment (x_{p+2}, y_{p+2}). For this purpose, two cases have to be distinguished, which are illustrated in Fig. 7.7.

Case 1: E, i.e., (x_{p+1}, y_{p+1}) = (x_p + 1, y_p) was the fragment to be drawn after (x_p, y_p). The left part of Fig. 7.7 shows this case. Therefore, the midpoint M_new = (x_p + 2, y_p + 1/2) must be considered for drawing fragment (x_{p+2}, y_{p+2}). Inserting this point into Eq. (7.5) yields the following decision variable:

d_new = F(x_p + 2, y_p + 1/2) = dy · (x_p + 2) − dx · (y_p + 1/2) + C.

In the previous step to determine fragment (x_{p+1}, y_{p+1}), the midpoint (x_p + 1, y_p + 1/2) had to be inserted into Eq. (7.5), so that the decision variable has the following value:

d_old = F(x_p + 1, y_p + 1/2) = dy · (x_p + 1) − dx · (y_p + 1/2) + C.

The change in the decision variable in this case is therefore the value Δ_E = d_new − d_old = dy.

Case 2: NE, i.e., (x_{p+1}, y_{p+1}) = (x_p + 1, y_p + 1) was the fragment to be drawn after (x_p, y_p). The right part of Fig. 7.7 shows this case. Therefore, the midpoint M_new = (x_p + 2, y_p + 3/2) must be considered for drawing fragment (x_{p+2}, y_{p+2}). The value for the decision variable results as follows:

d_new = F(x_p + 2, y_p + 3/2) = dy · (x_p + 2) − dx · (y_p + 3/2) + C.

The previous value of the decision variable d is the same as in the first case of the eastern fragment E, so that the change in the decision variable is Δ_NE = d_new − d_old = dy − dx.

In summarised form, this results in a change of the decision variable

\Delta = \begin{cases} dy & \text{if } E \text{ was chosen} \\ dy - dx & \text{if } NE \text{ was chosen,} \end{cases}

that is,

\Delta = \begin{cases} dy & \text{if } d_{old} < 0 \\ dy - dx & \text{if } d_{old} > 0. \end{cases}
The value of Δ is always an integer, which means that the decision variable d only changes by integer values. In order to compute the value of the decision variable d in each step, the initial value of d is needed in addition to the change of d. This is obtained by inserting the first midpoint into Eq. (7.5). The first fragment on the line to be drawn has the coordinates (x_0, y_0). The first midpoint to be considered is therefore (x_0 + 1, y_0 + 1/2), so the first value of the decision variable can be calculated as follows:

d_init = F(x_0 + 1, y_0 + 1/2)
       = dy · (x_0 + 1) − dx · (y_0 + 1/2) + C
       = dy · x_0 − dx · y_0 + C + dy − dx/2
       = F(x_0, y_0) + dy − dx/2
       = dy − dx/2.
The value F(x_0, y_0) is zero, since the point (x_0, y_0) lies by definition on the line to be drawn. Unfortunately, the initial value d_init is generally not an integer, except when dx is even. Since the change Δ of d is always an integer, this problem can be circumvented by considering the decision variable D = 2 · d instead of the decision variable d. This corresponds to replacing the implicit form (7.5) of the line to be drawn by the following implicit form:

D = F̂(x, y) = 2 · F(x, y) = 2 · dy · x − 2 · dx · y + 2 · C = 0.

For the determination of the fragment to be drawn, it does not matter which of the two decision variables d or D = 2 · d is used, since only the sign of the decision variables is relevant. For the initialisation and the change of D, the result is as follows:

D_init = 2 · dy − dx   (7.7)
D_new = D_old + Δ   (7.8)

where

\Delta = \begin{cases} 2 \cdot dy & \text{if } D_{old} < 0 \\ 2 \cdot (dy - dx) & \text{if } D_{old} > 0. \end{cases}
The decision variable D takes only integer values. For the initialisation of D, a multiplication and a subtraction are required. In addition, the two values for Δ should be precalculated once at the beginning, which requires two more multiplications and one subtraction. When iteratively updating D, only one addition is needed in each step. It should be noted that multiplying a number in binary representation by a factor of two can be done very efficiently by shifting the bit pattern towards the most significant bit and adding a zero as the least significant bit, much like multiplying a number by a factor of ten in the decimal system. An example will illustrate the operation of this algorithm, which is called the midpoint algorithm or, after its inventor, the Bresenham algorithm. Table 7.1 shows the resulting values for the initialisation and for the decision variable for drawing a line from point (2, 3) to point (10, 6). After the start fragment (2, 3) is set, the negative value −2 is obtained for D_init, so that the eastern fragment is to be drawn next and the decision variable changes by Δ_E = 6. Thus, the decision variable becomes positive, the north-eastern fragment is to be drawn and the decision variable is changed by Δ_NE. In this step, the value of the decision variable is zero, which means that the line is exactly halfway between the next two fragment candidates. For this example, it should be specified that in this case the eastern fragment is always drawn. Specifying that the north-eastern fragment is always drawn in this case would also be possible, but the choice must be maintained throughout the line drawing process. The remaining values are calculated accordingly. Figure 7.8 shows the resulting line of fragments. The precondition for the application of this algorithm is that the slope of the line to be drawn lies between zero and one. As described above, if the absolute value of the slope exceeds one, the roles of the x- and y-axes must be swapped in the calculations, i.e., the y-values are incremented iteratively by one instead of the x-values. This always results in lines whose slope lies between −1 and 1. For lines
Table 7.1 Calculation steps of the midpoint algorithm using the example of the line from point (2, 3) to point (10, 6)

dx       = 10 − 2           = 8
dy       = 6 − 3            = 3
Δ_E      = 2 · dy           = 6
Δ_NE     = 2 · (dy − dx)    = −10
D_init   = 2 · dy − dx      = −2 (E)
D_init+1 = D_init + Δ_E     = 4 (NE)
D_init+2 = D_init+1 + Δ_NE  = −6 (E)
D_init+3 = D_init+2 + Δ_E   = 0 (E?)
D_init+4 = D_init+3 + Δ_E   = 6 (NE)
D_init+5 = D_init+4 + Δ_NE  = −4 (E)
D_init+6 = D_init+5 + Δ_E   = 2 (NE)
D_init+7 = D_init+6 + Δ_NE  = −8 (E)
D_init+8 = D_init+7 + Δ_E   = −2
Fig. 7.8 Example of a line from point (2, 3) to point (10, 6) drawn with the Bresenham algorithm
with a slope between −1 and 0, completely analogous considerations can be made as for lines with a slope between 0 and 1. Instead of the north-eastern fragment, only the south-eastern fragment has to be considered. An essential prerequisite for the midpoint algorithm is the assumption that the line to be drawn is bounded by two points that have integer coordinates, i.e., they lie on the grid. These points can be taken directly into the grid as fragments. When modelling the line as a vector graphic, this requirement does not have to be met at all. In this case, the line is drawn that results from connecting the rounded start- and endpoints of the line. This might lead to a small deviation of the line of fragments obtained from rounding the y-coordinates compared to the ideal line. However, the deviations amount to at most one integer grid coordinate and are therefore tolerable.
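The complete procedure for a slope between zero and one can be summarised in a few lines of Java. The following sketch uses the initialisation (7.7) and the update rule (7.8); the drawFragment method stands for whatever routine actually sets a fragment and is not part of any particular library. Ties (D = 0) are resolved in favour of the eastern fragment, as agreed in the example above.

public class MidpointLine {

    // Midpoint (Bresenham) algorithm for a line from (x0, y0) to (x1, y1)
    // with integer endpoints and a slope between 0 and 1 (x0 <= x1).
    public static void drawLine(int x0, int y0, int x1, int y1) {
        int dx = x1 - x0;
        int dy = y1 - y0;
        int deltaE = 2 * dy;             // change of D after an E step
        int deltaNE = 2 * (dy - dx);     // change of D after an NE step
        int d = 2 * dy - dx;             // D_init according to Eq. (7.7)
        int x = x0;
        int y = y0;
        drawFragment(x, y);              // start fragment
        while (x < x1) {
            if (d > 0) {                 // line passes above the midpoint: NE
                y = y + 1;
                d = d + deltaNE;
            } else {                     // d <= 0: E (ties resolved towards E)
                d = d + deltaE;
            }
            x = x + 1;
            drawFragment(x, y);
        }
    }

    private static void drawFragment(int x, int y) {
        // Placeholder: in a real program, this would set the fragment/pixel at (x, y).
        System.out.println("(" + x + ", " + y + ")");
    }
}

Applied to the example line from (2, 3) to (10, 6), this sketch reproduces exactly the step sequence of Table 7.1 and thus the fragments of Fig. 7.8.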
7.3.3 Structural Algorithm for Lines According to Brons

The midpoint algorithm (see Sect. 7.3.2) requires n integer operations to draw a line consisting of n fragments, in addition to the one-time initialisation at the beginning. This means that the computational time complexity is linear in the number of fragments.
Fig. 7.9 A repeating pattern of fragments when drawing a line on a grid
Structural algorithms for lines further reduce this complexity by exploiting repeating patterns that occur when a line is drawn on a fragment raster. Figure 7.9 shows such a pattern, which has a total length of five fragments. To better recognise the repeating pattern, the drawn fragments were marked differently. The basic pattern in Fig. 7.9 consists of one fragment (filled circle), two adjacent fragments diagonally above the first fragment (non-filled circles), followed by two more adjacent fragments diagonally above (non-filled double circles). If D denotes a diagonal step (drawing the "north-eastern" fragment) and H a horizontal step (drawing the "eastern" fragment), one repeating pattern of the line can be described by the sequence D H D H D. If the line from Fig. 7.8 did not end at the point (10, 6) but continued to be drawn, the pattern H D H H D H D H would repeat again. This can also be seen from the calculations of the midpoint algorithm in Table 7.1. The initial value of the decision variable D_init is identical to the last value D_init+8 in the table. Continuing the calculations of the midpoint algorithm would therefore lead to the same results again as shown in the table. Since up to now it has always been assumed that a line to be drawn is defined by a starting point (x_0, y_0) and an endpoint (x_1, y_1) on a grid, the values x_0, y_0, x_1, y_1 must be integers and therefore also the values dx = x_1 − x_0 and dy = y_1 − y_0. The line to be drawn therefore has a rational slope of dy/dx. For drawing the line, the y-values

y = (dy/dx) · x + b   (7.9)

with a rational constant b and integer values x must be rounded, regardless of whether this is done explicitly as in the naive straight line algorithm or implicitly with the midpoint algorithm. It is obvious that this can only result in a finite number of different remainders when calculating the y-values (7.9). For this reason, every line connecting two endpoints to be drawn on a fragment raster must be based on a repeating pattern, even if it may be very long. In the worst case, the repetition of the pattern would only start again when the endpoint of the line is reached. Structural algorithms for drawing lines exploit this fact and determine the underlying basic pattern that describes the line. In contrast to the linear time complexity
in the number of fragments for the midpoint algorithm, the structural algorithms can reduce this effort for drawing lines to a logarithmic time complexity, but at the price of more complex operations than simple integer additions. Following the same line of argumentation as in the context of the midpoint algorithm, the considerations of structural algorithms in this section are limited to the case of a line with a slope between zero and one. A structural algorithm constructs the repeated pattern of fragments for drawing a line as a sequence of horizontal steps (H) and diagonal steps (D). The basic principle is outlined below. Given the starting point (x_0, y_0) and the endpoint (x_1, y_1) of a line with a slope between zero and one, the values dx = x_1 − x_0 and dy = y_1 − y_0 are computed. After drawing the start fragment, a total of dx fragments must be drawn. On this path, the line must rise by dy fragments. For these dx fragments, this requires dy diagonal steps. The remaining (dx − dy) drawing steps must be horizontal steps. The problem to be solved is to find the right sequence of diagonal and horizontal steps. As a first, usually very poor approximation of the line, i.e., of the sequence describing the line, the sequence H^(dx−dy) D^dy is chosen. This notation means that the sequence consists of (dx − dy) H steps followed by dy D steps. For example, H^2 D^3 defines the sequence H H D D D. The initial approximation H^(dx−dy) D^dy (see above) contains the correct number of horizontal and diagonal steps, but in the wrong order. By appropriately permuting this initial sequence, the desired sequence of drawing steps is finally obtained. The algorithm of Brons constructs the correct permutation from the initial sequence H^(dx−dy) D^dy as follows [6,7]:

• If dx and dy (and therefore also (dx − dy)) have a greatest common divisor greater than one, i.e., g = gcd(dx, dy) > 1, then the drawing of the line can be realised by g repeating sequences of length dx/g.
• Therefore, only the repeated pattern is considered, and it can be assumed without loss of generality that dx and dy have no common divisor greater than one.
• Let P and Q be any two words (sequences) over the alphabet {D, H}.
• From an initial sequence P^p Q^q with frequencies p and q having no common divisor greater than one and assuming without loss of generality 1 < q < p, the integer division p = k · q + r with 0 < r < q yields the permuted sequence (P^k Q)^(q−r) (P^(k+1) Q)^r. In these formulae, k is the integer result and r is the integer remainder of the integer division.
• Continue recursively with the two subsequences of lengths r or (q − r) until r = 1 or (q − r) = 1 holds.

How this algorithm works will be illustrated by the example of the line from the point (x_0, y_0) = (0, 0) to the point (x_1, y_1) = (82, 34). Obviously, dx = 82,
dy = 34 and thus gcd(dx, dy) = 2 holds. The line has a slope of dy/dx = 17/41. Thus, starting from the fragment (x_0, y_0) that lies on the ideal line, after 41 fragments another fragment is reached that lies on the ideal line. It is therefore sufficient to construct a sequence for drawing the first half of the line up to the fragment (41, 17) and then to repeat this sequence once to draw the remaining fragments. Therefore, the values dx' = dx/2 = 41 and dy' = dy/2 = 17 are considered. Thus, the initial sequence is H^24 D^17, and the corresponding integer division with p = 24 and q = 17 yields 24 = 1 · 17 + 7. This leads to the sequence (H D)^10 (H^2 D)^7 with p = 10 and q = 7. The integer division 10 = 1 · 7 + 3 results in the sequence (H D H^2 D)^4 ((H D)^2 H^2 D)^3. Here, p = 4 and q = 3 hold, and the final integer division is 4 = 1 · 3 + 1. Since the remainder of this division is r = 1, the termination condition of the algorithm is satisfied and the correct sequence of drawing steps results in

(H D H^2 D (H D)^2 H^2 D)^2 ((H D H^2 D)^2 (H D)^2 H^2 D)^1.

This sequence must be applied twice to draw the complete line from point (0, 0) to point (82, 34). In contrast to the midpoint algorithm (see Sect. 7.3.2), the algorithm described in this section can have a logarithmic time complexity depending on the number of fragments to be drawn. Depending on the actual implementation, however, the manipulation of strings or other complex operations may be required, which in turn leads to disadvantages in runtime behaviour.
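The repeating-pattern property itself is easy to check experimentally. The following Java sketch (all names made up for illustration) generates the H/D step sequence of a line with the midpoint rule from Sect. 7.3.2 and verifies that, for the example line from (0, 0) to (82, 34) with gcd(82, 34) = 2, the sequence consists of two identical halves of 41 steps each.

public class RepeatingPattern {

    // H/D step sequence of the midpoint algorithm for a line with 0 <= dy <= dx.
    public static String stepSequence(int dx, int dy) {
        StringBuilder steps = new StringBuilder();
        int deltaE = 2 * dy;
        int deltaNE = 2 * (dy - dx);
        int d = 2 * dy - dx;
        for (int i = 0; i < dx; i++) {
            if (d > 0) {            // diagonal step (NE)
                steps.append('D');
                d += deltaNE;
            } else {                // horizontal step (E); ties resolved towards E
                steps.append('H');
                d += deltaE;
            }
        }
        return steps.toString();
    }

    public static void main(String[] args) {
        String sequence = stepSequence(82, 34);
        String firstHalf = sequence.substring(0, 41);
        String secondHalf = sequence.substring(41);
        System.out.println(firstHalf.equals(secondHalf));  // prints true: the pattern repeats
    }
}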
7.3.4 Midpoint Algorithm for Circles Section 7.3.2 presents an efficient algorithm for drawing lines on a grid whose calculations are based solely on integer arithmetic. This midpoint algorithm can be generalised for drawing circles and other curves under certain conditions. The essential condition is that the centre (xm , ym ) of the circle to be drawn lies on a grid point, i.e., has integer coordinates xm and ym . In this case, it is sufficient to develop a method for drawing a circle with its centre at the origin of the coordinate system. To obtain a circle with centre (xm , ym ), the algorithm for a circle with centre (0, 0) has to be applied and the calculated fragments have to be drawn with an offset of (xm , ym ). In order to determine the coordinates of the fragments to be drawn for a circle around the origin of the coordinate system, the calculations are explicitly carried out for only one-eighth of the circle. The remaining fragments result from symmetry considerations, as shown in Fig. 7.10. If the fragment (x, y) has to be drawn in the hatched octant, the corresponding fragments (±x, ±y) and (±y, ±x) must also be drawn in the other parts of the circle. To transfer the midpoint or Bresenham algorithm to circles [4], assume that the radius R of the circle is an integer. In the octant circle under consideration, the slope of the circular arc lies between 0 and −1. Analogous to the considerations for the midpoint algorithm for lines with a slope between zero and one, only two
Fig. 7.10 Exploiting the symmetry of the circle so that only the fragments for one-eighth of the circle have to be calculated
Fig. 7.11 Representation of the fragment to be drawn next after a fragment has been drawn with the midpoint algorithm for circles
fragments are available for selection as subsequent fragments of a drawn fragment. If the fragment (x_p, y_p) has been drawn in one step, then—as shown in Fig. 7.11—only one of the two fragments E with the coordinates (x_p + 1, y_p) or SE with the coordinates (x_p + 1, y_p − 1) is the next fragment to be drawn. As with the midpoint algorithm for lines, the decision as to which of the two fragments is to be drawn is made with the help of a decision variable. To do this, the circle equation x^2 + y^2 = R^2 is rewritten in the following form:

d = F(x, y) = x^2 + y^2 − R^2 = 0.    (7.10)

For this implicit equation and a point (x, y), the following statements hold:
• F(x, y) = 0 ⇔ (x, y) lies on the circular arc.
• F(x, y) > 0 ⇔ (x, y) lies outside the circle.
• F(x, y) < 0 ⇔ (x, y) lies inside the circle.
In order to decide whether the fragment E or SE is to be drawn next, the midpoint M is inserted into Eq. (7.10), leading to the following cases for the decision:
• If d > 0 holds, SE must be drawn.
• If d < 0 holds, E must be drawn.
As with the midpoint algorithm for lines, the value of the decision variable d is not calculated in each step by directly inserting the midpoint M. Instead, only the change in d is calculated at each step. Starting with the fragment (x_p, y_p), which is assumed to be drawn correctly, the change of d is calculated for the transition from fragment (x_{p+1}, y_{p+1}) to fragment (x_{p+2}, y_{p+2}). The following two cases must be distinguished.

Case 1: E, i.e., (x_{p+1}, y_{p+1}) = (x_p + 1, y_p) was the fragment drawn after (x_p, y_p). This corresponds to the case shown in Fig. 7.11. The midpoint M_E under consideration for drawing the fragment (x_{p+2}, y_{p+2}) has the coordinates (x_p + 2, y_p − 1/2). Inserting this midpoint into Eq. (7.10) yields the following value for the decision variable d:

d_new = F(x_p + 2, y_p − 1/2) = (x_p + 2)^2 + (y_p − 1/2)^2 − R^2.

In the previous step to determine the fragment (x_{p+1}, y_{p+1}), the midpoint (x_p + 1, y_p − 1/2) was considered. Inserting this midpoint into Eq. (7.10) gives the prior value of the decision variable as follows:

d_old = F(x_p + 1, y_p − 1/2) = (x_p + 1)^2 + (y_p − 1/2)^2 − R^2.

The change in the decision variable in this case is thus the following value:

Δ_E = d_new − d_old = 2 x_p + 3.

Case 2: SE, i.e., (x_{p+1}, y_{p+1}) = (x_p + 1, y_p − 1) was the fragment drawn after (x_p, y_p). In this case, the next midpoint to be considered is M_SE = (x_p + 2, y_p − 3/2) (see Fig. 7.11). This results in the following value for the decision variable:

d_new = F(x_p + 2, y_p − 3/2) = (x_p + 2)^2 + (y_p − 3/2)^2 − R^2.

The previous value of the decision variable d is the same as in the first case of the eastern fragment E, so the change of the decision variable is given by the following equation:

Δ_SE = d_new − d_old = 2 x_p − 2 y_p + 5.
In summary, the change in the decision variable for both cases is as follows:

Δ = 2 x_p + 3              if E was chosen,
Δ = 2 x_p − 2 y_p + 5      if SE was chosen.

This means

Δ = 2 x_p + 3              if d_old < 0,
Δ = 2 x_p − 2 y_p + 5      if d_old > 0,
so that the change Δ of the decision variable d is always an integer. In order to compute the decision variable d in each step, the initial value must also be determined. The first fragment to be drawn has the coordinates (0, R), so that (1, R − 1/2) is the first midpoint to be considered. The initial value of d is therefore

d = F(1, R − 1/2) = 1 + (R − 1/2)^2 − R^2 = 5/4 − R.    (7.11)

As in the case of lines, the decision variable changes only by integer values, but the initial value is not necessarily an integer. Similar to the algorithm for lines, the decision variable D = 4 · d could be used to achieve an initialisation with an integer value. A simpler solution, however, is to ignore the resulting digits after the decimal point in Eq. (7.11). This is possible for the following reason. In order to decide which fragment to draw in each case, it is only necessary to determine in each step whether the decision variable d has a positive or negative value. Since d only changes by integer values in each step, the decimal part of the initialisation value cannot influence the sign of d.
In deriving the midpoint algorithm for drawing circles, it was assumed that the centre of the circle is the coordinate origin or at least a grid point. Furthermore, it was assumed that the radius is also an integer. The midpoint algorithm can easily be extended to circles with an arbitrary, not necessarily integer radius. Since the radius has no influence on the change of the decision variable d, the (non-integer) radius only needs to be taken into account when initialising d. For a floating-point radius R, (0, round(R)) is the first fragment to be drawn and thus (1, round(R) − 1/2) is the first midpoint to be considered. Accordingly, d must be initialised with the following value:

d = F(1, round(R) − 1/2) = 5/4 − round(R).

For the same reasons as for circles with an integer radius, the digits after the decimal point can be ignored. This makes the initialisation of d an integer, and the change in d remains an integer independent of the radius. Both the midpoint algorithm for lines and the one for circles can be parallelised (see [24]). This means that these algorithms can be used efficiently by GPUs with parallel processors.
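Putting the integer initialisation and the increments Δ_E and Δ_SE together, the complete procedure can be sketched in Java as follows. The sketch assumes an integer radius and a circle centre at the origin; setFragment is a placeholder for writing a fragment (offset by the actual circle centre) and is not part of any OpenGL API.

static void midpointCircle(int r) {
    int x = 0;
    int y = r;
    int d = 1 - r;                      // initial value 5/4 - r with the fractional part dropped
    drawCirclePoints(x, y);             // first fragment (0, r) and its symmetric counterparts
    while (y > x) {                     // computing one octant is sufficient
        if (d < 0) {                    // midpoint inside the circle: choose E
            d += 2 * x + 3;
        } else {                        // midpoint on or outside the circle: choose SE
            d += 2 * x - 2 * y + 5;
            y--;
        }
        x++;
        drawCirclePoints(x, y);
    }
}

// Exploits the eightfold symmetry of the circle (see Fig. 7.10).
static void drawCirclePoints(int x, int y) {
    setFragment( x,  y); setFragment(-x,  y); setFragment( x, -y); setFragment(-x, -y);
    setFragment( y,  x); setFragment(-y,  x); setFragment( y, -x); setFragment(-y, -x);
}

static void setFragment(int x, int y) {
    // placeholder: a real renderer would store or draw the fragment at (x, y),
    // translated by the circle centre (xm, ym)
}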
7.3.5 Drawing Arbitrary Curves The midpoint algorithm can be generalised not only to circles but also to other curves, for example for ellipses [1,15,18]. An essential, very restrictive condition for the midpoint algorithm is that the slope of the curve must remain between 0 and 1 or between 0 and −1 in the interval to be drawn. For drawing arbitrary or at least continuous curves or for the representation of function graphs, a piecewise approximation by lines is therefore made. For drawing the continuous function y = f (x), it is not sufficient to iterate stepwise through the x-values for each grid position and draw the corresponding fragments with the rounded y-values. In areas where the function has an absolute value of the slope greater than one, the same gaps in drawing the function graph would result as produced by the naive line drawing algorithm in Fig. 7.5. Lines have a constant slope that can be easily determined. Therefore, this problem for lines is solved by swapping the roles of the x- and y-axes when drawing a line with an absolute value of the slope greater than one. This means that in this case the inverse function is drawn along the y-axis. Arbitrary functions do not have a constant slope, and both the slope and the inverse function are often difficult or impossible to calculate. For this reason, drawing arbitrary curves is carried out by stepwise iterating through the desired range on the x-axis and computing the corresponding rounded y-values. However, not only these fragments are drawn but also the connecting line between fragments with neighbouring x-values is drawn based on the midpoint algorithm for lines. Figure 7.12 illustrates this principle for drawing a continuous function y = f (x) in the interval [x0 , x1 ] with x0 , x1 ∈ IN. The filled circles are the fragments of the form (x, round( f (x))) for integer x-values. The unfilled circles correspond to the fragments that are set when drawing the connecting lines of fragments with neighbouring x-coordinates. When a curve is drawn in this way, fragments are generally
int yRound1, yRound2;
yRound1 = round(f(x0));
for (int x = x0; x < x1; x++) {
  yRound2 = round(f(x + 1));
  drawLine(x, yRound1, x + 1, yRound2);
  yRound1 = yRound2;
}

• (n_{n,i}^T q_j) > (n_{n,i}^T p_i) ⇔ q_j lies on the side of the straight line in the direction of the normal vector n_{n,i}.
• (n_{n,i}^T q_j) < (n_{n,i}^T p_i) ⇔ q_j lies on the side of the straight line against the direction of the normal vector n_{n,i}.
If the normal vectors of all edges of the polygon under consideration are set to point inwards, as for the edge from P_0 to P_1 in Fig. 7.25, then Eq. (7.12) can be extended to the following decision functions:

e_i(q_j) = n_i^T (q_j − p_i),   where
  e_i(q_j) = 0  ⇔  q_j lies on the edge i,
  e_i(q_j) > 0  ⇔  q_j lies inside relative to edge i,      (7.13)
  e_i(q_j) < 0  ⇔  q_j lies outside relative to edge i.
If the considered grid point q_j lies, according to the decision functions e_i, inside relative to all edges i with i = 0, 1, ..., (N − 1) of an N-sided polygon, then this point is a fragment of the rasterised polygon. Since only a sign test is needed when using the decision functions, the length of the normal vectors is irrelevant. Therefore, Eq. (7.13) uses the nonnormalised normal vectors n_i. Omitting the normalisation saves computing time.
In computer graphics, it is a common convention to arrange the vertices on the visible side of a polygon in counter-clockwise order. If this order is applied to the convex N-sided polygon to be rasterised, then the nonnormalised normal vectors can be determined by rotating the displacement vectors between two consecutive vertices by 90° counter-clockwise. The nonnormalised normal vector for the edge i (i = 0, 1, ..., (N − 1)) through the points P_i and P_{i+1} can be determined using the matrix for a rotation by 90° as follows:

      ( 0  −1 )
n_i = ( 1   0 ) (p_{i+1} − p_i).                                        (7.14)

To obtain a closed polygon, p_N = p_0 is set. This captures all N edges of the N-sided polygon, and all normal vectors point towards the interior of this polygon. Alternatively, it is possible to work with a clockwise orientation of the vertices. In this case, the calculations change accordingly.
Besides the simplicity of the test whether a fragment belongs to the polygon to be rasterised, Eq. (7.13) has a useful locality property. To see this, consider the edge i = 0 of a polygon through the points P_0 and P_1 with their position vectors p_0 = (p_0x, p_0y)^T and p_1 = (p_1x, p_1y)^T. Using Eq. (7.14), the associated nonnormalised normal vector is

      ( 0  −1 ) ( p_1x − p_0x )   ( −(p_1y − p_0y) )
n_0 = ( 1   0 ) ( p_1y − p_0y ) = (  p_1x − p_0x   ).                   (7.15)

For a location vector q_j = (x_j, y_j)^T of a grid point Q_j, for which it is to be decided whether it lies within the polygon, inserting Eq. (7.15) into Eq. (7.13) yields the following decision function:

e_0(x_j, y_j) = e_0(q_j) = −(p_1y − p_0y)(x_j − p_0x) + (p_1x − p_0x)(y_j − p_0y)
              = −(p_1y − p_0y) x_j + (p_1x − p_0x) y_j + (p_1y − p_0y) p_0x − (p_1x − p_0x) p_0y
              = a_0 x_j + b_0 y_j + c_0.                                (7.16)
Let the grid for the rasterisation, and thus a location vector q_j = (x_j, y_j)^T, be described in integer coordinates. Furthermore, let the decision function e_0(q_j) = e_0(x_j, y_j) already be evaluated for the grid point in question. Then the value of the decision function e_0 for the grid point (x_j + 1, y_j), which lies one (integer) grid coordinate further along the x-coordinate next to the already evaluated grid point, results from Eq. (7.16) as follows:

e_0(x_j + 1, y_j) = a_0 (x_j + 1) + b_0 y_j + c_0 = a_0 x_j + b_0 y_j + c_0 + a_0 = e_0(x_j, y_j) + a_0.    (7.17)
Fig. 7.26 Traversing the bounding box (thick frame) of a triangle to be rasterised on a zigzag path (arrows). The starting point is at the top left
This means that the value of the decision function for the next grid point in the x-direction differs only by adding a constant value (during the rasterisation process of a polygon). Similar considerations can be made for the negative x-direction and the y-directions and applied to all decision functions. This locality property makes it very efficient to incrementally determine the values of the decision functions from the previous values when traversing an area to be rasterised. In Fig. 7.26, a zigzag path can be seen through the bounding box of a triangle to be rasterised. This is a simple approach to ensure that all points of the grid potentially belonging to the triangle are covered. At the same time, the locality property from Eq. (7.17) can be used for the efficient incremental calculation of the decision functions. Depending on the shape and position of the triangle, the bounding box is very large and many grid points outside the triangle have to be examined. Optimisations for the traversal algorithm are already available from the original work by Pineda [17]. Besides this obvious approach, a hierarchical traversal of areas to be rasterised can be used. Here, the raster to be examined is divided into tiles, which consist of 16× 16 raster points, for example. Using an efficient algorithm (see [2, pp. 970–971]), a test of a single grid point of a tile can be used with the help of the decision functions to decide whether the tile must be examined further or whether the entire tile lies outside the triangle to be rasterised. If the tile must be examined, it can again be divided into smaller tiles of, for example, size 4 × 4, which are examined according to the same scheme. Only for the tiles of the most detailed hierarchy level must an
examination take place for all grid points, for example on a zigzag path. Traversing through the tiles and examining them at each of the hierarchy levels can take place either sequentially on a zigzag path or by processors of a GPU working in parallel. At each level of the hierarchy, the locality property of the decision functions (Eq. (7.17)) can also be used for traversing from one tile to the next by choosing the step size accordingly. For example, on a level with tiles of 16 × 16 grid points, the value (16 · a_0) has to be added to the value of the decision function e_0 for one tile to get the value for the neighbouring tile in the x-direction. This enables the very effective use of the parallel hardware of GPUs.
In Fig. 7.25, next to the fragments belonging to the rasterised triangle, grid points are brightly coloured, which are called auxiliary fragments. The additional use of these auxiliary fragments results in tiles of size 2 × 2, each containing at least one fragment inside the triangle under consideration. The data from these tiles can be used for the approximation of derivatives, which are used for the calculation of textures. Furthermore, the processing using hierarchical tiles is useful for the use of mipmaps. Chapter 10 presents details on the processing of textures and mipmaps. In this context, it is important to note that in addition to deciding which raster point is a fragment of the triangle to be rasterised, further data must be calculated (by interpolation) for each of these fragments. This includes, for example, depth values, colour values, fog parameters and texture values (see Sect. 7.5.4). This data must be cached if it is derived from the data of the environment of a fragment, which is often the case. Even if the tiles are traversed sequentially, hierarchical traversal has the advantage that locally adjacent data is potentially still in a memory buffer for fast access, i.e., a cache, which can be (repeatedly) accessed efficiently. If the entire raster is processed line by line according to a scan line technique, then the individual scan lines might be so long in the x-direction that required neighbouring data are no longer in the cache when the next scan line is processed for the next y-step. In this case, the capacity of the cache memory might have been exhausted in the meantime. If this happens, the data must either be recalculated or loaded from the main memory, both of which have a negative effect on processing performance.
To solve the special case where a grid centre lies exactly on the edges of two directly adjacent triangles, the top-left rule is often used. This rule is applied to make an efficient decision for the assignment to only one triangle. Here, the grid point is assigned exactly to the triangle in which the grid centre lies on an upper or a left edge. According to this rule, an upper edge is defined as an edge that is exactly horizontal and located on the grid above the other two edges. A left edge is defined as a non-horizontal edge located on the left side of the triangle. Thus, either one left edge or two left edges belong to the triangle under consideration. These edge properties can be determined efficiently from the coefficients of the edge function (Eq. (7.16)). According to this, an edge is
• an upper edge if (a_0 = 0) and (b_0 < 0) holds, or
• a left edge if (a_0 > 0) holds (such an edge cannot be horizontal).
The complete test of whether a grid point is a fragment of the triangle to be rasterised is sometimes referred to in the literature as the inside test ([2, pp. 996–997]).
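The following Java sketch combines the edge coefficients of Eq. (7.16), the sign test of Eq. (7.13) and the incremental update of Eq. (7.17) into a simple rasteriser for a convex polygon. It is an illustration only: it samples at integer grid coordinates, counts boundary points as inside and omits the top-left tie-break as well as any tiling or parallelisation; all names are invented for this sketch.

static boolean[][] rasterisePolygon(double[] px, double[] py, int width, int height) {
    int n = px.length;
    double[] a = new double[n], b = new double[n], c = new double[n];
    for (int i = 0; i < n; i++) {
        int k = (i + 1) % n;                        // p_N = p_0 closes the polygon
        a[i] = -(py[k] - py[i]);                    // coefficients of Eq. (7.16)
        b[i] =   px[k] - px[i];
        c[i] =  (py[k] - py[i]) * px[i] - (px[k] - px[i]) * py[i];
    }
    boolean[][] inside = new boolean[height][width];
    for (int y = 0; y < height; y++) {
        double[] e = new double[n];
        for (int i = 0; i < n; i++) {
            e[i] = b[i] * y + c[i];                 // value of e_i at the grid point (0, y)
        }
        for (int x = 0; x < width; x++) {
            boolean isInside = true;
            for (int i = 0; i < n; i++) {
                if (e[i] < 0) {                     // outside relative to edge i
                    isInside = false;
                    break;
                }
            }
            inside[y][x] = isInside;
            for (int i = 0; i < n; i++) {
                e[i] += a[i];                       // locality property, Eq. (7.17)
            }
        }
    }
    return inside;
}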
The Pineda algorithm can be used for the rasterisation of convex polygons with N edges. The edges of a clipping area can be taken into account and thus clipping can be performed simultaneously with the rasterisation of polygons (cf. [17]). Furthermore, by using edge functions of higher order, an extension to curved edges of areas is possible (cf. [17]). On the other hand, any curved shape can be approximated by a—possibly large—number of lines or triangles, which is often used in computer graphics. Since every polygon can be represented by a decomposition of triangles and a triangle is always convex, a triangle can be used as the preferred geometric primitive. Processing on GPUs can be and is often optimised for triangles. For the rasterisation of a triangle, the edge equations (7.13) for N = 3 are to be used. A line can be considered as a long rectangle, which—in the narrowest case—is only one fragment wide. Such a rectangle can either be composed of two triangles or rasterised using the decision functions (7.13) for N = 4. A point can also be represented in this way as a small square, which in the smallest case consists of a single fragment. This makes it possible to use the same (parallelised) graphics hardware for all three necessary types of graphical primitives.
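As a usage example, the rasterisePolygon sketch above can also process a line that is interpreted as a long rectangle with N = 4 edges. The corner construction below (half the line width on each side of the segment, corners in counter-clockwise order) is an illustrative assumption.

static boolean[][] rasteriseThickLine(double x0, double y0, double x1, double y1,
                                      double w, int width, int height) {
    double dx = x1 - x0, dy = y1 - y0;
    double len = Math.sqrt(dx * dx + dy * dy);
    // vector perpendicular to the line direction, scaled to half the line width
    double nx = -dy / len * (w / 2.0);
    double ny =  dx / len * (w / 2.0);
    double[] px = { x0 + nx, x0 - nx, x1 - nx, x1 + nx };   // counter-clockwise corners
    double[] py = { y0 + ny, y0 - ny, y1 - ny, y1 + ny };
    return rasterisePolygon(px, py, width, height);
}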
7.5.4 Interpolation of Associated Data

In addition to deciding which raster point represents a fragment of the triangle to be rasterised, whereby each fragment is assigned a position in window coordinates, further data must be calculated for each of these fragments. These data are, for example, depth values, colour values, normal vectors, texture coordinates and parameters for the generation of fog effects. This additional data is called associated data. Each vertex from the previous pipeline stages is (potentially) assigned associated data in addition to the position coordinates. In the course of rasterisation, the associated fragment data must be obtained by interpolating this associated vertex data. For such an interpolation, the barycentric coordinates (u, v, w) within a triangle can be used. Figure 7.27 shows a triangle with vertices P_0, P_1 and P_2. Based on the location vectors (position coordinates) p_0 = (p_0x, p_0y)^T, p_1 = (p_1x, p_1y)^T and p_2 = (p_2x, p_2y)^T to these points, a location vector (position coordinates) q_j = (q_jx, q_jy)^T can be determined for a point Q_j within the triangle or on its edge as follows: q_j = u p_0 + v p_1 + w p_2. The real numbers of the triple (u, v, w) ∈ R^3 are the barycentric coordinates of the point Q_j. If u + v + w = 1 holds, then u, v, w ∈ [0, 1] and (u, v, w) are the normalised barycentric coordinates of the point Q_j. In the following considerations of this section, normalised barycentric coordinates are assumed. As shown below, (u, v, w) can be used to calculate the interpolated associated data for a fragment. The barycentric coordinates of a triangle are defined by the areas A_0, A_1 and A_2 of the triangles, which are respectively opposite the corresponding points P_0, P_1 and
Fig. 7.27 Visualisation of the definition of barycentric coordinates in a triangle with vertices P0 , P1 and P2 . The areas A0 , A1 and A2 , which are defined by subtriangles, each lie opposite a corresponding vertex. The dashed line h represents the height of the subtriangle with the vertices P0 , P1 and Q j
P_2 (see Fig. 7.27) as follows (see also [2, p. 998]):

u = A_0 / A_g,    v = A_1 / A_g,    w = A_2 / A_g    with A_g = A_0 + A_1 + A_2.    (7.18)

As will be shown in the following, the surface areas of the triangles can be determined from the edge functions, which can also be used for Pineda's algorithm for the rasterisation of polygons (see Sect. 7.5.3). For this purpose, consider the edge function (7.13) for i = 0, i.e., for the line from the point P_0 to the point P_1: e_0(q_j) = e_0(x_j, y_j) = n_0^T (q_j − p_0). The definition of the scalar product between two vectors gives the following equation:

e_0(q_j) = ‖n_0‖ · ‖q_j − p_0‖ · cos(α).    (7.19)

The angle α is the angle between the two vectors of the scalar product. The first term ‖n_0‖ in the formula is the length (magnitude) of the nonnormalised normal vector. This vector results from a rotation by 90° of the vector from P_0 to P_1 (see Eq. (7.14)). Since the lengths of the non-rotated and the rotated vectors are identical, the following holds for the base b of the triangle A_2:

b = ‖n_0‖ = ‖p_1 − p_0‖.

The remaining term from Eq. (7.19) can be interpreted as the projection of the vector from P_0 to Q_j onto the normal vector. This projection represents exactly the height h of the triangle A_2, so the following holds:

h = ‖q_j − p_0‖ · cos(α).
Since the area of a triangle is half of the base side multiplied by the height of the triangle, the area can be determined from the edge function as follows:

A_2 = (1/2) b · h = (1/2) e_0(q_j).

From the definition of the barycentric coordinates for triangles (Eq. (7.18)), the following calculation rule for (u, v, w) results:
u = e_1(x_j, y_j) / e_g,    v = e_2(x_j, y_j) / e_g,    w = e_0(x_j, y_j) / e_g    (7.20)
with e_g = e_0(x_j, y_j) + e_1(x_j, y_j) + e_2(x_j, y_j). The factor (1/2) has cancelled out. Alternatively, the barycentric w-coordinate can be determined by w = 1 − u − v. To increase efficiency, e_g = 2 A_g can be precomputed and reused for a triangle to be rasterised, since the area of the triangle does not change. The barycentric coordinates for a fragment can thus be determined very quickly if Pineda's algorithm is used for the rasterisation of a triangle and the above-described edge functions are applied for the inside test.
The barycentric coordinates according to Eq. (7.20) can be used for the interpolation of depth values and in the case of a parallel projection. However, they do not always yield correct interpolation results for other types of projection or other associated data if the position coordinates of the underlying vertices of the points P_0, P_1, P_2 in four-dimensional homogeneous coordinates do not have 1 as the fourth coordinate after all transformations. The normalisation of the homogeneous coordinates to the value 1 in the fourth component of the position coordinates of a vertex takes place by the perspective division, whereby the first three position coordinates are divided by the fourth component (see Sect. 5.41). If this normalisation has not taken place, it can be done for each fragment. For further explanations and references to the derivation, please refer to [2, pp. 999–1001]. In the following, only the results of this derivation are presented. With w_0, w_1 and w_2 as the fourth components (homogeneous coordinates) of the vertices of the points P_0, P_1 and P_2 after all geometric transformation steps, the perspective correct barycentric coordinates (ũ, ṽ, w̃) result as follows:

ũ = (1 / w_1) · e_1(x_j, y_j) / ẽ_g(x_j, y_j),    ṽ = (1 / w_2) · e_2(x_j, y_j) / ẽ_g(x_j, y_j),    w̃ = 1 − ũ − ṽ    (7.21)

with

ẽ_g(x_j, y_j) = e_0(x_j, y_j) / w_0 + e_1(x_j, y_j) / w_1 + e_2(x_j, y_j) / w_2.
As can be seen from Eq. (7.21), the value ẽ_g(x_j, y_j) must be determined anew for each fragment, while e_g from Eq. (7.20) is constant for all fragments of a triangle. Let f_0, f_1 and f_2 be associated data assigned to the respective vertices at the points P_0, P_1 and P_2. Then the interpolated value of the associated datum f(x_j, y_j) at the point Q_j can be determined using the perspective correct barycentric coordinates as follows:

f(x_j, y_j) = ũ f_0 + ṽ f_1 + w̃ f_2.    (7.22)
These data are, for example, colour values, normal vectors, texture coordinates and parameters for fog effects. As mentioned above, the depth values z_0, z_1 and z_2 at the respective points P_0, P_1 and P_2 can be interpolated using the barycentric coordinates (u, v, w) from Eq. (7.20). The division z̃_i = z_i / w_i should take place per vertex. Then the following calculation rule for the interpolated depth values can be used:

z(x_j, y_j) = u z̃_0 + v z̃_1 + w z̃_2.    (7.23)
For simple rasterisation of polygons, the OpenGL specification [21] defines the interpolation of depth values by Eq. (7.23) and for all other associated data the interpolation by Eq. (7.22).
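The interpolation rules of Eqs. (7.20)–(7.23) can be summarised in a small Java sketch. The edge-function values e0, e1, e2 at the fragment and the fourth homogeneous components w0, w1, w2 of the vertices are assumed to be given; the method names are invented for this illustration.

// Perspective correct interpolation of an associated datum f (Eqs. (7.21) and (7.22)).
static double interpolatePerspectiveCorrect(double e0, double e1, double e2,
                                            double w0, double w1, double w2,
                                            double f0, double f1, double f2) {
    double eg = e0 / w0 + e1 / w1 + e2 / w2;   // corresponds to e~_g(xj, yj)
    double u = (e1 / w1) / eg;                 // perspective correct barycentric coordinates
    double v = (e2 / w2) / eg;
    double w = 1.0 - u - v;
    return u * f0 + v * f1 + w * f2;           // Eq. (7.22)
}

// Interpolation of depth values with ordinary barycentric coordinates (Eqs. (7.20) and (7.23)).
static double interpolateDepth(double e0, double e1, double e2,
                               double w0, double w1, double w2,
                               double z0, double z1, double z2) {
    double eg = e0 + e1 + e2;
    double u = e1 / eg, v = e2 / eg, w = e0 / eg;
    return u * (z0 / w0) + v * (z1 / w1) + w * (z2 / w2);   // z~_i = z_i / w_i per vertex
}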
7.5.5 Rasterising and Filling Polygons in the OpenGL The rasterisation and filling of polygons in the OpenGL are possible, for example, with the help of a scan line technique (see Sect. 7.5.2) or the Pineda algorithm (see Sect. 7.5.3). The OpenGL specification does not prescribe a special procedure for this. In the core profile, all polygons to be rendered must be composed of planar triangles, because in this profile only separate planar triangles (GL_TRIANGLES) or sequences of triangles (GL_TRIANGLE_STRIP or GL_TRIANGLE_FAN) are supported as polygon shapes (see Sect. 3.2). In the compatibility profile, there is a special geometric primitive for polygons. It must be possible to calculate the data of the fragments belonging to a polygon from a convex combination (a linear combination) of the data of the vertices at the polygon corners. This is possible, for example, by triangulating the polygon into triangles without adding or removing vertices, which are then rasterised (cf. [20]). In OpenGL implementations for the compatibility profile that rely on the functions of the core profile, triangle fans (GL_TRIANGLE_FAN) are preferred for this purpose. In the OpenGL specification, point sampling is specified as a rule for determining which fragment is generated for a polygon in the rasterisation process. For this, a twodimensional projection of the vertex position coordinates is performed by selecting the x- and y-values of the position coordinates of the polygon vertices. In the case that two edges of a polygon are exactly adjacent to each other and a grid centre lies exactly on the edge, then one and only one fragment may be generated from this [21]. For the interpolation of associated data (to the vertices) within a triangle, i.e., for the filling of the triangle, the OpenGL specification defines the use of barycentric coordinates (see Sect. 7.5.4). If an interpolation is to take place, Eq. (7.22) is to be used for most data to achieve a perspective correct interpolation. Only for depth values Eq. (7.23) must be used. It has to be taken into account that the barycentric coordinates (u, v, w) or (u, ˜ v, ˜ w) ˜ used in these equations have to exactly match the coordinates of the grid centres. The data for the fragment to be generated must therefore be obtained by sampling the data at the centre of the raster element [21].
By default, the values of the shader output variables are interpolated by the rasterisation stage according to Eq. (7.22). By using the keyword noperspective before the declaration of a fragment shader input variable, Eq. (7.23) is set to be used for the interpolation. This equation shall be used for interpolating the depth values. If the keyword flat is specified for a fragment shader input variable, then no interpolation takes place. In this case, the values of the associated data from a defined vertex of the polygon—the provoking vertex—is taken without any further processing (and without an interpolation) as the associated data of all fragments of the polygon in question (see Sect. 9). Using the keyword smooth for a fragment shader input variable results in an interpolation of the values as in the default setting (see above).3 The default setting option (smooth shading) allows the colour values of a polygon to be interpolated using barycentric coordinates within a triangle, which is required for Gouraud shading. Using this interpolation for normal vectors (as associated data) realises Phong shading. By switching off interpolation, flat shading can be achieved. Even without the use of shaders, flat shading can be set in the compatibility profile by the command glShadeModel(gl.GL_FLAT). The default setting is interpolation, which can be activated by glShadeModel(gl.GL_SMOOTH) and results in Gouraud shading. This allows the rasterisation stage of the OpenGL pipeline to be used effectively for the realisation of the shading methods frequently used in computer graphics. Section 9.4 explains these shading methods in more detail. Section 3.2.4 describes how the command glPolygonMode can be used to determine whether a polygon or a triangle is filled, or whether only the edges or only points at the positions of the vertices are drawn.
7.6 Aliasing Effect and Antialiasing

The term aliasing effect or aliasing describes signal processing phenomena that occur when sampling a (continuous) signal if the sampling frequency f_s, also called the sampling rate, is not at least twice the highest frequency f_n occurring in the signal being sampled. This relationship can be described as a formula in the sampling theorem as follows:

f_s ≥ 2 · f_n.    (7.24)

Here, the frequency f_n is called Nyquist frequency or folding frequency. Sampling usually converts a continuous signal (of the natural world) into a discrete signal. It is also possible to sample a discrete signal, creating a new discrete signal.
3 In the OpenGL specifications [20,21], it is described that the interpolation qualifiers (flat, noperspective and smooth) are to be specified for the output variables of the vertex processing stage. However, https://www.khronos.org/opengl/wiki/Type_Qualifier_(GLSL) clarifies that such a change only has an effect if the corresponding keywords are specified for input variables of the fragment shader. This can be confirmed by programming examples.
In computer graphics, such a conversion takes place during rasterisation, in which a vector graphic is converted into a raster graphic. For objects described as a vector graphic, colour values (or grey values) can be determined for any continuous location coordinate. However, this only applies within the framework of a vectorial representation. If, for example, a sphere has been approximated by a certain number of vertices of planar triangles, then colour values at arbitrary coordinates can only be determined for these approximating triangles and not for the original sphere. Strictly speaking, in this example sampling of the sphere surface has already taken place when modelling the sphere through the triangles. In this case, the signal is already discrete. If textures, which are mostly available as raster graphics, have to be (underor over-) sampled, there is also no continuous signal available that can be sampled. In these cases, a discrete signal is sampled, resulting in another discrete signal. Due to the necessary approximations, (further) undesirable effects may arise. The rasterisation process samples the colour values of the available signals at the discrete points of the raster (usually with integer coordinates) onto which the scene is mapped for output to the frame buffer and later on a screen. In the scene (in the source and target signal), there are periodic signal components that are recognisable, for example, by repeating patterns. The frequencies at which these signal components repeat are called spatial frequencies. If a measurable dimension of the image is available, a spatial frequency can be specified in units of m1 (one divided by metres) or dpi (dots per inch). The sampling (spatial) frequency results from the distances between the points (or lines) of the raster. The shorter the distance between the dots, the higher the sampling frequency. A special feature of signals in computer graphics is their artificial origin and the resulting ideal sharp edges, which do not occur in signals of the natural world. From Fourier analysis, it is known that every signal can be represented uniquely from a superposition of sine and cosine functions of different frequencies, which in this context are called basis functions. Under certain circumstances, an infinite number of these basis functions are necessary for an exact representation. An ideal sharp edge in a computer graphics scene represents an infinitely fast transition, for example, from black to white. In other words, an infinitely high spatial frequency occurs at the edge. Since the sampling frequency for an error-free sampling of this spatial frequency must be at least twice as large (see Eq. (7.24)), the grid for sampling would have to be infinitely dense. It is therefore impossible in principle (except for special cases) to convert the ideal sharp edges of a computer graphic into a raster graphic without error. There will always be interferences due to approximations, which must be minimised as much as possible. In this book, the theory and relationships in signal and image processing are presented in a highly simplified manner and are intended solely for an easy understanding of the basic problems and solution approaches in computer graphics. Detailed insights into signal processing and Fourier analysis in image processing can be found for example in [8].
7.6.1 Examples of the Aliasing Effect In Fig. 7.28, two dashed thin lines are shown greatly magnified, which are discretely sampled by the drawn grid. For this example, the ideal sharp edges through the broken lines shall be neglected and only the spatial frequencies due to the dashing shall be considered. In the middle line, the dashes follow each other twice as often as in the upper line. Due to this dashing, the upper line contains a half as high spatial frequency as the middle line. The filling of the squares in the grid indicates in the figure which fragments are created by rasterising these lines. For the upper line, the dashing is largely correctly reproduced in the raster graphic. For the middle line, however, the resolution of the raster is not sufficient to reproduce the spatial frequency of the dashing in the raster graphic. With this aliasing effect, fragments are generated at the same raster positions as with the rasterisation of a continuous line, which is drawn in the figure below for comparison. In a continuous line, there are no periodic changes (dash present–dash absent), so the spatial frequency contained in this line is zero with respect to the dashing. For the example of the upper line and by choosing this grid resolution, which determines the sampling frequency, the sampling theorem is fulfilled. The sampling frequency is exactly twice as large as the spatial frequency due to the dashing. The sampling theorem is not fulfilled for the middle line. The sampling frequency is equal to the spatial frequency through the dashing, which causes the described aliasing effect. Furthermore, it can be seen in this example that the result of the rasterisation
Fig. 7.28 Aliasing effect when rasterising dashed lines with different spatial frequencies
Fig. 7.29 Moiré pattern as an example of an aliasing effect
only approximates the upper line, since the lengths of the dashes of the dashing do not correspond to the size of a raster element (fragment). Figure 7.29 shows another example of an aliasing effect. For this graphic, two identical grids were placed exactly on top of each other and then one of these grids was rotated by 4.5 degrees (counter-clockwise). When viewing the image, a (visual) sampling of the rear grid by the front grid takes place. Since both rasters are identical, they have identical resolutions and thus contain identical maximum spatial frequencies. As a consequence, the sampling theorem is violated because the maximum spatial frequency of the signal to be sampled (the rear grid) is equal to the sampling frequency (determined by the front grid). In this case, the sampling frequency is not twice as high, i.e., the sampling raster is not twice as narrow. This causes an aliasing effect, which in this example is expressed by the perceptible rectangular patterns that are not present in the original rasters. Such patterns are referred to as Moiré patterns. They occur on output devices in which a raster is used whenever the sampling theorem is violated. The rectangular patterns observed in the Moiré pattern in this example follow each other more slowly than the lines of the raster when these sequences of patterns and lines are considered in the direction of a coordinate
Fig. 7.30 Examples of stairs in lines drawn by an OpenGL renderer: On the right is a cropped magnification of the lines on the left
axis. The new patterns thus have a lower spatial frequency than the spatial frequency generated by the periodic succession of the grid lines. This aliasing effect creates a low spatial frequency component in the sampled signal (the result of the sampling process) that is not present in the original signal. Figure 7.30 shows lines drawn by an OpenGL renderer.4 In the magnification, the staircase-like progression is clearly visible, which is due to the finite resolution of the grid for rasterisation. Such a shape for the oblique lines results from the fact that a fragment can only be assigned one grid position out of a finite given set. In each of the right oblique lines, a regular pattern with a lower spatial frequency compared to the spatial frequency of the raster can be seen, which is not provided by the line in vector graphic representation. In the literature, the staircase effect is often counted among the effects due to aliasing. Watt distinguishes this effect from the classic aliasing effects, where high spatial frequencies that are too high for sampling at the given sampling frequency erroneously appear as low spatial frequencies in the sampled signal (see [23, p. 440f]), as can be seen in the example of the Moiré pattern (see above). Staircases like the one in Fig. 7.30, sometimes referred to as frayed representations, are also found in rasterised points and at the edges of polygons. When objects, usually represented by geometric primitives, are animated, the resulting movement adds temporal sampling at discrete points in time, which can lead to temporal aliasing effects. Together with the spatial aliasing effects at the edges of the objects, these effects can be very disturbing in computer graphics.
4 To illustrate the staircase effect, all antialiasing measures were disabled when drawing these lines.
7.6.2 Antialiasing As is known from signal processing and can be derived from the sampling theorem (7.24), there are two ways to avoid or reduce aliasing effects. Either the sampling frequency is increased or the (high) spatial frequencies contained in the signal are reduced, so that the sampling frequency is twice as high as the maximum spatial frequency occurring in the signal and thus the sampling theorem holds. To reduce the maximum spatial frequency, the signal must be filtered with a filter that allows only low spatial frequencies to pass and suppresses high frequencies. A filter with these properties is called a low-pass filter. In computer graphics, the resolution of the raster determines the sampling frequency (see Sect. 7.6). Using a fine raster instead of a coarse raster for rasterisation results in an increase of the sampling frequency, which is called upsampling. The resolution of the frame buffer determines the target resolution for the output of the graphics pipeline. However, the grid resolution for the steps in the graphics pipeline from rasterisation to the frame buffer can be increased. Before the signal (the rendered scene) is output to the frame buffer, the increased grid resolution must be reduced again to the resolution of the frame buffer. In signal processing, this step of reducing the resolution is called downsampling and usually consists of low-pass filtering5 with a subsequent decrease of the resolution (decimation). Since this filtering takes place as the last step after processing the fragments, this antialiasing method is called post-filtering. The right side of Fig. 7.31 shows the steps for post-filtering within the OpenGL graphics pipeline. As part of the rasterisation, an increase in the resolution of the grid (compared to the resolution of the frame buffer) takes place through a so-called multisampling of the scene. In OpenGL, the special procedure multisample antialiasing is used for post-filtering, which is explained in more detail in Sects. 7.6.6 and 7.6.8. An essential step of this process is to increase the sampling rate of the scene by using K samples per pixel of the frame buffer resolution. The reduction is the final step in processing the fragments by the per-fragment operations, in which the scene is converted back to the frame buffer resolution by downsampling. Section 7.6.5 describes the general post-filtering procedure. Section 7.6.6 contains detailed explanations of some frequently used algorithms, especially the special procedure multisample antialiasing. In the pre-filtering antialiasing method, low-pass filtering takes place during the generation of the fragments by rasterisation (see the left side in Fig. 7.31). If the original signal—for example, a line to be drawn—is described analytically or as a vector graphic, then the signal—from the point of view of signal processing—is present in an infinitely high resolution. Low-pass filtering as part of pre-filtering
5 In signal processing, the use of the term downsampling is inconsistent. Sometimes, low-pass filtering with subsequent reduction of the number of sampling points is meant and sometimes only the selection of (a smaller number of) sampling points from the image to be sampled is meant, without prior low-pass filtering. In this book, downsampling is understood as consisting of low-pass filtering followed by a decimation of the number of sampling points (samples).
Fig. 7.31 Pre-filtering (left) and post-filtering (right) in the OpenGL graphics pipeline: The part of the graphics pipeline after vertex processing is shown. This representation is valid for both the fixed-function pipeline and the programmable pipeline
suppresses high spatial frequencies in order to reduce the resolution of the signal to the resolution of the frame buffer. All further steps of the graphics pipeline are performed with the (non-increased) resolution of the frame buffer. Section 7.6.3 contains more details of this method. Since in the pre-filtering method the signals can be analytically described and thus there is an infinitely high resolution, pre-filtering is also called area sampling. In contrast, post-filtering is also called point sampling to express that signals can also be processed that have already been sampled (discrete signals). The procedures for drawing lines, shown in Sect. 7.3, determine which fragments are to be placed at which points of the grid. Thus, only whether a fragment is present or not is considered. If the associated data are disregarded for a moment, then a fragment is coloured either black or white. In the general case, the lines thus generated will have a staircase shape or a frayed appearance. Furthermore, they will have sharp edges at the transitions between a set and an unset fragment, and thus infinitely high spatial frequencies. The scan line technique for drawing polygons described in Sect. 7.5.2 fills a polygon whose edge has already been drawn by lines. The algorithm according to Pineda (see Sect. 7.5.3) allows polygons to be filled without drawing an edge beforehand. However, the same effects occur at the edges of the polygons as when drawing lines. Thus, these methods for drawing polygons also result in staircase-like edges.
Fig. 7.32 Smoothed stair steps in lines drawn by an OpenGL renderer using post-filtering: On the right is a cropped magnification of the lines on the left
If geometric primitives for (large) points are understood as special polygons or special lines and drawn with the methods from Sects. 7.3 or 7.5, then it is obvious that staircase-like edges occur even for rasterised points. This staircase effect, described in more detail in Sect. 7.6.1, which is a major problem in computer graphics, can be reduced by antialiasing, which makes use of the grey levels (or colour gradations) of a fragment. The low-pass filtering of the antialiasing methods described below softens or blurs the hard edges of the stair steps, producing a smoother or softer rendering result. At the same time, this reduces high spatial frequencies in the scene. Figure 7.32 shows lines drawn by an OpenGL renderer using post-filtering for antialiasing. The smoothing effect of using greyscales is clearly visible, in contrast to the lines in Fig. 7.30, which were drawn without antialiasing measures. Comparing the two images, it is also noticeable that the smoothed lines are blurrier. This is very evident when comparing the magnifications on the right side of the figures. In typical computer graphics scenes, where a suitable camera position and resolution are used for rasterisation, this blurriness is not very disturbing, as can be seen on the left side of Fig. 7.32. The advantage of the smooth and less staircase-like lines prevails. Nevertheless, it can be stated in principle that due to the necessary low-pass filtering for antialiasing, a perceived smooth shape is exchanged for the sharpness of the image.
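As a simple illustration of the downsampling step used in post-filtering (rendering with several samples per pixel and reducing the result to the frame buffer resolution, as described above), the following Java sketch averages blocks of k × k samples with a box filter. The array layout and the choice of a box filter are assumptions made for this illustration; OpenGL's multisample antialiasing, discussed in Sects. 7.6.6 and 7.6.8, differs in its details.

static float[][] downsample(float[][] highRes, int k) {
    int height = highRes.length / k;
    int width = highRes[0].length / k;
    float[][] lowRes = new float[height][width];
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int sy = 0; sy < k; sy++) {          // average the k x k samples of this pixel
                for (int sx = 0; sx < k; sx++) {
                    sum += highRes[y * k + sy][x * k + sx];
                }
            }
            lowRes[y][x] = sum / (k * k);             // box low-pass filter and decimation
        }
    }
    return lowRes;
}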
7.6.3 Pre-Filtering In the following, two approaches for pre-filtering are presented. In the unweighted area sampling, a line is considered as a long, narrow rectangle. Each fragment is assigned a square from the pixel grid. The intensity with which the fragment is
Fig. 7.33 An example of a line drawn with unweighted area sampling
coloured is chosen in proportion to how much of the square is covered by the line segment. Figure 7.33 illustrates this concept. A major disadvantage of this method is that for each fragment, the intersection with the line must be determined and then its area must be computed. This calculation is very computationally intensive. A simple heuristic for approximating the proportion covered by the rectangle (the line) is to cover each square of the grid (fragment candidate) with a finer sample grid. Determining the proportion of samples inside the rectangle relative to the total number of samples gives an approximation of the proportion of the square area covered by the line and thus an approximation of the intensity for the fragment. Figure 7.34 shows a line segment on a raster as an example. The finer sample raster consists of four samples per pixel. In OpenGL, the position of a raster element is indicated by its bottom left corner, which lies on integer coordinates of the raster. The centres of the raster elements (indicated by a dot in the figure) are offset by the vector (1/2, 1/2) from these integer coordinates. A raster element becomes a fragment by being identified as belonging to a geometric primitive through a rasterisation process. The fragment marked with the cross thus has the coordinate (3, 2) and an approximate value of the area covered by the line of 1. The fragment (1, 1) has the value 1/2 and the fragment (3, 3) has the value 3/4. These values are called coverage values. This concept of using a coverage value for each fragment can also be applied to other geometric primitives. At this point, it is important to note a slightly different usage of the term fragment. In Fig. 7.34, a fragment refers to a generally non-square portion of a geometric primitive that covers a pixel (raster element). In OpenGL, a fragment is represented by a two-dimensional coordinate (the lower left corner), so that it can always be assigned to exactly one whole pixel of the pixel raster and it can only be square.6 Due to the finite resolution of the grid, the incomplete coverage of a pixel in OpenGL is taken into account by changing the colour or grey value of the fragment according to the coverage value.
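The sampling heuristic for approximating coverage values described above can be sketched as follows. The predicate insidePrimitive and the regular placement of the n × n samples inside the pixel square are assumptions made for this illustration.

static double coverage(int pixelX, int pixelY, int n,
                       java.util.function.BiPredicate<Double, Double> insidePrimitive) {
    int hits = 0;
    for (int sy = 0; sy < n; sy++) {
        for (int sx = 0; sx < n; sx++) {
            // sample positions distributed regularly inside the pixel square
            double x = pixelX + (sx + 0.5) / n;
            double y = pixelY + (sy + 0.5) / n;
            if (insidePrimitive.test(x, y)) {
                hits++;
            }
        }
    }
    return (double) hits / (n * n);   // fraction of covered samples approximates the coverage value
}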
6 According to the OpenGL specification, non-square pixels are also allowed. However, a fragment in the OpenGL specification is always assumed to be rectangular.
Fig. 7.34 Example representation of a line segment on a raster: Distinction between the terms fragment, pixel (raster element) and sample (based on [14, p. 1056])
Fig. 7.35 Schematic representation of weighted area sampling
In the weighted area sampling approach, not only the covered area is calculated but also a weighting function w(x, y) is used. This achieves a better fit to the geometric primitive and an improvement of the antialiasing. The weighting function w(x, y) has its highest value at the centre of the pixel on the pixel raster and decreases with increasing distance. A typical weighting function is shown on the right of Fig. 7.35. This function is defined for a circular area A around a pixel P, as shown on the left of the figure. The intensity of the fragment is calculated as follows:

(∫∫_{A_P} w(x, y) dx dy) / (∫∫_{A} w(x, y) dx dy).

In this term, A_P is the intersection of the circle with the line rectangle. Although the formula may seem complicated at first due to the integrals, the intensity of a fragment in this case depends only on its distance from the line. The intensity can therefore be written as a function I(d_P), where d_P is the distance of the centre of the fragment P from the line. The number of intensity values that can be displayed is limited, on screens mostly to 256 values. For antialiasing, however, it is sufficient to use only a few intensity levels. Instead of the real function I : [0, ∞) → [0, 1], a discretised variant Î : [0, ∞) → {0, ..., i_max} is sufficient if the intensity levels 0, ..., i_max are to be used. For this purpose, the grid must be traversed and the resulting value Î must be determined for each fragment. This problem is obviously similar to the task of drawing a curve y = f(x) on a grid. Curve drawing also involves traversing a grid—in this case only along one axis—and calculating the rounded function value round(f(x)) in each step. The differences to antialiasing are that the raster is scanned not only along one coordinate axis but also in the neighbourhood of the fragment, and that the calculated discrete value Î is not the y-coordinate of a fragment to be set, but one of the finitely many discrete intensity values. Based on this analogy, efficient algorithms for antialiasing can be developed, which are based on the principle of the midpoint algorithm. These include Gupta–Sproull antialiasing [12] and the algorithm of Pitteway and Watkinson [19]. The algorithm of Wu [25] uses an improved distance measure (error measure) between the ideal line and possible fragments compared to the midpoint algorithm, which leads to a very fast algorithm for drawing lines while taking antialiasing into account. For a detailed description of these methods, please refer to the original literature and to [11,13]. Through the weighted integration present in the formula above or—in the discrete variant—through the weighted summation, an averaging and thus the desired low-pass filtering effect is achieved. To optimise this low-pass filtering, other weighting functions can be used as an alternative to the function shown in Fig. 7.35.
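A discretised variant of weighted area sampling can be sketched analogously to the coverage estimation above, with the samples additionally weighted. The cone-shaped weighting function and the restriction of the samples to the pixel square (instead of the circular area A of Fig. 7.35) are simplifying assumptions for this illustration.

static double weightedCoverage(int pixelX, int pixelY, int n,
                               java.util.function.BiPredicate<Double, Double> insidePrimitive) {
    double weightedHits = 0.0, weightSum = 0.0;
    double cx = pixelX + 0.5, cy = pixelY + 0.5;       // pixel centre
    for (int sy = 0; sy < n; sy++) {
        for (int sx = 0; sx < n; sx++) {
            double x = pixelX + (sx + 0.5) / n;
            double y = pixelY + (sy + 0.5) / n;
            double d = Math.hypot(x - cx, y - cy);     // distance from the pixel centre
            double weight = Math.max(0.0, 1.0 - d);    // largest at the centre, decreasing outwards
            weightSum += weight;
            if (insidePrimitive.test(x, y)) {
                weightedHits += weight;
            }
        }
    }
    return weightedHits / weightSum;                   // discrete counterpart of the ratio of integrals
}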
7.6.4 Pre-Filtering in the OpenGL Figure 7.36 shows points (GL_POINTS), lines (GL_LINE_STRIP) and a triangle (GL_TRIANGLE_STRIP) drawn by an OpenGL renderer without antialiasing measures. The dot size is eight and the line width is three fragments. In the highly magnified parts of the figure, the staircase effect described in Sect. 7.6.1 is clearly
Fig. 7.36 Dots, lines and a triangle drawn without antialiasing by an OpenGL renderer: The lower part shows a cropped magnification of the upper part of the image
visible in the drawn lines and at the edges of the triangle. The dots are displayed as squares due to the square geometry of the pixels of the output device (an LCD screen in this case). This can also be interpreted as a staircase effect, since the target geometry of a point is a (small) circular disc. In OpenGL, antialiasing can be enabled using the pre-filtering method for drawing geometric primitives by a glEnable command, and it can be disabled by a glDisable command. Column target of Table 7.4 shows available arguments for these commands. For example, antialiasing is enabled by pre-filtering with the command glEnable(gl.GL_LINE_SMOOTH) for drawing lines. This affects the drawing of contiguous lines (GL_LINE_STRIP, GL_LINE_LOOP) and separate lines (GL_LINES). Antialiasing by pre-filtering for points is only available in the compatibility profile, while pre-filtering for the other primitives can also be activated in the core profile.
Table 7.4 Arguments (see column target) of the glEnable command for activating antialiasing by pre-filtering for geometric primitives in the compatibility and core profile

Geometric primitive    Target               Availability
Point                  GL_POINT_SMOOTH      Compatibility profile
Line                   GL_LINE_SMOOTH       Compatibility and core profile
Polygon                GL_POLYGON_SMOOTH    Compatibility and core profile
In OpenGL, colours can be represented by additive mixing of the three primary colours red, green and blue, using the RGB colour model (see Sect. 6.2). Using the fourth alpha component, transparency and translucency can be taken into account (see Sect. 6.3). Thus, the colour of a fragment including a transparency value can be expressed as a quadruple of the form (R_f, G_f, B_f, A_f) with R_f, G_f, B_f, A_f ∈ [0, 1], which is referred to as an RGBA colour value. In the core profile and in JOGL, this is the only way to represent colours. For antialiasing using pre-filtering, the coverage value ρ is determined, which indicates the proportion of a pixel that is covered by a fragment of a geometric primitive (see Sect. 7.6.3 and Fig. 7.34). This coverage value is multiplied by the alpha value of the colour value of the fragment. If the colour value including the alpha value of the fragment is (R_f, G_f, B_f, A_f), then after taking ρ into account, the quadruple (R_s, G_s, B_s, A_s) = (R_f, G_f, B_f, ρ · A_f) is obtained. The modified colour value of this fragment is then mixed with the existing colour value (R_d, G_d, B_d, A_d) of the pixel at the respective position in the frame buffer. To enable this operation, blending must be activated and the factors for the blending function must be specified. A commonly used blending function can be activated and set by the following JOGL commands:

// enable blending
gl.glEnable(GL.GL_BLEND);
// set blend factors
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);
This defines the following mixing function: Cd = As · Cs + (1 − As ) · Cd .
(7.25)
Cs in this formula represents each component of the quadruple of the source (Rs , G s , Bs , As ), i.e., the colour values of the fragment modified by ρ. Cd represents each component of the destination (Rd , G d , Bd , Ad ), i.e., the colour values of the pixel to be modified in the frame buffer. This is the classic and frequently used blending function, which is a simple convex combination (see Sect. 5.1.4). Figure 7.37 shows the output result of an OpenGL renderer in which the respective commands GL_POINT_SMOOTH, GL_LINE_SMOOTH, GL_POLYGON_SMOOTH and antialiasing with the blend function (7.25) were used. The same primitives with the same parameters as in Fig. 7.36 were drawn. Comparing the images with and without antialiasing, the smoother rendering with antialiasing is noticeable, significantly improving the quality of the rendering. The magnifications show a round shape of the dots. At the edges of the objects, the use of graduated grey values is noticeable, resulting in a smoothing of the staircase-shaped edges. Furthermore, it can be seen that antialiasing has created blurs that are not very disturbing in the non-magnified illustration (see Sect. 7.6.2). The magnifications of the rendering results in this and the following figures are only shown to illustrate the antialiasing effect. These effects
Fig. 7.37 Dots, lines and a triangle drawn with pre-filtering for antialiasing by an OpenGL renderer: The lower part shows a cropped magnification of the upper part of the image
Fig. 7.38 Rendering of a polygon of two triangles drawn by an OpenGL renderer using pre-filtering for antialiasing: The right side shows a cropped magnification of the left part of the figure. The upper part of the image was created with the blend function (7.25). For the lower part of the image, the blend function (7.26) was used
are usually not or only hardly visible in an output of an OpenGL renderer without special measures. As explained in Chap. 4, in order to represent a three-dimensional object in computer graphics very often only its surface is modelled, which usually consists of a sequence of connected planar triangles. The left part of Fig. 7.38 shows a white planar surface against a black background composed of two adjacent triangles. The surface was drawn by a TRIANGLE_STRIP. For antialiasing with POLYGON_SMOOTH in the upper part of the image, the blending function (7.25) was used. The edge between the two triangles is clearly visible, which is normally undesirable for a planar surface. This effect is due to the blending function used. Assume that a fragment at the edge between the triangles has a coverage value of ρ = 0.5. Then the colour value of this white fragment is (1.0, 1.0, 1.0, 0.5). Before drawing, let the background in the frame buffer be black (0, 0, 0, 1.0). While drawing the first triangle, the formula (7.25) is applied and creates the RGB colour components Cd = 0.5 · 1.0 + (1 − 0.5) · 0 = 0.5 for the respective edge pixel in the frame buffer. If the second triangle is drawn, such an edge fragment again meets an edge pixel already in the frame buffer and by the same formula the RGB colour components Cd = 0.5 · 1.0 + (1 − 0.5) · 0.5 = 0.75 are calculated. Since the resulting RGB colour values are not all equal to one and thus do not produce the colour white, the undesired grey line between the triangles is visible. In this case, the grey line separating the two triangles can be suppressed by choosing the following blend function: C d = As · C s + C d .
(7.26)
The following JOGL commands activate blending and define the blend function (7.26):

// enable blending
gl.glEnable(GL.GL_BLEND);
// set blend factors
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE);
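The numerical effect of the two blend functions on the edge fragments discussed above can be reproduced with a few lines of plain Java. The following sketch is not part of the book's example programs; it merely recalculates the example values from the text (white fragments, coverage ρ = 0.5, black background), and the method names are illustrative.

// Sketch: comparison of the blend functions (7.25) and (7.26) for an edge fragment.
public class BlendCheck {

    // Cd = As * Cs + (1 - As) * Cd, cf. (7.25)
    static double blendOverwrite(double cs, double as, double cd) {
        return as * cs + (1.0 - as) * cd;
    }

    // Cd = As * Cs + Cd, cf. (7.26)
    static double blendAdditive(double cs, double as, double cd) {
        return as * cs + cd;
    }

    public static void main(String[] args) {
        double rho = 0.5;       // coverage of the edge fragment
        double as = rho * 1.0;  // alpha value after multiplication with the coverage
        double cs = 1.0;        // colour component of the white fragment

        double cd = 0.0;                  // black background
        cd = blendOverwrite(cs, as, cd);  // first triangle:  0.5
        cd = blendOverwrite(cs, as, cd);  // second triangle: 0.75 -> grey line visible

        double cd2 = 0.0;
        cd2 = blendAdditive(cs, as, cd2); // first triangle:  0.5
        cd2 = blendAdditive(cs, as, cd2); // second triangle: 1.0 -> white, no grey line

        System.out.println("(7.25): " + cd + "  (7.26): " + cd2);
    }
}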
Using the blend function (7.26), the colour components of the overlapping edge fragments of the triangles add up exactly to the value one. This results in white fragments also at the transition of the two triangles and thus the grey line disappears. The rendering result for the planar surface with these settings is illustrated in the lower part of Fig. 7.38. Unfortunately, choosing the blending function does not solve all the problems of drawing polygons with antialiasing by pre-filtering. Figure 7.39 shows two coloured overlapping polygons, each consisting of two triangles, drawn by an OpenGL renderer using the primitive TRIANGLE_STRIP. For antialiasing, POLYGON_SMOOTH with the blend function (7.26) was used. In the polygons, edges
Fig. 7.39 Illustration of two coloured polygons, each consisting of two triangles, drawn by an OpenGL renderer using pre-filtering for antialiasing: The middle part of the figure shows a cropped magnification of one polygon. The right part of the figure contains the cropped magnification of a polygon with contrast enhancement
between the triangles are again visible. For clarity, in the magnification on the right side of the figure, a very strong contrast enhancement has been applied.7 Furthermore, the blending function results in an undesired colour mixture in the overlapping area of the polygons. This colour mixing can be avoided by using the depth buffer test, but in this case the edges of the triangles become visible as black lines. An algorithm for the depth buffer test (depth buffer algorithm) can be found in Sect. 8.2.5. Section 2.5.6 shows the integration into the OpenGL graphics pipelines. Detailed explanations of the depth buffer test in OpenGL are available in [22, pp. 376–380]. The problems presented in this section can be avoided by using post-filtering methods for antialiasing, which are presented in the following sections. Section 7.6.8 contains corresponding OpenGL examples.
7.6.5 Post-Filtering
As explained in the introduction to Sect. 7.6, computer-generated graphics usually contain ideal edges that have infinitely high spatial frequencies. This means that, in principle, the preconditions of the sampling theorem cannot be met and aliasing effects occur. Although such graphics are not band-limited, the resulting disturbances can nevertheless be reduced by antialiasing measures. In post-filtering, an increase of the sampling rate by upsampling takes place during rasterisation (see Sect. 7.6.2 and Fig. 7.31). This uses K samples per pixel, resulting in an overall finer grid for the rasterisation of the scene.
7 The soft blending of the edges has been lost in the magnification on the right side of the figure due to the contrast enhancement. This desired antialiasing effect is visible in the magnification without contrast enhancement (in the middle of the figure).
Fig. 7.40 Examples of sample positions (small rectangles) within a pixel in a regular grid-like arrangement
Figure 7.40 shows some examples of regular, grid-like arrangements of samples within a pixel. The small squares represent the sample positions within a pixel. The round dot represents the centre of a pixel. This position is only sampled in the middle example. If shading—that is, the execution of all steps of the graphics pipeline from rasterisation to reduction (see the right part of Fig. 7.31)—is applied for each of these samples, then this process is called supersampling antialiasing (SSAA). The advantage of this method lies in the processing of all data, including the geometry data, the materials of the objects and the lighting through the entire graphics pipeline with the increased sampling resolution. This effectively counteracts aliasing effects from different sources. Another advantage is the ease of implementation of this method, without having to take into account the properties of individual geometric primitives. One disadvantage of supersampling antialiasing is the computational effort, which is in principle K times as large as the computation without a sampling rate increase. However, this procedure can be parallelised well and thus executed well by parallel hardware. Since the calculation of the entire shading for each sample within a pixel is not necessary in normal cases, efficient algorithms have been developed whose approaches are presented in Sect. 7.6.6. If a shading value (colour value) was determined for more than one sample per pixel, then the rendered image with the increased sampling rate must be reduced to the resolution of the frame buffer by downsampling so that the rendering result can be written to the frame buffer for output to the screen (see Fig. 7.31). If a pixel is represented by K samples at the end of the processing in the graphics pipeline, then in the simplest case exactly one sample per pixel can be selected. However, this would make the processing in the graphics pipeline for the other (K − 1) samples per pixel redundant. This variant is also detrimental from the point of view of signal and image processing, since potentially high spatial frequencies are not reduced and aliasing effects can arise from this simple form of downsampling. As explained in the introduction of Sect. 7.6.2, low-pass filtering is required to reduce this problem. The simplest form of low-pass filtering is the use of an averaging filter (boxcar filter). This can be realised by taking the arithmetic mean or arithmetic average (per colour component) of the colour values of all samples per pixel and using it as the colour value for the downsampled pixel. This averaging can
Fig. 7.41 Examples of low-pass filter kernels of different sizes: The upper part shows kernels of averaging filters (boxcar filters). The lower part shows kernels of binomial filters
be represented graphically as a convolution matrix, also called a filter kernel or kernel. In the top row of Fig. 7.41, averaging filter kernels of different sizes are shown. The value of each cell of the filter kernel is multiplied by the colour value of a sample at the corresponding position within the pixel. Subsequently, these partial results are added, and the total result is divided by the number of cells of the filter kernel. The last operation is shown as factors to the left of the filter matrices in the figure. The application of these factors is necessary to avoid an undesired increase in contrast in the resulting image.8 This simple averaging can cause visible disturbances (called artefacts in image processing) (see, for example, [8, p. 98]). Therefore, it makes sense to use filter kernels with coefficients that slowly drop off towards the edges of the filter kernel. Well-suited filter kernels are binomial filters, which can be derived from the binomial coefficients. For high filter orders (for large dimensions of the kernel), this filter approximates the so-called Gaussian filter, which has the characteristic symmetric bell-shaped curve. In Fig. 7.41, filter kernels of the binomial filter of different sizes are shown in the bottom row. When using these filter kernels, too, the value of a cell of the filter kernel is multiplied by the colour value of the sample at the corresponding position in the pixel. The partial results are summed and then multiplied by the factor to the left of the
8 The explained filter operation corresponds to the illustrative description of the linear convolution according to Eq. (7.27) (see below). Strictly speaking, the filter kernel (or the image) must be mirrored in the horizontal and vertical directions before executing the explained operation, so that this operation corresponds to the formula. Since the filter kernels are symmetrical in both dimensions, this is not necessary in this case.
filter kernels in the figure in order to obtain a filter weight of one.9 The value resulting from this calculation represents the result of downsampling for one pixel. In the downsampling approach described so far, the number K of samples per pixel (see Fig. 7.40) must exactly match the dimensions of the low-pass filter kernel (see Fig. 7.41) in both dimensions. In principle, downsampling including low-pass filtering can also include samples of surrounding pixels by choosing larger filter kernels with a number of elements larger than K. This results in a smoothing effect that goes beyond pixel boundaries. However, it also results in greater image blur. Furthermore, in the downsampling procedure described so far, it has been assumed that the downsampling is performed in one step and per pixel. Formally, this operation of reduction by downsampling can be broken down into two steps: (1) low-pass filtering and (2) selection of one sample per pixel (decimation). In general, the first step represents the linear filtering of an image I(m, n); m = 0, 1, ..., M; n = 0, 1, ..., N with a filter kernel H(u, v); u = 0, 1, ..., U; v = 0, 1, ..., V. Let I(m, n) be the intensity function of a greyscale image or of one colour channel of a colour image. M and N are the dimensions of the image, and U and V are the dimensions of the filter kernel in the horizontal and vertical directions, respectively. For example, for low-pass filtering, a filter kernel H(u, v) from Fig. 7.41 can be used. The linear filter operation can be described as a two-dimensional discrete convolution such that the filtered image I′(m, n) is given by the following formula:

I′(m, n) = Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} H(i, j) · I(m − i, n − j).   (7.27)

In this operation, it should be noted that the filter kernel H(u, v) used should have odd dimensions in both directions so that a unique value for the sample in the middle can be determined. The dimension of the filter kernel in the horizontal or vertical direction is the finite number of non-zero elements in the central region of the filter kernel (see Fig. 7.41 for examples). Such a kernel can be extended with zeros in both dimensions in the negative and positive directions, which has no effect on the result of Eq. (7.27). Since the resulting image I′(m, n) has the same dimensions and the same resolution as the original image I(m, n), the second step is to select one sample for each pixel (decimation) according to the downsampling factor (in this case K) to complete the downsampling.
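The two downsampling steps, linear filtering according to Eq. (7.27) followed by decimation, can be sketched in Java as follows. The 3 × 3 binomial kernel corresponds to the one shown in Fig. 7.41; the clamping of the image borders and the class structure are assumptions, since the text does not prescribe a particular border handling.

// Sketch: low-pass filtering with a 3x3 binomial kernel followed by decimation
// by the factor k in each direction (one colour channel of an image).
public final class Downsampling {

    // 3x3 binomial kernel including the normalisation factor 1/16 (cf. Fig. 7.41).
    private static final double[][] KERNEL = {
            {1 / 16.0, 2 / 16.0, 1 / 16.0},
            {2 / 16.0, 4 / 16.0, 2 / 16.0},
            {1 / 16.0, 2 / 16.0, 1 / 16.0}
    };

    // Discrete convolution according to Eq. (7.27); image borders are clamped (assumption).
    static double[][] convolve(double[][] image) {
        int m = image.length, n = image[0].length;
        double[][] filtered = new double[m][n];
        for (int y = 0; y < m; y++) {
            for (int x = 0; x < n; x++) {
                double sum = 0.0;
                for (int i = -1; i <= 1; i++) {
                    for (int j = -1; j <= 1; j++) {
                        int yy = Math.min(Math.max(y - i, 0), m - 1);
                        int xx = Math.min(Math.max(x - j, 0), n - 1);
                        sum += KERNEL[i + 1][j + 1] * image[yy][xx];
                    }
                }
                filtered[y][x] = sum;
            }
        }
        return filtered;
    }

    // Decimation: keep only every k-th sample in both directions.
    static double[][] decimate(double[][] image, int k) {
        double[][] reduced = new double[image.length / k][image[0].length / k];
        for (int y = 0; y < reduced.length; y++) {
            for (int x = 0; x < reduced[0].length; x++) {
                reduced[y][x] = image[y * k][x * k];
            }
        }
        return reduced;
    }
}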
7.6.6 Post-Filtering Algorithms
In Sect. 7.6.5, supersampling antialiasing (SSAA) was introduced as an easy-to-implement but computationally expensive post-filtering procedure for reducing aliasing effects. Some alternative algorithms are outlined below. If the graphics hardware does not support supersampling antialiasing, the calculations for the individual K sample positions can also be carried out one after the other
9 See information in previous footnote.
and the (intermediate) results stored in a so-called accumulation buffer. At the beginning of this calculation, the buffer is empty. Subsequently, rasterisation and shading of the whole image are performed successively at the K different sample positions. These different sample positions can also be understood as different viewpoints on the scene. The intermediate results for a sample position are each multiplied by a corresponding weighting for the low-pass filtering and added to the content of the accumulation buffer. At the end of the calculation, the result is multiplied by the normalisation factor of the low-pass filter to avoid a contrast increase in the resulting image. In the case of an averaging filter, this is a division by the number of sample positions, and in the case of a binomial filter, a factor that can be taken from Fig. 7.41, for example. In order to record the intermediate results without loss, the accumulation buffer usually has a higher resolution for the colour values. The described calculation steps for this method are mathematically equivalent to supersampling antialiasing (see Sect. 7.6.5) including low-pass filtering. This method does not require any memory besides the accumulation buffer, which has the same resolution as the frame buffer. In particular, no frame buffer with a resolution increased by the factor K needs to be provided. The disadvantage, however, is that the possible frame rate decreases or the hardware speed requirements increase due to the temporally sequential computations. Ultimately, the same computations are carried out as with supersampling antialiasing, which still leads to an increased effort by a factor of K compared to a computation without a sampling rate increase.
With both supersampling antialiasing and the use of an accumulation buffer, shading must be performed K times more often (with K as the number of samples per pixel) than without a sampling rate increase. According to Fig. 7.31, shading in OpenGL includes early per-fragment operations, fragment processing and per-fragment operations. In the case of a programmable graphics pipeline, fragment processing takes place in the fragment shader, so this shader is called K times more often. However, observations of typical computer graphics scenes show that shading results vary more slowly within a geometric primitive than between geometric primitives due to the materials of the objects and the lighting. If these primitives are large enough relative to the grid so that they often completely cover pixels, then often all samples within a pixel will have the same (colour) values when these values arise from a fragment of one geometric primitive. Thus, shading with a fully increased resolution is rarely necessary. The methods presented below avoid this overshading by performing fewer shading passes per pixel.
The A-buffer algorithm, which is described for example in [9] and [14, p. 1057], separates the calculation of coverage and shading. In this algorithm, the coverage is calculated with the increased resolution and the shading without increasing the resolution (i.e., it remains at the resolution of the frame buffer). Furthermore, the depth calculation (see Sect. 8.2.5) is integrated with this algorithm to take into account the mutual occlusion of fragments. This occlusion calculation is also performed for each sample and thus with the increased resolution. If at least one sample of a pixel is covered by a fragment, then this fragment is added to a list for the respective pixel.
If this fragment covers another fragment already in the list, then the existing fragment
is removed from the list for this pixel. After all fragments have been rasterised, the shading calculation is performed for all fragments that remain in the lists. Only one shading pass is performed per fragment and the result is assigned to all samples that are covered by the respective fragment. Subsequently, the reduction to the resolution of the frame buffer takes place as described in Sect. 7.6.5. The coverage calculation causes a known overhead. In contrast, the general computational overhead for the shading results (in programmable graphics pipelines) is not predictable for all cases due to the free programmability of the fragment shader. In general, the A-buffer algorithm is more efficient than supersampling antialiasing. Furthermore, this algorithm is particularly well suited to take into account transparent and translucent fragments. For an explanation of transparent objects, see Sect. 9.7. The main disadvantage is the management of the lists of fragments per pixel and the associated need for dynamic memory space. Since the complexity of the scenes to be rendered is not known in advance, this memory requirement is theoretically unlimited. Furthermore, this algorithm requires integration with the rasterisation algorithm and the coverage calculation. For this reason, the A-buffer algorithm is preferably used for the pre-computation of scenes (offline rendering) and less for (interactive) real-time applications. Another algorithm for post-filtering is multisample antialiasing (MSAA). This term is not used consistently in computer graphics literature. Usually, multisample antialiasing refers to all methods that use more samples than pixels—especially for coverage calculation—but which, in contrast to supersampling antialiasing, do not perform a shading computation for each sample. Thus, the A-buffer algorithm and coverage sample antialiasing (CSAA) (see below) belong to this group of methods. According to this understanding, supersampling antialiasing does not belong to this group. In the OpenGL specifications [20,21], multisample antialiasing is specified with an extended meaning, since this method can be extended to supersampling antialiasing by parameter settings (see Sect. 7.6.8). In the following, multisample antialiasing (MSAA) is understood in a narrower sense and is presented as a special procedure as described in [14, pp. 1057–1058]. This procedure works similar to the A-buffer algorithm, but no lists of fragments are managed for the individual pixels. A depth test (see Sect. 8.2.5) is performed. The coverage calculation takes place with the increased resolution (for each sample), which is directly followed by the shading computation per fragment. First, a coverage mask is calculated for each fragment, indicating by its binary values which samples within that pixel are covered by the fragment. Figure 7.42 shows an example of such a binary coverage mask for a pixel containing a regular grid-like sample pattern. The covered samples marked on the left side are entered as ones in the coverage mask on the right side. A zero in the mask indicates that the sample is not covered by the primitive. Meaningful operations between these masks are possible through Boolean operations. For example, from two coverage masks derived from different fragments, a coverage mask can be determined by an element-wise exclusive or operation (XOR), which indicates the coverage of the samples of the pixel by both fragments together. 
Fig. 7.42 A triangle primitive covers samples (small rectangles) within a pixel (left side). The small filled circle represents the centre of the pixel. The right side of the figure shows the corresponding binary coverage mask (in this example the mask contains the entries 0 0 0 0, 1 0 0 0, 1 1 0 0 and 1 1 1 0)
In most implementations of multisample antialiasing, the coverage mask is obtained by rasterisation and a depth test with an increased resolution (for K samples). If at least one sample is visible due to the depth test, a shading value is calculated for the entire pixel. This calculation takes place with the non-increased resolution (with the resolution of the frame buffer). The position within the pixel used for shading depends on the specific OpenGL implementation. For example, the sample closest to the centre of the pixel can be used, or the first sample can be used. It does not matter whether this sample is visible or covered by a fragment. If supported by the OpenGL implementation, the position of the sample used for shading can be moved so that it is guaranteed to be covered by the fragment. This procedure is called centroid sampling or centroid interpolation and should only be used deliberately, as such a shift can cause errors in the (later) calculation of gradients of the rasterised image. Such gradients are used, for example, for mipmaps (see Sect. 10.1.1). The resulting (single) shading value is used to approximate the shading values of all samples of the pixel. After all shading values for all samples of all pixels have been determined, the reduction to the resolution of the frame buffer takes place as described in Sect. 7.6.5.
Since only one shading calculation takes place per pixel, this method is more efficient than supersampling antialiasing. However, this advantage also leads to the disadvantage that the variations of the shading function within a pixel are not taken into account and thus the approximation of a suitable shading value for a pixel is left to the shader program. Essentially, this method reduces disturbances that occur at geometric edges. This approximation is usually useful for flat, matt surfaces, but can lead to visible disturbances on uneven surfaces with strong specular reflections. With multisample antialiasing, shading only needs to be calculated once per pixel, but the resulting shading values are stored per sample, resulting in the same memory requirements as for supersampling antialiasing. Furthermore, multisample antialiasing works well for geometric primitives that are significantly larger than one pixel and when the number of samples per pixel is large. Otherwise, the savings from the per-pixel shading computation are small compared to supersampling antialiasing. Another disadvantage of multisample antialiasing is the difficulty of integration with shading calculations in deferred shading algorithms. Section 9.2 provides explanations of these algorithms.
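Coverage masks like the one in Fig. 7.42 can be stored compactly as bit masks, which also makes the Boolean combination of masks from different fragments (for example by OR or XOR operations) very cheap. The following Java sketch with 4 × 4 = 16 samples per pixel is only an illustration and does not correspond to any specific OpenGL implementation.

// Sketch: a 4x4 coverage mask stored in the lower 16 bits of an int.
public final class CoverageMask {

    static int setCovered(int mask, int row, int col) {
        return mask | (1 << (row * 4 + col)); // mark sample (row, col) as covered
    }

    static boolean isCovered(int mask, int row, int col) {
        return (mask & (1 << (row * 4 + col))) != 0;
    }

    // Combined coverage of two fragments (element-wise OR).
    static int union(int maskA, int maskB) {
        return maskA | maskB;
    }

    // Samples covered by exactly one of the two fragments (element-wise XOR).
    static int exclusive(int maskA, int maskB) {
        return maskA ^ maskB;
    }

    // Number of covered samples, e.g. for estimating the coverage of the pixel.
    static int coveredSamples(int mask) {
        return Integer.bitCount(mask);
    }
}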
Furthermore, since the multisample antialiasing method described above has not only efficiency advantages but also a number of disadvantages, there are more advanced methods that aim to maximise the advantages of the multisample approach while minimising its disadvantages. For reasons of space, the coverage sample antialiasing (CSAA) is mentioned here only as one such representative where the resolutions (number of samples used per pixel) for the coverage, depth and shading calculations can be different. A description of this algorithm can be found, for example, in [14, pp. 1058–1059]. In contrast to pre-filtering (see Sect. 7.6.3), the post-filtering methods are not context-dependent and have a global effect. In particular, the post-filtering methods are not dependent on specific geometric primitives. The basic strategy for reducing aliasing effects is the same for all image areas within a method. This is usually an advantage. However, it can also be a disadvantage, for example, if the scene contains only a few objects and the entire method is applied to empty areas. It must be added that some methods, such as the A-buffer algorithm, depend on the number of fragments and thus the computational effort is reduced for areas of the scene with few objects. Furthermore, when using regular grid-like arrangements of samples with a fixed number of pixels, very small objects may fall through the fine sample grid and may not be detected. This problem can be reduced by using an irregular or random sample arrangement (see Sect. 7.6.7). Since most of the presented methods for post-filtering have advantages and disadvantages, the appropriate combination of methods and parameter values, as far as they are selectable or changeable, has to be chosen for a specific (real-time) application.
7.6.7 Sample Arrangements for Post-Filtering
In contrast to regular arrangements of samples on a grid, as shown in Fig. 7.40, alternative arrangements of samples within a pixel are possible. This can further reduce aliasing effects and improve the quality of the rendering result. It is important to ensure that the samples cover as many different locations within the pixel as possible in order to represent the fragment to be sampled as well as possible. Figure 7.43 shows some alternative sample arrangements. At the top left of the figure is an arrangement that uses only two samples per pixel, with the samples still arranged in a regular grid. The slightly rotated arrangement of the samples in the Rotated Grid Super Sampling (RGSS) pattern in the centre top of the figure improves antialiasing for horizontal, vertical and nearly horizontal or vertical lines, a case that occurs frequently. By moving the samples to the edge of the pixel in the FLIPQUAD pattern at the top right of the figure, the sampling points for several neighbouring pixels can be shared. For this, the pattern of the surrounding pixels must be mirrored accordingly. The computational effort of using this pattern is reduced to an average of two samples per pixel. At the bottom left of the figure is a regular pattern with eight samples arranged in a chessboard pattern. In addition to these examples of deterministic sample arrangements, stochastic approaches use a random component which arranges the samples differently within
Fig. 7.43 Examples of sample positions (small rectangles) within a pixel: Top left: grid-like; top centre: Rotated Grid Super Sampling (RGSS); top right: FLIPQUAD; bottom left: chessboard-like (checker); bottom right: arrangement in a grid with random elements (jitter)
each pixel. As with deterministic approaches, care must be taken to avoid clustering of samples in certain areas of the pixel in order to represent well the fragment to be sampled. This can further reduce aliasing effects. Furthermore, very small objects in a scene that would fall through a regular grid can be better captured. A random component introduces noise into the image, but this is usually perceived by a viewer as less disturbing than regular disturbing patterns due to aliasing. The bottom right of Fig. 7.43 shows an example of such a sample arrangement with nine samples for one pixel, created by so-called jittering (or jittered or jitter). In this approach, the pixel to be sampled by K samples is evenly divided into K areas. In the example in the figure, K = 9. Within each of these areas, a sample is placed at a random position. The sample pattern looks different for each pixel within the image to be sampled, but the random variation is limited. This allows for image enhancement in many cases with relatively little extra effort. Further patterns for the arrangement of samples within a pixel can be found in [2, p. 140 and p. 144]. In general, a higher number of samples per pixel not only leads to better image quality but also increases the computing effort and memory requirements. Typical values for the number of samples K per pixel are in the range of two to 16 for current graphics hardware.
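Jittered sample positions as in the lower right of Fig. 7.43 can be generated with a few lines of code. The following Java sketch is an illustration only; it assumes that K is a square number and that pixel coordinates are normalised to the unit square.

import java.util.Random;

// Sketch: jittered sample positions within one pixel. The pixel is divided into
// sqrt(K) x sqrt(K) cells and one sample is placed at a random position in each cell.
public final class JitteredSampling {

    static float[][] jitteredSamples(int k, Random random) {
        int cellsPerSide = (int) Math.round(Math.sqrt(k)); // K is assumed to be a square number
        float cellSize = 1.0f / cellsPerSide;
        float[][] samples = new float[cellsPerSide * cellsPerSide][2];
        int index = 0;
        for (int row = 0; row < cellsPerSide; row++) {
            for (int col = 0; col < cellsPerSide; col++) {
                samples[index][0] = (col + random.nextFloat()) * cellSize; // x in [0, 1)
                samples[index][1] = (row + random.nextFloat()) * cellSize; // y in [0, 1)
                index++;
            }
        }
        return samples;
    }
}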
7.6.8 Post-Filtering in the OpenGL
The accumulation buffer algorithm (see Sect. 7.6.6) is only available in the OpenGL compatibility profile. This algorithm can be used, for example, for the representation of motion blur or the targeted use of the depth of field for certain areas of the scene. The frequently used post-filtering method specified for both OpenGL profiles is multisample antialiasing (MSAA). As explained in Sect. 7.6.6, according to the OpenGL specifications [20,21], this method includes supersampling antialiasing. These specifications do not define which algorithms, which sample arrangement within a pixel or which type of low-pass filtering must be used. The definition of these variations is left to the OpenGL implementation, which means that the implementation of the rasterisation procedure can be different for different graphics processors and OpenGL graphics drivers. This means that rendering results at the pixel level may differ depending on which hardware and software is used. The rendering results usually differ only in detail. Multisample antialiasing is very closely linked to the frame buffer and the output on the screen. Therefore, the initialisation on the software side usually must take place when creating the window for the output of the rendering result. Figure 7.44 shows how the initialisation of multisample antialiasing can look in the JOGL source code for setting up an output window. The following two commands activate the buffer needed to hold the additional data for the samples per pixel and set the number K of samples (in this case K = 8) per pixel:

glCapabilities.setSampleBuffers(true);
glCapabilities.setNumSamples(8);
// Using the JOGL-Profile GL3
// GL3: Core profile, OpenGL Versions 3.1 to 3.3
GLProfile glProfile = GLProfile.get(GLProfile.GL3);
GLCapabilities glCapabilities = new GLCapabilities(glProfile);
// Enabling of multisample antialiasing
glCapabilities.setSampleBuffers(true);
glCapabilities.setNumSamples(8);
// Create the OpenGL Window for rendering content
GLWindow glWindow = GLWindow.create(glCapabilities);
glWindow.setSize(WINDOW_WIDTH, WINDOW_HEIGHT);
// Make the window visible
glWindow.setVisible(true);
Fig. 7.44 Part of the source code of the JOGL method (Java) for generating the output window: Included is the initialisation of multisample antialiasing for K = 8 samples per pixel
If multisample antialiasing is initialised and thus the corresponding buffers are prepared, it is activated by default. It can be deactivated and reactivated by the following commands:

gl.glDisable(GL.GL_MULTISAMPLE);
gl.glEnable(GL.GL_MULTISAMPLE);
If multisample antialiasing is activated, then antialiasing using pre-filtering (see Sect. 7.6.4) is automatically deactivated, so that both methods can never be active at the same time. The commands for antialiasing using pre-filtering are ignored in this case. As explained above, the exact arrangement of the samples within a pixel cannot be specified via the OpenGL interface. However, the following source code lines in the JOGL renderer can be used to specify the minimum fraction of samples to be computed separately by the fragment shader (proportion of separately shaded samples per pixel):

gl.glEnable(gl.GL_SAMPLE_SHADING);
gl.glMinSampleShading(1.0f);
The floating point value of this fraction must lie in the interval [0, 1]. How this value affects the shading of the individual samples within a pixel and which samples exactly are calculated by the fragment shader is left to the respective OpenGL implementation. If the value 1.0f is used as an argument, as in the example above, then shading is to be performed for all samples. In this case, supersampling antialiasing is applied. Figure 7.45 shows dots, lines and a triangle created by an OpenGL renderer using multisample antialiasing. Eight samples per pixel were used and sample shading was activated for all samples. The drawn objects are the same objects shown in Fig. 7.37 as an example rendering results using pre-filtering. The smoothing by the multisample antialiasing is clearly visible at the edges of all objects. It is noticeable that, in contrast to the pre-filtering example (see Fig. 7.37), the dot is rectangular and does not have a circular shape. As explained earlier, multisample antialiasing is largely independent of the content of the scene, making it impossible to adapt the antialiasing algorithm to specific geometric primitives. For most applications, this rectangular representation of the dot will be sufficient, especially for small dots. However, if round points are needed, a triangle fan (TRIANGLE_FAN) can be used to draw a circular disc to which antialiasing measures can be applied. The upper part of Fig. 7.46 shows the output of an OpenGL renderer using multisample antialiasing for a white planar surface against a black background. The same object as in Fig. 7.38 was drawn. The lower part of the figure represents the output of an OpenGL renderer using multisample antialiasing for the two coloured overlapping polygons shown in Fig. 7.39 using pre-filtering. Eight samples per pixel were used and sample shading for all samples per pixel using glMinSampleShading(1.0f) was activated. In contrast to the output using pre-filtering, no edges are visible
Fig. 7.45 Dots, lines and a triangle drawn with multisample antialiasing by an OpenGL renderer: Eight samples per pixel and sample shading with glMinSampleShading(1.0f) were set. The lower part shows a cropped magnification of the upper part of the figure
between the triangles that make up the surfaces.10 Furthermore, there is no unwanted mixing of colours in the overlapping area of the coloured polygons, as can be seen in Fig. 7.39. The use of a (special) blending function as for pre-filtering is not necessary. At the edges of the surfaces, the effect of the low-pass filtering is clearly visible, whereby the quality of the display is significantly improved compared to the rendering result without antialiasing. This is noticeable in the non-magnified objects. The magnifications only serve to illustrate the antialiasing effect, which is usually not or only hardly visible in the output of a renderer without special measures. In Fig. 7.47, the same objects as in Fig. 7.46 are shown against a white background. The colour of the white polygon was changed to black. The same parameters were used for the rendering as for the example in Fig. 7.46. In this case, too, an error-free
10 The soft blending of the edges has been lost in the magnification on the right side of the figure due to the contrast enhancement. However, the desired antialiasing effect is visible in the magnification without contrast enhancement (in the middle of the figure).
Fig. 7.46 Polygons drawn with multisample antialiasing by an OpenGL renderer against a black background: Eight samples per pixel and sample shading with glMinSampleShading(1.0f) were set. The top right and the bottom centre of the figure show cropped magnifications of the left parts of the figure. The bottom right shows a cropped magnification with contrast enhancement of the left part of the figure
rendering with functioning and quality-enhancing antialiasing is evident (see left part of the figure).11 As explained in Sect. 7.6.6, a crucial efficiency advantage of multisample antialiasing is that shading only needs to be performed for one sample of a pixel and thus only once per pixel. To achieve this, sample shading can be disabled, for example, by the command glDisable(gl.GL_SAMPLE_SHADING). The position of the sample within a pixel at which the shading calculation takes place is not specified by the OpenGL specification and thus depends on the concrete OpenGL implementation used. For this reason, shading may be performed for a pattern that lies outside of the fragment that (partially) covers the pixel. Normally, this situation is not problematic. For high-quality rendering results, centroid sampling or centroid interpolation can be used, which guarantees the sample to be shaded to be covered by the fragment that (partially) covers the pixel. In OpenGL, this is achieved by the
11 The sharp cut edges in the enlarged renderings are due to manual cropping of the fully rendered and magnified object. These cropped magnifications were created for the presentation in this book and therefore do not represent a loss of quality due to antialiasing measures.
Fig. 7.47 Polygons drawn with multisample antialiasing by an OpenGL renderer against a white background: Eight samples per pixel and sample shading with glMinSampleShading(1.0f) were set. The top right and the bottom centre of the figure show cropped magnifications of the left parts of the figure. The bottom right shows a cropped magnification with contrast enhancement of the left part of the figure
keyword centroid before an input variable of the fragment shader. According to the OpenGL specification, this keyword is an auxiliary storage qualifier. For this purpose, the following GLSL example source code line of the fragment shader can be used: centroid in vec4 vColor;
The examples used in this section were created without a depth test. However, the use of the depth test together with multisample antialiasing is possible without any problems. Section 8.2.5 presents an algorithm for the depth buffer test (depth buffer algorithm). The integration into the OpenGL graphics pipelines is shown in Sect. 2.5.6. Detailed explanations of the depth buffer test in OpenGL can be found in [22, pp. 376–380]. A comparison of pre-filtering (Sect. 7.6.4) and post-filtering methods specified in the OpenGL shows that for most applications multisample antialiasing is the recommended choice.
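If the depth buffer test is combined with multisample antialiasing, only the usual JOGL commands for the depth test are required in addition to the initialisation shown in Fig. 7.44. The following minimal sketch assumes that it is placed in a renderer method in which the OpenGL object gl is available.

// Sketch: depth buffer test together with multisample antialiasing.
gl.glEnable(GL.GL_DEPTH_TEST);  // enable the depth buffer test (see Sect. 8.2.5)
gl.glEnable(GL.GL_MULTISAMPLE); // active by default after the initialisation in Fig. 7.44

// At the beginning of each frame, colour and depth buffer are cleared together.
gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);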
7.7 Exercises

Exercise 7.1 Explain the difference between vector and raster graphics. What are the advantages and disadvantages of a vector graphic and a raster graphic for use in computer graphics?

Exercise 7.2 Derive the midpoint algorithm for drawing a line with slopes between −1 and 0.

Exercise 7.3 Given a line from point (1, 1) to point (7, 6), apply the midpoint algorithm to this line.

Exercise 7.4 Let the starting point (2, 3) and the endpoint (64, 39) of a line be given.
(a) Apply the structural algorithm of Brons to draw this line. Give the complete drawing sequence.
(b) Plot a sequence of the repeating pattern in a coordinate system. In the same coordinate system, plot the drawing sequences for each iteration of the algorithm (for this sequence of the pattern).

Exercise 7.5 Use the structural algorithm from Sect. 7.3.3 to draw the line from Fig. 7.9.

Exercise 7.6 A part of the graph of the function y = −a·√x + b (a, b ∈ ℕ+) is to be drawn using the midpoint algorithm.
(a) For which x-values is the slope between −1 and 0?
(b) Write the equation of the function in a suitable implicit form F(x, y) = 0. Use d = F(x, y) as the decision variable to develop the midpoint algorithm for this function. How must d be changed depending on whether the eastern (E) or the south-eastern (SE) point is drawn by the midpoint algorithm?
(c) How must the initial value dinit be chosen for d when the first point to be drawn is (x0, y0) = (a², −a² + b)?
(d) How can the fractional values in the decision variable (at the initial value and in each recalculation) be avoided?
Exercise 7.7 Hatch the inside of the polygon shown below according to the odd parity rule (even–odd rule).
References
1. J. R. Van Aken. “An Efficient Ellipse-Drawing Algorithm”. In: IEEE Computer Graphics and Applications 4.9 (1984), pp. 24–35.
2. T. Akenine-Möller, E. Haines, N. Hoffman, A. Pesce, M. Iwanicki and S. Hillaire. Real-Time Rendering. 4th edition. Boca Raton, FL: CRC Press, 2018.
3. W. J. Bouknight. “A procedure for generation of three-dimensional half-toned computer graphics presentations”. In: Commun. ACM 13.9 (Sept. 1970), pp. 527–536.
4. J. E. Bresenham. “A Linear Algorithm for Incremental Digital Display of Circular Arcs”. In: Communications of the ACM 20.2 (1977), pp. 100–106.
5. J. E. Bresenham. “Algorithm for Computer Control of a Digital Plotter”. In: IBM Systems Journal 4.1 (1965), pp. 25–30.
6. R. Brons. “Linguistic Methods for the Description of a Straight Line on a Grid”. In: Computer Graphics and Image Processing 3.1 (1974), pp. 48–62.
7. R. Brons. “Theoretical and Linguistic Methods for the Describing Straight Lines”. In: Fundamental Algorithms for Computer Graphics. Ed. by R. A. Earnshaw. Berlin, Heidelberg: Springer, 1985, pp. 19–57. URL: https://doi.org/10.1007/978-3-642-84574-1_1.
8. W. Burger and M. J. Burge. Digital Image Processing: An Algorithmic Introduction Using Java. 2nd edition. London: Springer, 2016.
9. L. Carpenter. “The A-Buffer, an Antialiased Hidden Surface Method”. In: SIGGRAPH Comput. Graph. 18.3 (1984), pp. 103–108.
10. F. Dévai. “Scan-Line Methods for Parallel Rendering”. In: High Performance Computing for Computer Graphics and Visualisation. Ed. by M. Chen, P. Townsend and J. A. Vince. London: Springer, 1996, pp. 88–98.
11. J. D. Foley, A. van Dam, S. K. Feiner and J. F. Hughes. Computer Graphics: Principles and Practice. 2nd edition. Boston: Addison-Wesley, 1996.
12. S. Gupta and R. E. Sproull. “Filtering Edges for Gray-Scale Displays”. In: SIGGRAPH Comput. Graph. 15.3 (1981), pp. 1–5.
13. D. Hearn and M. P. Baker. Computer Graphics with OpenGL. 3rd edition. Upper Saddle River, NJ: Pearson Prentice Hall, 2004.
14. J. F. Hughes, A. van Dam, M. McGuire, D. F. Sklar, J. D. Foley, S. K. Feiner and K. Akeley. Computer Graphics. 3rd edition. Upper Saddle River, NJ et al.: Addison-Wesley, 2014.
15. M. R. Kappel. “An Ellipse-Drawing Algorithm for Raster Displays”. In: Fundamental Algorithms for Computer Graphics. Ed. by R. A. Earnshaw. NATO ASI Series (Series F: Computer and Systems Sciences). Berlin, Heidelberg: Springer, 1985, pp. 257–280.
16. R. Li, Q. Hou and K. Zhou. “Efficient GPU Path Rendering Using Scan-line Rasterization”. In: ACM Trans. Graph. 35.6 (Nov. 2016), Article no 228.
17. J. Pineda. “A Parallel Algorithm for Polygon Rasterization”. In: SIGGRAPH Comput. Graph. 22.4 (1988), pp. 17–20.
18. M. L. V. Pitteway. “Algorithm for drawing ellipses or hyperbolae with a digital plotter”. In: The Computer Journal 10.3 (1967), pp. 282–289.
19. M. L. V. Pitteway and D. J. Watkinson. “Bresenham’s Algorithm with Gray Scale”. In: Commun. ACM 23.11 (1980), pp. 625–626.
20. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Compatibility Profile), October 22, 2019). Accessed 8 February 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.
21. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core Profile), October 22, 2019). Accessed 8 February 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.
22. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-Wesley, 2016.
23. A. Watt. 3D-Computergrafik. 3rd edition. München: Pearson Studium, 2002.
24. W. E. Wright. “Parallelization of Bresenham’s line and circle algorithms”. In: IEEE Computer Graphics and Applications 10.5 (Sept. 1990), pp. 60–67.
25. X. Wu. “An efficient antialiasing technique”. In: SIGGRAPH Comput. Graph. 25.4 (1991), pp. 143–152.
26. C. Wylie, G. Romney, D. Evans and A. Erdahl. “Half-tone perspective drawings by computer”. In: Proceedings of the November 14–16, 1967, Fall Joint Computer Conference. New York, NY, USA: Association for Computing Machinery, 1967, pp. 49–58.
8 Visibility Considerations
For the representation of a section of a three-dimensional model world, it must be determined (just as in two dimensions) which objects are actually located in the area to be displayed. In addition to these clipping calculations, only for the objects located in the visible area, the problem of obscuring objects or parts of objects by other objects must be solved. This chapter introduces three-dimensional clipping, the spatial reduction of the entire scene to the visible space to be displayed. The procedure is explained in two-dimensional space, which can easily be extended to the three-dimensional space. Straight-line segments are considered, since the edges of polygons can be understood as such. The concept of clipping volume is explained. In addition, this chapter describes procedures for determining the visible objects in a scene. These include backside removal, which removes polygons that are not visible in the scene and lie on the backside, so that further calculations in the graphics pipeline only have to be applied to the front sides. This saves computing time. Techniques are presented that identify the visible areas of objects that are to be displayed in the case of concealed objects. These techniques are divided into object and image-space procedures, depending on whether they are applied to the objects or only later in the graphics pipeline to the image section.
8.1 Line Clipping in 2D
When drawing a section of an image of a more complex “world” that is modelled using vector graphics, the first step is to determine which objects are located wholly or at least partially in the area to be displayed. The process of deciding which objects can be omitted entirely when drawing, or which parts of an object have to be considered when drawing, is called clipping. The area from which objects are to be displayed is known as the clipping area. In this section, algorithms for the clipping of straight lines are discussed in more detail. Since primitives are usually triangles made up of three line segments, generality is not limited when clipping is examined below in terms of line segments.
Fig. 8.1 Different cases of straight-line clipping
Later in this chapter, the methods discussed here will be applied to the three-dimensional case. With straight-line clipping, four main cases can occur, which are illustrated in Fig. 8.1 as an example:

• The start and endpoints of the line segment lie in the clipping area, i.e., the line segment lies entirely within the area to be drawn. The entire line segment must be drawn.
• The start point is inside the clipping area, and the endpoint is outside the clipping area or vice versa. The calculation of an intersection of the straight-line segment with the clipping area is required. Only a part of the line segment is drawn.
• The start and endpoints are both outside the clipping area, and the straight segment intersects the clipping area. The calculation of two intersections of the line segment with the selected clipping area is necessary. The line segment between these two intersection points must be drawn.
• The start and endpoints are both outside the clipping area, and the straight segment does not intersect the clipping area. The straight-line segment, therefore, lies entirely outside the clipping area and does not need to be drawn.

An obvious but very computationally intensive method for clipping straight-line segments is to calculate all intersections of the straight-line segment with the edges of the rectangular clipping area to be displayed. Since it is necessary to determine the intersection points of the line segment with the four line segments that bound the clipping area, it is not sufficient to know only the intersection points of the corresponding (infinite) lines: an intersection point that lies outside a line segment is irrelevant. To calculate the intersection points, the line segment with start point (x0, y0) and endpoint (x1, y1) can be represented as a convex combination of start and endpoint:

g(t) = (x(t), y(t)) = (1 − t) · (x0, y0) + t · (x1, y1)   (0 ≤ t ≤ 1).   (8.1)
The clipping area is defined by the rectangle with the lower left corner (xmin, ymin) and the upper right corner (xmax, ymax). As an example, the determination of a possibly existing intersection point of the straight-line segment with the lower edge of the rectangle is considered here. For this purpose, the line segment (8.1) must be equated with the line segment for the lower edge:

(1 − t1) · (x0, y0) + t1 · (x1, y1) = (1 − t2) · (xmin, ymin) + t2 · (xmax, ymin).   (8.2)
The x- and the y-components of (8.2) yield a system of two linear equations in the variables t1 and t2. If this linear system of equations has no unique solution, the two straight lines run parallel, so that the lower edge of the clipping area does not cause any problems during clipping. If the system of equations has a unique solution, the following cases must be distinguished:

1. t1 < 0 and t2 < 0: The intersection point lies outside the line segment and in front of xmin.
2. 0 ≤ t1 ≤ 1 and t2 < 0: The line segment intersects the line defined by the lower edge in front of xmin.
3. t1 > 1 and t2 < 0: The intersection point lies outside the line segment and in front of xmin.
4. t1 < 0 and 0 ≤ t2 ≤ 1: The lower edge intersects the line through the segment before (x0, y0), i.e., outside the line segment.
5. 0 ≤ t1 ≤ 1 and 0 ≤ t2 ≤ 1: The line segment intersects the lower edge.
6. t1 > 1 and 0 ≤ t2 ≤ 1: The lower edge intersects the line through the segment behind (x1, y1), i.e., outside the line segment.
7. t1 < 0 and t2 > 1: The intersection point lies outside the line segment and behind xmax.
8. 0 ≤ t1 ≤ 1 and t2 > 1: The line segment intersects the line defined by the lower edge behind xmax.
9. t1 > 1 and t2 > 1: The intersection point lies outside the line segment and behind xmax.

The same considerations can be made for the remaining rectangle edges, and from this it can be concluded whether and which part of the straight line is to be drawn. In the following, two methods for two-dimensional line clipping are introduced: the Cohen–Sutherland clipping algorithm and the Cyrus–Beck clipping algorithm.
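The case analysis above can be carried out numerically with a few lines of code. The following Java sketch, which is not part of the book's example programs, computes t1 and t2 for the lower edge; because this edge is horizontal, the y-component of Eq. (8.2) immediately yields t1.

// Sketch: parameters t1 and t2 from Eq. (8.2) for the lower edge of the clipping rectangle.
public final class EdgeIntersection {

    // Returns {t1, t2} or null if the segment is parallel to the lower edge.
    static double[] lowerEdgeParameters(double x0, double y0, double x1, double y1,
                                        double xMin, double xMax, double yMin) {
        double dy = y1 - y0;
        if (dy == 0.0) {
            return null; // no unique solution: segment parallel to the lower edge
        }
        double t1 = (yMin - y0) / dy;               // y-component of Eq. (8.2)
        double xAtT1 = x0 + t1 * (x1 - x0);         // x-coordinate of the intersection point
        double t2 = (xAtT1 - xMin) / (xMax - xMin); // x-component of Eq. (8.2)
        return new double[]{t1, t2};
    }

    // Case 5 of the case distinction: the intersection lies on both segments.
    static boolean onBothSegments(double[] t) {
        return t != null && t[0] >= 0.0 && t[0] <= 1.0 && t[1] >= 0.0 && t[1] <= 1.0;
    }
}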
8.1.1 Cohen–Sutherland Clipping Algorithm
The Cohen–Sutherland clipping algorithm (see, e.g., [2]) avoids the time-consuming calculation of line intersections. For this purpose, the plane is divided into nine subareas, which are described by a 4-bit code. A point P = (xP, yP) is assigned the four-digit binary code b1(P) b2(P) b3(P) b4(P) ∈ {0, 1}⁴ according to the following pattern (each bit is 0 otherwise):
b1(P) = 1 ⇔ xP < xmin
b2(P) = 1 ⇔ xP > xmax
b3(P) = 1 ⇔ yP < ymin
b4(P) = 1 ⇔ yP > ymax
Figure 8.2 shows the nine regions and the binary codes assigned to them. A line segment to be drawn with start point P and endpoint Q is considered. The binary codes b(P) and b(Q) assigned to these two points can be determined by simple numerical comparisons. Three cases can be distinguished when drawing the line segment. Drawing the straight-line segment is done piece by piece. The first two cases represent a termination condition. In the third case, the straight-line segment is divided into pieces.

1. case: The bitwise combination of the binary codes assigned to the two points by the logical OR operator results in b(P) ∨ b(Q) = 0000. Then obviously both points lie within the clipping area, the entire segment PQ is drawn, and the drawing of the line segment is finished.
2. case: The bitwise combination of the binary codes assigned to the two points by the logical AND operator results in b(P) ∧ b(Q) ≠ 0000. This means that the two binary codes have a common one in at least one position. The binary code is read from left to right. If there is a common one in the first position, the entire line segment lies on the left side of the clipping area; if there is a one at the second position, the entire line segment lies on the right side of the clipping area. Accordingly, a one at the third or fourth position means that the line segment is located below or above the clipping area. In this case, the line segment does not need to be drawn and the drawing of the line segment is finished.
3. case: Neither the first nor the second case applies.
Fig. 8.2 Binary code for the nine areas of the Cohen–Sutherland Clipping
1001 | 0001 | 0101
1000 | 0000 | 0100
1010 | 0010 | 0110
(upper right corner of the clipping area: (xmax, ymax); lower left corner: (xmin, ymin))
Then b(P) ≠ 0000 or b(Q) ≠ 0000 must apply. Without restriction of generality, it can be assumed that b(P) ≠ 0000; otherwise, the points P and Q are swapped. In this case, one calculates the intersection point or points of the line segment with those boundary lines of the rectangle to which the ones in b(P) belong. There are either one or two intersection points. An example of this will be shown using the region with the binary code 1000. The straight-line segment cannot intersect the lower and the upper limiting straight line of the clipping area at the same time. The lower line can only be cut if b(Q) has a one at the third position, whereas the upper line can only be cut if b(Q) has a one at the fourth position. But the third and fourth positions of b(Q) can never have the value one at the same time. If there is only one intersection point of the line segment with the boundary lines of the clipping area, the point P is replaced by this intersection point. If there are two intersection points, one of them is determined, and this intersection point replaces P. The straight-line segment shortened in this way is treated accordingly until one of the first two cases applies. Figure 8.3 illustrates this procedure. The straight-line segment PQ is shortened via the intermediate steps S1Q, S2Q, S2S3 to the straight-line segment S2S4, which is finally drawn completely.
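The region codes and the two trivial tests of the Cohen–Sutherland algorithm can be expressed very compactly with bit operations. The following Java sketch is an illustration only; the mapping of the bits b1 to b4 onto an int is an implementation choice and not prescribed by the text.

// Sketch: 4-bit region codes and the trivial accept/reject tests.
public final class CohenSutherland {

    static final int LEFT  = 1; // b1: x < xmin
    static final int RIGHT = 2; // b2: x > xmax
    static final int BELOW = 4; // b3: y < ymin
    static final int ABOVE = 8; // b4: y > ymax

    static int regionCode(double x, double y,
                          double xMin, double yMin, double xMax, double yMax) {
        int code = 0;
        if (x < xMin) code |= LEFT;
        if (x > xMax) code |= RIGHT;
        if (y < yMin) code |= BELOW;
        if (y > yMax) code |= ABOVE;
        return code;
    }

    // Case 1: both codes are 0000, the segment lies completely inside the clipping area.
    static boolean triviallyAccepted(int codeP, int codeQ) {
        return (codeP | codeQ) == 0;
    }

    // Case 2: the codes share a one, the segment lies completely outside the clipping area.
    static boolean triviallyRejected(int codeP, int codeQ) {
        return (codeP & codeQ) != 0;
    }
}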
8.1.2 Cyrus–Beck Clipping Algorithm
The Cyrus–Beck clipping algorithm [1] determines the piece of a straight-line segment to be drawn using normal vectors of the clipping area and a parametric representation of the straight-line segment, which is represented as a convex combination of start and endpoint as in Eq. (8.1). If p0 is the start point and p1 is the endpoint of the
Fig. 8.3 Cohen–Sutherland Clipping
Fig. 8.4 Cyrus–Beck clipping
line segment, then the line segment corresponds exactly to the points

g(t) = (1 − t) · p0 + t · p1 = p0 + (p1 − p0) · t   (t ∈ [0, 1]).
A normal vector is determined for each of the four edges bounding the clipping area. The normal vector is chosen so that it points outwards. For the left edge, therefore, the normal vector (−1, 0) is used; for the lower edge (0, −1), for the upper edge (0, 1) and for the right edge (1, 0) are used. Figure 8.4 shows an example of the Cyrus–Beck clipping procedure for the left boundary edge of the clipping area. In addition to the respective normal vector n, a point on the corresponding edge of the clipping area is also selected. For the left edge, this point is marked pE (E for “edge”) in Fig. 8.4. This point can be chosen anywhere on this edge; for example, one of the corner points of the clipping area located on it could be selected. The connection vector of the point pE with a position on the line defined by the points p0 and p1 can be expressed in the form p0 + (p1 − p0)t − pE. To calculate the intersection of the straight line with the edge of the clipping area, the following must apply:

0 = nE · (p0 + (p1 − p0)t − pE) = nE · (p0 − pE) + nE · (p1 − p0)t.
This equation states that the connection vector of pE with the intersection point must be orthogonal to the normal vector nE, since it is parallel to the left edge of the clipping area. Thus, the following applies to the parameter t from which the intersection point results:

t = − (nE · (p0 − pE)) / (nE · (p1 − p0)).   (8.3)
The denominator can only become 0 if either p0 = p1 applies, i.e., the straight line consists of only one point, or if the straight line is perpendicular to the normal
Fig. 8.5 Potential intersection points with the clipping rectangle between points P0 and P1
vector nE, i.e., parallel to the edge E under consideration; in that case, there is no intersection point. The value t is determined for all four edges of the clipping rectangle in order to check whether the straight line intersects the corresponding edge. Note that, because of the simple form of the normal vectors, the scalar products in the numerator and denominator of Eq. (8.3) reduce to selecting the x- or y-component of the respective difference vector, if necessary with a change of sign. If a value t lies outside of the interval [0, 1], the line segment does not intersect the corresponding rectangle edge. The remaining potential intersection points with the rectangle edges of the clipping area are classified as “potentially exiting” (PA) and “potentially entering” (PE). These intersection points all lie on the line segment; however, they do not have to lie on a rectangle edge itself, but may also lie on its extension outside the clipping area. Figure 8.5 illustrates this situation. Mathematically, the decision PA or PE can be made based on the angle between the line p0p1 and the normal vector n belonging to the corresponding rectangle edge:
• If the angle is greater than 90°, the case is PE.
• If the angle is smaller than 90°, the case is PA.
For this decision, only the sign of the scalar product n · (p1 − p0) = |n| · |p1 − p0| · cos(α) must be determined, where α is the angle between the two vectors n and p1 − p0. The right side of the equation becomes negative only when the cosine becomes negative, because the lengths of the vectors are always positive. A negative sign implies the case PE, and a positive sign implies the case PA. Since, in each case, one component of the normal vectors of the rectangle edges is zero, the sign of the scalar product is obtained by considering the sign of the corresponding component of the vector (p1 − p0).
To determine the section of the line lying in the clipping rectangle, the largest value tE that belongs to a PE point and the smallest value tA that belongs to a PA point must be calculated. If tE ≤ tA holds, exactly the part of the line between the points p0 + (p1 − p0)tE and p0 + (p1 − p0)tA lies in the rectangle and must be drawn. In the case tE > tA, the line segment lies outside the rectangle.
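The computation of the parameters tE and tA can be sketched as follows; the class and method names are chosen freely, and the clipping area is assumed to be an axis-parallel rectangle with the outward normal vectors given above.

/** Minimal sketch of Cyrus–Beck clipping against an axis-parallel rectangle
    (names chosen freely; p0, p1 and the rectangle bounds are given). */
public final class CyrusBeck {

    /** Returns {tE, tA} with tE <= tA, or null if the segment lies completely outside. */
    public static double[] clip(double[] p0, double[] p1,
                                double xmin, double ymin, double xmax, double ymax) {
        // Outward normals nE and one point pE per edge (left, right, bottom, top).
        double[][] n  = { {-1, 0}, {1, 0}, {0, -1}, {0, 1} };
        double[][] pE = { {xmin, ymin}, {xmax, ymax}, {xmin, ymin}, {xmax, ymax} };
        double[] d = { p1[0] - p0[0], p1[1] - p0[1] };             // p1 - p0

        double tE = 0.0, tA = 1.0;                                 // entering / exiting parameters
        for (int i = 0; i < 4; i++) {
            double denom = n[i][0] * d[0] + n[i][1] * d[1];        // nE · (p1 - p0)
            double num   = n[i][0] * (p0[0] - pE[i][0])
                         + n[i][1] * (p0[1] - pE[i][1]);           // nE · (p0 - pE)
            if (denom == 0.0) {                                    // segment parallel to this edge
                if (num > 0) return null;                          // p0 lies on the outside
                continue;
            }
            double t = -num / denom;                               // Eq. (8.3)
            if (denom < 0) {                                       // angle > 90°: PE
                tE = Math.max(tE, t);
            } else {                                               // angle < 90°: PA
                tA = Math.min(tA, t);
            }
        }
        return (tE <= tA) ? new double[] { tE, tA } : null;
    }
}

The visible part of the segment then runs from p0 + (p1 − p0) · tE to p0 + (p1 − p0) · tA.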
8.2 Image-Space and Object-Space Methods Clipping determines which objects of a three-dimensional world modelled in the computer are at least partially within the clipping volume. These objects are candidates for the objects to be displayed in the scene. Usually, not all objects will be visible, since, for example, objects further back will be covered by those further to the front. Also, the backsides of the objects may not be visible. A simple algorithm for drawing the visible objects in a scene could be as follows. Think of the pixel grid drawn on the projection plane and place a ray through each pixel in the projection direction, i.e., parallel to the z-axis. A pixel is coloured in the colour of the object which its corresponding ray first hits. This technique is called an image-space method because it uses the pixel structure of the image to be constructed. For p pixels and n objects, a calculation effort of n · p steps is needed. At a usual screen resolution, one can assume about one million pixels. The number of objects can vary greatly depending on the scene. The objects are the triangles that are used to approximate the surfaces of the more complex objects (see Chap. 7). For this reason, hundreds of millions or more objects, i.e., triangles, are not uncommon in more complex scenes. In contrast to most image-space methods, object-space methods do not start from the fragments but determine directly for the objects, i.e., the triangles, which objects at least partially obscure others. Only the visible parts of an object are then projected. Object-space methods must test the objects in pairs for mutual obscuration, so that for n objects, a computational effort of about n², i.e., a quadratic effort, is required. In general, the number of objects is smaller than the number of pixels, so that n² < n · p follows. This means that the object-space methods must perform a much smaller number of steps than the image-space methods. However, the individual steps of the object-space methods are substantially more complex. One advantage of the object-space methods is that they can be carried out independently of the image resolution because they do not access the pixels for the visibility considerations. The image resolution only plays a role during the subsequent projection (see Chap. 4) of the visible objects. In the following, two object-space methods are presented, namely backface culling and the partitioning methods. Afterwards, three image-space procedures are explained. The depth buffer algorithm and its variants play the most crucial role in practice. Additionally, the scan-line method and the priority algorithms are discussed. Ray tracing, a further image-space method, is presented in Sect. 9.9.
8.2.1 Backface Culling Regardless of which visibility method is preferred, the number of objects, i.e., triangles or, more generally, plane polygons, should be reduced by backface culling before use. Back faces are surfaces that point away from the viewer and therefore cannot be seen. These surfaces can be neglected in the visibility considerations and are, therefore, omitted from all further calculations. The orientation of the surfaces described in Sect. 4.2 allows the normal vectors of the surfaces to be aligned in such a way that they always point to the outside of the surface. If a normal vector oriented in this way points away from the observer, the observer looks at the back of the surface, and it does not have to be taken into account in the visibility considerations. With the parallel projection onto the xy-plane considered here, the projection direction is parallel to the z-axis. In this case, the viewer looks at the back of a surface when the normal vector of the surface forms an obtuse angle with the z-axis, i.e., when it points approximately in the opposite direction of the z-axis. Figure 8.6 illustrates this situation using two sides of a tetrahedron. The two parallel vectors correspond to the projection direction and thus point in the direction of the z-axis. The other two vectors are the normal vectors of the front and the back side of the tetrahedron. One can see that the normal vector of the visible front side forms an acute angle with the projection direction. In contrast, the normal vector of the invisible backside forms an obtuse angle with the projection direction. If there is an obtuse angle between the normal vector of a surface and the projection direction, this means that the angle is greater than 90°. If one calculates the scalar product of the normal vector n = (nx, ny, nz) with the projection direction vector, i.e., with the unit vector ez = (0, 0, 1) in the z-direction, one obtains

ez · n = cos(ϕ) · |ez| · |n|,   (8.4)

where ϕ is the angle between the two vectors and |v| denotes the length of a vector v. The lengths of the two vectors on the right side of the equation are always positive. Thus, the right side becomes negative exactly when cos(ϕ) < 0, i.e., ϕ > 90°, applies. The sign of the scalar product (8.4) thus indicates whether the side with the normal vector n is to be considered in the visibility considerations. All sides for which the scalar
Fig. 8.6 A front surface whose normal vector forms an acute angle with the z-axis, and a back surface whose normal vector forms an obtuse angle with it, so that this surface can be neglected in the representation
product is negative can be neglected. Since here the scalar product is calculated with a unit vector, it reduces to

ez · n = (0, 0, 1) · (nx, ny, nz)^T = nz.

To determine the sign of the scalar product, therefore, neither multiplications nor additions have to be carried out; only the sign of the z-component of the normal vector has to be considered. Thus, in the case of parallel projection onto the xy-plane, backface culling consists of ignoring all plane polygons that have a normal vector with a negative z-component. One can easily see that a good place for the culling is in the normalised device coordinates (NDC) of the viewing pipeline. In NDC, a parallel projection of the x- and y-values into the window space coordinate system takes place, while the corresponding z-coordinates are used for further considerations in the rasterizer (see Chap. 7). The results of Sect. 5.11 show that each normal vector n must be multiplied by the transposed inverse of the total matrix M from model coordinates to NDC, i.e., (M^−1)^T · n, to obtain the corresponding normal vector nNDC in NDC. According to the results of this section, it is sufficient to consider the sign of the z-component of nNDC. If this is negative, the corresponding side of the primitive (usually a triangle) is removed, because it is not visible.
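A minimal sketch of this test is given below. It assumes that the triangle vertices are already available in normalised device coordinates and that front faces are ordered counterclockwise there; the z-component of the cross product of two edge vectors is then the z-component of the (unnormalised) normal vector, and a negative value marks a back face.

/** Minimal sketch: backface test in normalised device coordinates,
    assuming counterclockwise vertex order for front-facing triangles. */
final class BackfaceCulling {

    /** a, b, c are the (x, y, z) vertices of a triangle in NDC. */
    static boolean isBackface(double[] a, double[] b, double[] c) {
        // z-component of the cross product (b - a) x (c - a),
        // i.e. the z-component of the unnormalised normal vector
        double nz = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
        return nz < 0;    // negative z-component: the triangle faces away and can be culled
    }
}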
8.2.2 Partitioning Methods Backface culling reduces the computational effort for the visibility considerations and all subsequent calculations. Spatial partitions also serve to reduce the effort further. For this purpose, the clipping volume is divided into disjoint subregions, e.g., into eight sub-cuboids of equal size. The objects are assigned to the sub-cuboids with which they have a non-empty intersection. In this way, an object that extends beyond the boundary of a sub-cuboid is assigned to several sub-cuboids. When using an object-space method (independent of the resolution of the image area), only the objects within a sub-cuboid need to be compared to decide which objects are visible. In the case of sub-cuboids lying one behind the other, the objects of the rear sub-cuboids are projected first and then those of the front ones, so that the front objects overwrite the rear objects if necessary. If the clipping volume is divided into k sub-cuboids, the computational effort for n objects is reduced from n² to k · (n/k)² = n²/k. However, this only applies if each object lies in exactly one sub-cuboid and the objects are distributed evenly over the sub-cuboids, so that n/k objects have to be considered in each sub-cuboid. If there is a large number of sub-cuboids, this assumption will not even come close to being true. For image-space methods (which depend on the resolution of the image area), it may be advantageous to partition the clipping volume, as shown in Fig. 8.7 on the left.
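The assignment of objects to sub-cuboids can be sketched as follows for a uniform grid; the names are chosen freely, and for simplicity the clipping volume is split into k parts per axis (i.e., k³ sub-cuboids in total), with each object represented by its axis-aligned bounding box.

/** Minimal sketch: indices of the sub-cuboids of a uniform k x k x k grid over the
    clipping volume that an object's bounding box overlaps (names chosen freely). */
final class UniformGrid {

    static java.util.List<int[]> overlappedCells(double[] boxMin, double[] boxMax,
                                                 double[] volMin, double[] volMax, int k) {
        int[] lo = new int[3], hi = new int[3];
        for (int d = 0; d < 3; d++) {
            double cell = (volMax[d] - volMin[d]) / k;            // edge length of a sub-cuboid
            lo[d] = Math.max(0, (int) Math.floor((boxMin[d] - volMin[d]) / cell));
            hi[d] = Math.min(k - 1, (int) Math.floor((boxMax[d] - volMin[d]) / cell));
        }
        java.util.List<int[]> cells = new java.util.ArrayList<>();
        for (int i = lo[0]; i <= hi[0]; i++)
            for (int j = lo[1]; j <= hi[1]; j++)
                for (int m = lo[2]; m <= hi[2]; m++)
                    cells.add(new int[] { i, j, m });             // object is assigned to each of these
        return cells;
    }
}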
Fig. 8.7 Split of clipping volume for image-space (left) and object-space methods
For this purpose, the projection plane is divided into rectangles, and each of these rectangles induces a cuboid within the clipping volume. Thus, for a pixel, only those objects that are located in the same sub-cuboid as the pixel need to be considered for the visibility considerations. With recursive division algorithms, the examined area is recursively broken down into smaller and smaller regions until the visibility decision can be made in the subregions. The image resolution limits the maximum recursion depth. With the area subdivision methods, the projection surface is divided; they are, therefore, image-space methods. Octree methods (see Chap. 4) partition the clipping volume and belong to the object-space methods.
8.2.3 The Depth Buffer Algorithm The depth buffer or z-buffer algorithm is the most widely used method for determining which objects are visible in a scene. It involves the conversion of a set of fragments into a pixel to be displayed (see Chap. 7). The depth buffer algorithm is, therefore, an image-space method that works according to the following principle: A colour buffer is used in which the colour for each pixel is stored. The objects are projected in an arbitrary order and entered into the colour buffer. With this strategy alone, the object that happened to be projected last would be visible, but it would not usually be the one furthest to the front. For this reason, before entering an object into the colour buffer, one checks whether there is already an object at this position that is closer to the viewer. In that case, this object must not be overwritten by the other object. To determine whether an object further to the front has already been entered in the colour buffer, a second buffer, the depth buffer or z-buffer, is used. This buffer is initialised with the z-coordinate of the rear clipping plane. Since here, again, only a parallel projection onto the xy-plane is considered, no object lying in the clipping volume (see Sect. 5.7) can have a larger z-coordinate. When an object is projected, not only the colour values of the pixels are entered in the colour buffer, but also the corresponding z-values in the depth buffer. Before entering a value in the colour buffer and the depth buffer, one checks whether the depth buffer already contains a smaller value. If this is the case, this means that an object further to the front has already been
Fig. 8.8 How the depth buffer algorithm works
entered at the corresponding position, so that the object currently being considered must not overwrite the previous values in the colour and depth buffer. The operation of the depth buffer algorithm is illustrated in Fig. 8.8. There are two objects to be projected, a rectangle and an ellipse. In reality, both objects are tessellated, and the procedure is carried out with the fragments of the corresponding triangles. For the sake of clarity, the process is explained below using the two objects as examples. The viewer looks at the scene from below. The colour buffer is initialised with the background colour, the depth buffer with the z-value of the rear clipping plane. For better clarity, these entries in the depth buffer have been left empty in the figure; normally, they contain the largest possible value. The rectangle is projected first. Since it lies within the clipping volume and its z-values are, therefore, smaller than the z-value of the rear clipping plane, which up to now has been stored in the depth buffer everywhere, the corresponding colour values of the projection are written into the colour buffer and the z-values into the depth buffer. During the subsequent projection of the ellipse, it is found that all stored z-values at the corresponding positions are larger, so that the ellipse is entered into the colour and depth buffer and thus partially overwrites the already projected rectangle. If one had projected the ellipse first and then the rectangle, one would notice that there is already a smaller z-value in the depth buffer at two of the pixels of the
rectangle to be projected, which, therefore, must not be overwritten by the rectangle. Also, with this projection sequence, the same correct colour buffer would result in the end, which is shown in the lower part of Fig. 8.8. Note that the z-values of an object are usually not all the same and that for each projected fragment of an object, it must be decided individually whether it should be entered into the colour and depth buffer. The depth buffer algorithm can also be used very efficiently for animations in which the viewer's point of view does not change. The objects that do not move in the animation form the background of the scene. They only need to be entered once into the colour and depth buffer, which are then reinitialised with these values for the calculation of each image. Only the moving objects must be entered again. If a moving object moves behind a static object in the scene, this is determined by the z-value in the depth buffer, and the static object correctly hides the moving object. The depth buffer algorithm can also be applied to transparent objects (transparency is explained in Chap. 6). To enter a plane polygon into the colour and depth buffer, a scan-line method is used in which the pixels in the projection plane are processed line by line. The plane induced by the polygon is described by the implicit equation

A · x + B · y + C · z + D = 0.   (8.5)
The z-value within a scan line can be expressed in the form z_new = z_old + Δz, because the z-values over a plane change linearly along the scan line. For the pixel (x, y), let the corresponding z-coordinate of the polygon be z_old. For the z-coordinate z_new of the pixel (x + 1, y), with the plane Eq. (8.5) and under the assumption that the point (x, y, z_old) lies on the plane, it follows that

0 = A · (x + 1) + B · y + C · z_new + D
  = A · (x + 1) + B · y + C · (z_old + Δz) + D
  = (A · x + B · y + C · z_old + D) + A + C · Δz
  = A + C · Δz,

since A · x + B · y + C · z_old + D = 0. From this, one gets the searched change of the z-coordinate along the scan line: Δz = −A/C. This means that not every z-coordinate of the fragments has to be calculated explicitly. Within each scan line, the next z-coordinate is determined by adding Δz to the current z-value. Only the first z-coordinate of the scan line has to be interpolated, which reduces the overall calculation effort. However, this leads to data dependencies, and the use of parallelism is reduced.
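The inner loop of the depth buffer algorithm for one scan line of a polygon can therefore be sketched as follows; the names are chosen freely, and the buffers are assumed to be simple two-dimensional arrays with smaller z-values meaning closer to the viewer.

/** Minimal sketch: one scan line of the depth buffer algorithm for a polygon
    with plane equation A*x + B*y + C*z + D = 0 (C != 0); names chosen freely. */
final class DepthBufferSpan {

    static void rasterizeSpan(int y, int xStart, int xEnd,
                              double a, double b, double c, double d,
                              double[][] zBuffer, int[][] colourBuffer, int colour) {
        double z  = -(a * xStart + b * y + d) / c;   // z at the first pixel of the span
        double dz = -a / c;                          // constant change Δz along the scan line
        for (int x = xStart; x <= xEnd; x++) {
            if (z < zBuffer[y][x]) {                 // fragment lies in front of the stored one
                zBuffer[y][x]      = z;
                colourBuffer[y][x] = colour;
            }
            z += dz;                                 // incremental update instead of re-evaluating the plane
        }
    }
}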
8.2.4 Scan-Line Algorithms With the depth buffer algorithm, the projection of individual polygons is carried out using scan-line methods. Alternatively, the edges of all objects can also be projected to determine which objects are to be displayed using a scan-line method. With this scan-line method for determining visibility, it is assumed that the objects to be projected are plane polygons. The coordinate axes of the display window—or the corresponding rectangular area in the projection plane—are denoted by u and v (not to be confused with the texture coordinates). The scan-line method works with three tables. The edge table contains all edges that do not run horizontally and has the following structure:

vmin | u(vmin) | vmax | Δu | Polygon numbers
vmin is the smallest v-value of the edge and u(vmin) the corresponding u-value at the vmin-value of the edge. Accordingly, vmax is the largest v-value of the edge, and Δu is the slope of the edge. The list of polygons to which the edge belongs is entered in the column Polygon numbers. The edges are sorted in ascending order according to the values vmin and, in case of equal values, in ascending order according to u(vmin). The second table in the scan-line method is the polygon table. It contains information about the polygons in the following form:

Polygon number | A | B | C | D | Colour | In-flag
The polygon number is used to identify the polygon. The coefficients A, B, C and D define the implicit plane equation Ax + By + Cz + D = 0 belonging to the polygon. The colour information for the polygon is entered in the Colour column. The requirement that a unique colour can be assigned to the polygon considerably limits the applicability of the scan-line method. It would be better, but more complex, to read the colour information directly from the polygon point by point. The in-flag indicates whether the currently considered position on the scan line is inside or outside the polygon. The last table contains the list of all active edges. The active edges are those that intersect the currently considered scan line. They are sorted in ascending order by the u-components of the intersection points. The length of the table of active edges changes for each scan line during the calculation. The lengths and the entries of the other two tables remain unchanged during the entire calculation, except for the in-flag.
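As a minimal sketch, the three tables could be represented by the following Java data structures; the field names are chosen freely and do not correspond to a concrete implementation in the book.

/** Minimal sketch of the tables used by the scan-line method (names chosen freely). */
class EdgeEntry {                            // one row of the edge table
    double vMin, uAtVMin, vMax, deltaU;      // columns as in the structure above
    java.util.List<Integer> polygonNumbers = new java.util.ArrayList<>();
}

class PolygonEntry {                         // one row of the polygon table
    int number;
    double a, b, c, d;                       // coefficients of the plane equation
    float[] colour;                          // colour information of the polygon
    boolean inFlag;                          // inside/outside flag for the current position
}

class ScanLineState {
    // The third table: the active edges of the current scan line, sorted in
    // ascending order by the u-component of their intersection points.
    java.util.List<EdgeEntry> activeEdges = new java.util.ArrayList<>();
}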
Fig. 8.9 Determination of the active edges for the scan lines v1 , v2 , v3 , v4
In the example shown in Fig. 8.9, the following active edges result for the scan lines v1, v2, v3, v4:

v1: P3P1, P1P2
v2: P3P1, P1P2, P6P4, P5P4
v3: P3P1, P6P5, P3P2, P5P4
v4: P6P5, P5P4.
If one looks at a scan line, after determining the active edges, first all in-flags are set to zero. Afterwards, the scan line is traversed using the odd-parity rule (see Sect. 7.5). If an active edge is crossed, the in-flags of the polygons are updated. Only the in-flags of the polygons to which the edge belongs need to be adjusted, as one enters or exits the polygon when crossing the edge. At each pixel, the visible polygon must be determined among the active polygons with in-flag = 1. For this purpose, only the z-value of each such polygon must be determined from the corresponding plane equation. The polygon with the smallest z-value is visible at the corresponding position. The z-value of a polygon can be computed incrementally as with the depth buffer algorithm.
8.2.5 Priority Algorithms With the depth buffer algorithm, the order in which the objects are projected is not essential. By exploiting the information in the z-buffer, the overwriting of objects further ahead by objects further back is prevented. The goal of priority algorithms is to select the projection order of the objects so that objects further forward are projected only after all objects behind them have already been projected. This projection sequence means that the z-buffer can be dispensed with. Also, the determination of
Fig. 8.10 No overlapping of x- or y-coordinates
Fig. 8.11 Is one polygon completely behind or in front of the plane of the other?
the order is independent of the resolution, so that priority algorithms belong to the class of object-space methods. In the following, the order in which two objects, i.e., two plane polygons P and Q, are to be projected is to be decided, so that a polygon located further back does not overwrite the front one. If the z-coordinates of the two polygons do not overlap, the object with the larger z-coordinates lies further back and is projected first. If the z-coordinates overlap, further investigations must be made. If the x- or y-coordinates of P and Q do not overlap, the order in which they are projected does not matter because, for the viewer, they are next to each other or on top of each other, as shown in Fig. 8.10. To check whether the polygons overlap in a coordinate, only the values of the corresponding vertex coordinates of one polygon need to be compared with those of the other. If all values of one polygon are smaller than those of the other or vice versa, an overlap in the corresponding coordinate can be excluded. If overlaps occur in all coordinates, the question is whether one polygon is completely behind or in front of the plane of the other. These two possibilities are shown in Fig. 8.11. On the left side of the figure, the polygon that lies behind the plane of the other one should be projected first. On the right, the polygon that lies entirely in front of the plane of the other would have to be drawn last. These two cases can be handled by using a suitably oriented normal vector to the plane. An example of this is shown here for the situation on the right in Fig. 8.11. The normal vector to the plane induced by the one polygon is used. The normal vector should be oriented towards the observer. Otherwise, the polygon would already have been deleted from the list of the polygons to be considered during the
Fig. 8.12 Determining whether one polygon is entirely in front of the plane of the other
Fig. 8.13 A case where there is no correct projection order of the objects
backface culling (see Sect. 8.2.1). If an arbitrary point of the plane is selected, the connection vectors from this point to the corner points of the other polygon must each form an angle smaller than 90° with the normal vector. This is equivalent to the scalar product of each of these vectors with the normal vector being positive. If this is true, the one polygon is entirely in front of the plane induced by the other polygon. Figure 8.12 illustrates this situation: The angles that the normal vector makes with the vectors to the three corners of the polygon are all less than 90°. Unfortunately, these criteria are not sufficient to construct a correct projection order of the objects. There are cases like the one in Fig. 8.13 where it is impossible to project the objects one after the other so that, in the end, a correct representation is obtained. If none of the above criteria apply, priority algorithms require a finer subdivision of the polygons involved to allow a correct projection order. Even if these cases are
rarely encountered, the determination of a suitable finer subdivision of the polygons can be very complicated.
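The test sketched in Fig. 8.12 can be written down directly: all scalar products of the normal vector with the vectors from a point of the plane to the corner points of the other polygon must be positive. The following Java fragment is a minimal sketch with freely chosen names.

/** Minimal sketch of the test in Fig. 8.12: is polygon Q entirely in front of the
    plane of polygon P? n is the viewer-oriented normal of P's plane, p a point in
    that plane, qVertices the corner points of Q (names chosen freely). */
final class FrontTest {

    static boolean entirelyInFront(double[] n, double[] p, double[][] qVertices) {
        for (double[] q : qVertices) {
            double s = n[0] * (q[0] - p[0]) + n[1] * (q[1] - p[1]) + n[2] * (q[2] - p[2]);
            if (s <= 0) {          // angle not smaller than 90°: this vertex is not in front of the plane
                return false;
            }
        }
        return true;               // all scalar products positive: Q lies entirely in front of P's plane
    }
}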
8.3 Exercises Exercise 8.1 In Fig. 8.13, if triangles rather than arbitrary plane polygons are assumed, the projection and application of the odd-parity rule are not necessary to determine whether the intersection point lies within the triangle. Specify a procedure that makes this decision easier for triangles. Exercise 8.2 Specify a procedure to test whether the left case in Fig. 8.11 is present. Proceed in a similar way as in Fig. 8.12. Exercise 8.3 Let the clipping area be given by (xmin, ymin) = (1, 2) and (xmax, ymax) = (6, 4). In addition, let a line segment be given that leads from the point P0 = (0, 0) to the point P1 = (5, 3). Sketch the described situation and perform a clipping of the line segment using (a) the Cohen–Sutherland algorithm and (b) the Cyrus–Beck algorithm. Exercise 8.4 Backface culling is to be carried out in a scene: (a) Consider the triangle defined by the homogeneous vertices (1, 0, 1, 1)^T, (2, 1, 0, 1)^T, (2, 2, 1, 1)^T. To perform culling, you need to calculate the surface's normal (normal vector), assuming the surface is flat. What is the normal of the surface, taking into account the constraint that the vertices are arranged counterclockwise to allow a clear meaning of front and back? The viewer's position should be on the negative z-axis, and the surface should be visible from there. (b) For another part of the scene, the normal (−3, 1, −2, 0)^T was calculated. The eye point is located at (0, 10, 1, 1)^T. Determine by calculation whether the corner point (−1, 1, −1, 1)^T of the surface is visible, i.e., facing the viewer, taking into account the position of the eye point.
References 1. M. Cyrus and J. Beck. “Generalized Two- and Three-Dimensional Clipping”. In: Computers and Graphics 3.1 (1978), pp. 23–28. 2. J. D. Foley, A. van Dam, S. K. Feiner and J. F. Hughes. Computer Graphics: Principles and Practice. 2nd edition. Boston: Addison-Wesley, 1996.
9
Lighting Models
Section 5.8 discusses projections that are needed to represent a three-dimensional scene. Projections are considered special mappings of three-dimensional space onto a plane. In this sense, projections only describe where a point or object is to be drawn on the projection plane. The visibility considerations in Chap. 8 are only about ensuring that front objects are not covered by rear ones when projecting objects. The information about where an object is to be drawn on the projection plane, i.e., which fragments this object covers, is not sufficient for a realistic representation. Figure 9.1 shows the projections of a grey sphere and a grey cube in two variants each. In the first representation, the projection of the sphere or cube is coloured in the grey basic colour of the objects. This results in uniformly coloured flat structures without three-dimensional information. The projection of the sphere is a uniformly grey circle. The projection of the cube results in a grey filled hexagon. Only the consideration of illumination and reflection, which leads to different shades of the object surfaces and thus also of the projections, results in the three-dimensional impression of the second sphere and of the cube on the right in the figure. Figure 9.2 shows this situation using the Utah teapot as an example. The colour or brightness modification of an object surface due to lighting effects is called shading. This chapter presents the basic principles and techniques required for shading. It should be noted in this context that shaders (small programmable units that run on the GPU, see Chap. 2) are different from shading. In theory, the calculations for shading explained in the following sections would have to be carried out separately for each wavelength of light. Computer graphics algorithms are usually limited to the determination of the three RGB values for red, green and blue (colour representations are discussed in Chap. 6). In the meantime, much more complex illumination models exist that are physically based and even simulate illumination at the photon level. A very good overview is given in [6, p. 215ff]. In this chapter, the basics are considered.
Fig. 9.1 Objects without and with lighting effects
Fig. 9.2 Utah Teapot without and with lighting effects
9.1 Light Sources of Local Illumination In addition to the objects and the viewer, a three-dimensional scene also includes information about the lighting, which can come from different light sources. The colour of the light belongs to each type of lighting. It should be noted that colour is not a physical property of light, but that a wavelength-dependent energy distribution is mapped onto colours, and linear colours are used for the calculation. Usually, white light is used, but it can deviate from this, for example, with coloured lamps or the reddish light of a sunset. The colour of the light from a light source is defined by an RGB value. The aim is to create a very efficient and realistic image of the lighting conditions. In the following, therefore, not the physically exact illumination but a very good, efficient approximation is computed. In addition, the reflections of light from adjacent surfaces onto each other are neglected, so that only light emitted by light sources is distributed. This means that only the intensity of the light incident on the object and the reflection behaviour of the object in question towards the observer is evaluated. Thus, indirect illumination of the objects by each other is neglected, as well as the visibility of the light source and the generation of corresponding shadows. This is called local lighting and is the standard lighting model of computer graphics. More computationally intensive techniques, which take into account the influence of neighbouring surfaces and thus act globally, are dealt with in Sect. 9.8 (radiosity) and Sect. 9.9 (ray tracing). There, the sum of all reflections, direct and indirect, is perceived as lighting. In addition, these computationally intensive global lighting techniques can be calculated in advance and stored in a raster image similar to a texture. It should be noted that this applies to diffusely reflecting surfaces. This is not possible for specularly reflecting surfaces since the illumination depends on
the observer's position. Similarly, with static light sources, the shadow cast can be calculated in advance and saved in so-called shadow maps, again similar to a texture. With dynamic light sources, the shadow maps must be recalculated. In the local calculation process, these textures can then be accessed in order to incorporate them directly as a factor of illumination. Indirect light that can be calculated by global illumination is, therefore, not integrated into the local illumination model without the detour via textures. Therefore, an additional simulation of indirect light is necessary to keep objects that are not directly illuminated displayable. This simulation is achieved by a constant approximation of the diffuse indirect illumination. For this purpose, the simplest form of light is used, i.e., ambient light. It is not based on a special light source but represents the light that should be present almost everywhere due to multiple reflections on different surfaces. It is, therefore, represented by a constant light intensity in the scene. In a room with a lamp above a table, there will not be complete darkness under the table, even if the light from the lamp cannot shine directly into this area. The light is reflected from the table's surface, the ceiling and the walls, and thus also gets under the table, albeit at a lower intensity. In the model, the scattered light does not come from a certain direction, but from all directions with the same intensity, and therefore has no position. Parallel incident or directional light has a colour and a direction from which it comes. Directional light is produced by a light source such as the sun, which is almost infinitely far away. The light rays run parallel and have no position in the simulation; therefore, they have a constant direction of light and constant intensity. A lamp is modelled by a point light source. In addition to the colour, a point light source must be assigned a unique position in space. The light rays spread out from this position in all directions. The intensity of the light decreases with the distance from the light source. This effect is called attenuation. The following consideration shows that the intensity of the light decreases quadratically with the distance. To see this, imagine a point light source that is located in the centre of a sphere with radius r. The entire energy of the light is distributed evenly on the inside of the sphere's surface. If the same situation is considered with a sphere with a larger radius R, the same light energy is distributed over a larger area. The ratio of the two spherical surfaces is

4πr² / (4πR²) = r² / R².

With a ratio of r/R = 1/2, each point on the inside of the larger sphere, therefore, receives only a quarter of the energy of a point on the smaller sphere. Theoretically, therefore, the intensity of the light from a point source would have to be multiplied by a factor of 1/d² if it strikes an object at a distance d. In this way, the intensity of the light source falls very quickly to a value close to zero, so that from a certain distance on, one can hardly perceive any differences in light intensity. On the other hand, very close to the light source, there are extreme variations in the intensity. The intensity can become arbitrarily large and reaches the value infinity at the position of the light source. To avoid these effects, a damping factor of the form

f_att = min( 1 / (c1 + c2 · d + c3 · d²), 1 )   (9.1)

is used
with constants c1, c2, c3 to be selected by the user, where d is the distance to the light source. First of all, this formula prevents intensities greater than one from occurring. The quadratic polynomial in the denominator can be used to define a parabola that produces moderate attenuation effects. In addition, the linear term with the coefficient c2 is used to model atmospheric attenuation. The quadratic decrease of the light intensity simulates the distribution of the light energy over a larger area as the distance increases. In addition, the light is attenuated by the atmosphere, i.e., the turbidity of the air. Fine dust in the air absorbs a part of the light. In this model, this leads to a linear decrease of the light intensity with distance. When a light beam travels twice the distance, it must pass twice as many dust particles on its path, each of which absorbs a certain amount of its energy. It should be noted that, in reality, absorption in homogeneous media such as mist and dust behaves approximately according to Beer's law: The intensity decreases exponentially with distance. Additionally, outward scattering occurs, so that further light intensity is lost. Modelling this is much more complicated and is not realised in this model. Another commonly used form of light source is the spotlight. A spotlight differs from a point light source in that it emits light only in one direction, within a cone. A spotlight is characterised by its light colour, its unique position, the direction in which it radiates and the aperture angle of its light cone (the beam angle). As in the case of a point light source, a spotlight is subject to attenuation in accordance with Eq. (9.1). The quadratic component of the attenuation results from considerations similar to those for point light sources. Figure 9.3 shows that the energy of the spotlight is distributed over the area of a circle. The radius of the circle grows proportionally with the distance, so that the circular area increases quadratically and the light energy decreases quadratically. To get a more realistic model of a spotlight, it should be taken into account that the intensity of the light decreases towards the edge of the cone. The Warn model [9] can be used for this purpose. The decrease of the light intensity towards the edge of the cone is controlled in a similar way to specular reflection (see Sect. 9.2). If l is the vector pointing from a point on a surface in the direction of the spotlight and lS is the axis of the cone, the light intensity at the point under consideration is

I = IS · f_att · (cos γ)^p = IS · f_att · (−lS · l)^p,   (9.2)

where IS is the intensity of the spotlight, f_att is the distance-dependent attenuation from Eq. (9.1) and γ is the angle between the axis of the cone and the line connecting the point on the surface to the spotlight. The value p controls how much the spotlight
Fig. 9.3 Light beam of a spotlight
Fig. 9.4 The Warn model for spotlights
focuses. In this way, it also controls the intensity depending on the direction. For p = 0, the spotlight behaves like a point light source. With increasing values of p, the intensity of the spotlight is concentrated in an increasingly narrow area around the axis of the cone. The cosine term in Eq. (9.2) can again be calculated using the scalar product of the vectors l and −lS, provided they are normalised. Figure 9.4 illustrates the calculation for the Warn model. Furthermore, let g_cone be equal to (−lS · l)^p.
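The two factors f_att and g_cone can be computed as in the following sketch; the method names are chosen freely, and the vectors l and lS are assumed to be normalised.

/** Minimal sketch of the attenuation and spotlight factors (names chosen freely). */
public final class LightFactors {

    /** Distance-dependent attenuation f_att according to Eq. (9.1). */
    public static double attenuation(double d, double c1, double c2, double c3) {
        return Math.min(1.0 / (c1 + c2 * d + c3 * d * d), 1.0);
    }

    /** Warn factor g_cone = (cos gamma)^p = (-lS · l)^p for a spotlight.
        l points from the surface point to the light, lS is the cone axis;
        both are assumed to be normalised. A cut-off at the aperture angle
        of the cone could additionally be applied. */
    public static double coneFactor(double[] l, double[] lS, double p) {
        double cosGamma = -(lS[0] * l[0] + lS[1] * l[1] + lS[2] * l[2]);
        return (cosGamma > 0) ? Math.pow(cosGamma, p) : 0.0;   // no contribution behind the spotlight
    }
}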
9.2 Reflections by Phong In order to achieve lighting effects such as those shown in Figs. 9.1 and 9.2, it is necessary to specify how the surfaces of the objects reflect the light present in the scene. The light that an object reflects is not explicitly taken into account when calculating the reflections of other objects but is considered as part of the general scattered light in the scene. The more complex radiosity model presented in Sect. 9.8 explicitly calculates the mutual illumination of the objects. For the model of light reflections from object surfaces described in this section, a point on a surface is considered that should be correctly coloured in consideration of the light in the scene. The individual effects presented in the following usually occur in combination. For each individual effect, an RGB value is determined, which models the reflection contribution of this effect. To determine the colouration of the point on the surface, the RGB values of the individual effects must be added together. It must be taken into account that the total intensity of each of the R, G and B components can have a maximum value of one. Figure 9.5 visualises this procedure using the example of the Utah teapot. The underlying model is the illumination model according to Phong [8]. The Phong lighting model is not based on physical calculations, but on purely heuristic considerations. Nevertheless, it provides quite realistic effects for specular reflection. Sometimes a modified Phong model, the Blinn–Phong lighting model, is used.
Fig. 9.5 The Utah teapot with effects of the Phong lighting model: Additive superposition of emissive, ambient, diffuse and specular lighting or reflection
A very simple contribution to the colouration of a pixel is made by an object that has its own luminosity independent of other light sources, which is characterised by an RGB value. Objects with their own luminosity are not interpreted as light sources and are, therefore, not included in indirect lighting. Objects also rarely have their own luminosity. The illumination equation for the intensity I of a fragment by the (emissive) intrinsic luminosity of the object is, therefore,

I = Ie = ke.

This equation should be considered separately for each of the three colours red, green and blue. Correctly, it should, therefore, read

I^(red) = Ie^(red) = ke^(red),  I^(green) = Ie^(green) = ke^(green),  I^(blue) = Ie^(blue) = ke^(blue).
Since the structure of the illumination equation is the same for each of the three colours for all types of reflection, only one intensity equation is given in the following, even if the calculation must be carried out separately for red, green and blue. A self-luminous object is not considered a light source in the scene. It shines only for the viewer but does not illuminate other objects. Self-luminous objects can be used sensibly in combination with light sources. A luminous lamp is modelled by a point light source to illuminate other objects in the scene. In order to make the lamp itself recognisable as a luminous object, a self-luminous object is additionally generated, for example, in the form of an incandescent bulb. A self-luminous object alone does not create a three-dimensional impression of its surface. The projection results in uniformly coloured surfaces, as can be seen in Fig. 9.1 on the far left and second from the right. Figure 9.5 also shows that the Utah teapot does not contain any intrinsic luminosity; ke is, therefore, set to (0, 0, 0) in this figure for all RGB colour channels, which is shown as black in the illustration. All subsequent lighting effects result from reflections of light present in the scene. The lighting equation has the form I = kpixel · Ilighting. Ilighting is the intensity of the light coming from the light source under consideration. kpixel is a factor that depends on various parameters: the colour and type of the surface, the distance of the light source in the case of attenuation, and the angle at which the light strikes the surface in the fragment under consideration. In the case of ambient or scattered light, one obtains the illumination equation I = ka · Ia. Ia is the intensity of the scattered light and ka the reflection coefficient of the surface for scattered light. As with self-luminous objects, scattered light results in uniformly coloured surfaces during projection without creating a three-dimensional impression. The scattered light is, therefore, responsible for the basic brightness of
Fig. 9.6 Light intensity depending on the angle of incidence
an object. ka should be rather small compared to the other material parameters. In Fig. 9.5, the Utah teapot has received the values (0.1745, 0.01175, 0.01175) for the individual colour channels of ka as the ambient part.¹ Ia in this visualisation is set to (1, 1, 1), again for all RGB colour channels. Only when the light can be assigned a direction—in contrast to scattered light, which comes from all directions—does a non-homogeneous shading of the object surface occur, which creates a three-dimensional impression. On (ideally) matt surfaces, an incident light beam is reflected uniformly in all directions. How much light is reflected depends on the intensity of the incoming light, the reflection coefficient of the surface and the angle of incidence of the light beam. The influence of the angle of incidence is shown in Fig. 9.6. The same light energy that hits the circle perpendicular to the axis of the cone of light is distributed over a larger area when the circle is inclined to the axis. The more oblique the circle is, i.e., the flatter the angle of incidence of the light, the less light energy is incident at a point. This effect is also responsible for the fact that, on earth, it is warmer at the equator, where the sun's rays strike vertically at least in spring and autumn, than at the poles. Summer and winter are also caused by this effect, because the axis of the earth is inclined differently in relation to the sun during the year. In the period from mid-March to mid-September, the northern hemisphere is inclined towards the sun; in the rest of the year, the southern hemisphere is inclined towards the sun. According to Lambert's reflection law for such diffuse reflection, the intensity I of the reflected light is calculated according to the following illumination equation:

I = Id · kd · cos θ,   (9.3)

where Id is the intensity of the incident light, 0 ≤ kd ≤ 1 is the material-dependent reflection coefficient of the surface and θ is the angle between the normal vector n to the surface at the point under consideration and the vector l pointing in
1 See [5], material property ruby.
Fig. 9.7 Diffuse Reflection
the direction from which the light comes. Figure 9.7 illustrates this situation. Note that the diffuse reflection is independent of the viewing direction of the observer. The illumination equation for diffuse reflection applies only to angles θ between 0° and 90°. Otherwise, the light beam hits the surface from behind, so that no reflection occurs. The intensity of the incident light depends on the type of light source. With directional light, which is emitted from an infinitely distant light source, Id has the same value everywhere, namely the intensity specified for the directional light. In the case of a point light source, Id is the intensity of the light source multiplied by the distance-dependent attenuation factor given in Eq. (9.1). In the case of a spotlight, the light beam is attenuated further, in addition to the distance-dependent attenuation, depending on how far it deviates from the central axis of the light cone. The ambient and diffuse material colours are usually identical, as ambient light also accounts for diffuse reflection, but as a highly simplified model of the scattering in the atmosphere and of indirect reflection; this can only be described indirectly by a constant irradiance of the surfaces. However, the ambient part is usually smaller than the diffuse one. Since the reflection calculations are carried out for each fragment, the cosine in Eq. (9.3) must be evaluated very often. If the normal vector n of the surface and the vector l in the direction of the light source are normalised, i.e., have a length of one, the cosine of the angle between the two vectors can be determined by their scalar product, so that Eq. (9.3) simplifies to I = Id · kd · (n · l). With directional light and a flat surface, the vectors l and n do not change. In this case, a surface is shaded evenly. One can see this with the right cube in Fig. 9.1, which was illuminated with directional light. Because of their different inclinations, the individual surfaces of the cube are shaded differently, but each individual surface is given a constant shading. In Fig. 9.5, kd is set to (0.61424, 0.04136, 0.04136) as the diffuse component of the Utah teapot.² Id receives the values (1, 1, 1).
2 See [5], material property ruby.
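As a minimal sketch, the diffuse term can be evaluated per colour channel as follows; the names are chosen freely, and n and l are assumed to be normalised.

/** Minimal sketch: diffuse (Lambert) term I = Id * kd * (n · l) per colour channel. */
final class DiffuseTerm {

    static float[] diffuse(float[] id, float[] kd, double[] n, double[] l) {
        double nDotL = Math.max(0.0, n[0] * l[0] + n[1] * l[1] + n[2] * l[2]); // no light from behind
        return new float[] {
            (float) (id[0] * kd[0] * nDotL),   // red
            (float) (id[1] * kd[1] * nDotL),   // green
            (float) (id[2] * kd[2] * nDotL)    // blue
        };
    }
}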
Fig. 9.8 Diffuse reflection and specular reflection
In addition to diffuse reflection, which occurs on matt surfaces and reflects light uniformly in all directions, the Phong lighting model also includes specular reflection (also known as mirror reflection). On (ideally) smooth or shiny surfaces, part of the light is reflected only in one direction, which is determined according to the law “angle of incidence = angle of reflection”. The incident light beam and the reflected beam thus form the same angle with the normal vector to the surface. The difference between diffuse reflection and specular reflection is illustrated in Fig. 9.8. Note that specular reflection, unlike diffuse reflection, depends on the viewing direction of the observer. Real smooth surfaces usually have a thin, transparent surface layer, such as painted surfaces. The light that penetrates the lacquer layer is reflected diffusely in the colour of the object. In contrast, specular reflection occurs directly at the lacquer layer, so that the light does not change its colour on reflection. This effect can also be seen in Fig. 9.2 on the second object from the left. The bright spot on the surface of the sphere is caused by specular reflection. Although the sphere itself is grey, the point in the centre of the bright spot appears white. The same effect can be observed for the Utah teapot in Fig. 9.5. Shading caused by diffuse reflection is only determined by the angle of incidence of the light and the reflection coefficient. The point of view of the observer does not matter. Where an observer sees a specular reflection, on the other hand, depends on the observer's point of view. In the case of a flat surface in which a light source is reflected, the mirror reflection is ideally only visible at one point. However, this only applies to perfect mirrors. On more or less shiny surfaces, specular reflection follows the relaxed rule “angle of incidence approximately equal to angle of reflection”. In this way, an extended bright spot is created instead of a single point on the above-mentioned grey sphere. Before presenting approaches to modelling this effect of non-ideal specular reflection, the following first explains how the direction of specular reflection is calculated. In Fig. 9.9, l denotes the vector pointing in the direction from which light is incident on the considered point of the surface. n is the vector normal to the surface at the considered point. The vector v points in the direction of the observer, i.e., in the direction of projection. r is the direction in which the mirror reflection can be seen. The normal vector n forms the same angle with l and r.
Fig. 9.9 Calculation of the specular reflection
To determine the specular reflection direction r for the given normalised vectors n and l, the auxiliary vector s is introduced as shown in Fig. 9.9 on the right. The projection of l onto n corresponds to the vector n shown shortened in the figure. Because n and l are normalised, this projection is the vector n · cos(θ). For the vector r sought,

r = n · cos(θ) + s   (9.4)

applies.
The auxiliary vector s can be determined from l and the projection of l onto n: s = n · cos(θ) − l. Inserting s into Eq. (9.4) yields r = 2 · n · cos(θ) − l. As in the case of diffuse reflection, the cosine of the angle between n and l can be calculated via the scalar product: n · l = cos θ. This results in the direction of the specular reflection r = 2 · n(n · l) − l. Again, it is assumed that the surface is not illuminated from behind, i.e., that 0° ≤ θ < 90° applies, which is exactly fulfilled when the scalar product n · l is positive. Only with an ideal mirror does the specular reflection occur solely in the direction of the vector r. As a rule, real mirrors are not ideally smooth, and so specular reflection with decreasing intensity can be seen at angles other than the angle of reflection. The illumination model according to Phong takes this fact into account by calculating the intensity that the observer sees due to specular reflection as depending on the viewing angle α and decreasing with increasing α:

I = Is · W(θ) · (cos(α))^n.   (9.5)
Fig. 9.10 The functions (cos α)^64, (cos α)^8, (cos α)^2, cos α
Is is the intensity of the incident light beam, into which, if necessary, the distance-dependent attenuation in the case of a point light source and, additionally, the deviation from the cone axis in the case of a spotlight have already been factored. The value 0 ≤ W(θ) ≤ 1 is the proportion of light subject to specular reflection at an angle of incidence of θ. As a general rule, W(θ) = ks is taken as a constant specular reflection coefficient of the surface. n is the specular reflection exponent of the surface. For a perfect mirror, n = ∞ would apply. The smaller n is chosen, the more the specular reflection scatters. The exponent n thus simulates the more or less pronounced roughness of the object surface. The rougher the material, the smaller n. Figure 9.10 shows the value of the function³ f(α) = (cos(α))^n for different specular reflection exponents n. For n = 64, the function already falls to a value close to zero at very small angles α, so that specular reflection is almost only visible in the direction of the angle of reflection. In contrast, the pure cosine function with the specular reflection exponent n = 1 drops relatively slowly and produces a visible specular reflection over a larger range. The shinier and smoother a surface is, the larger the specular reflection exponent should be.
3 α is given in radians.
Fig. 9.11 The angle bisector h in Phong’s model
The calculation of the cosine function in Eq. (9.5) can again be traced back to the scalar product of the normalised vectors r and v: cos α = r · v. Thus, for the specular component,

I = Is · ks · (r · v)^n   (9.6)

applies.
The vector r is automatically normalised if the vectors l and n are normalised. In Fig. 9.5, ks is set to (0.727811, 0.626959, 0.626959) as the specular component of the Utah teapot.⁴ Is receives the values (1, 1, 1). For the exponent n, the value 0.6 · 128 = 76.8 is selected.⁵ Figure 9.12 shows the rendering results for different specular reflection exponents n. The angle α in Phong's model indicates the deviation of the viewing direction from the ideal specular reflection direction. Another way to determine this deviation is the following consideration. If the observer is exactly in the direction of the specular reflection, the normal vector n forms the bisector between l and v. The angle β between the normal vector n and the bisector h of l and v, as shown in Fig. 9.11, is, therefore, also used as a measure of the deviation from the ideal specular reflection. This calculation is faster because the computation of the reflection vector is avoided (Fig. 9.12). In the modified Phong model, therefore, the term cos β is used instead of the term cos α, which in turn can be determined using a scalar product: cos β = n · h. The normalised angle bisector h is given by the formula

h = (l + v) / |l + v|.
With directional light and an infinitely distant observer, i.e., with parallel projection, h does not change, in contrast to the vector r in the original Phong model. In this
4 See [5], material property ruby.
5 See [5], material property ruby.
Fig. 9.12 Rendering results with different specular reflection exponents n
case, this leads to an increase in the efficiency of the calculation. The specular part of the modified Phong model is, therefore,

I = Is · ks · (n · h)^n.   (9.7)
Note that the diffuse reflection can be stored in a kind of texture, the irradiance map. Non-uniform light from arbitrary directions, e.g., specular reflection, is stored in the environment map (see Chap. 10). The texture of the diffuse visualisation in the irradiance map looks as if it was created by strong low-pass filtering. This is due to the fact that the scattering behaviour of the surface is also stored. The considerations in this section have always referred to one light source, one point on a surface and one of the three basic colours. If several light sources, as well as scattered light (ambient light) Ia and possible (emissive) intrinsic light Ie of the surface, are to be taken into account, the individually calculated intensity values must be added up, so that the lighting equation

I = Ie + Ia · ka + Σ_j Ij · f_att · g_cone · ( kd · (n · lj) + ks · (rj · v)^n )   (9.8)

results,
where ka is the reflection coefficient for scattered light, which is usually identical to the reflection coefficient kd for diffuse reflection. Ij is the basic intensity of the j-th light source, which in the above formula contributes equally to diffuse and specular reflection. Of course, if the proportions Id and Is are unequal, this can be separated into diffuse and specular reflection, as described in this chapter. In the case of directional light, the two factors f_att and g_cone are equal to one; in the case of a point light source, the first factor models the distance-dependent attenuation of the light, and in the case of a spotlight, the second factor ensures the additional decrease in light intensity towards the edge of the cone of light. n, lj, v and rj are the vectors already used in Figs. 9.7 and 9.9, respectively. In this equation, the original Phong model was used for the specular reflection. The reflection coefficient ks for the specular reflection is usually different from kd or ka. The following rules apply to the material properties. No more radiation may be emitted than is received. Thus, 0 ≤ ka, kd, ks ≤ 1 with ka + kd + ks ≤ 1 must apply. If something is to be represented as strongly reflecting, then it is important that ks is much higher than kd. For matt surfaces, however, kd can be set much larger than ks. For a low-contrast display, it is recommended to set ka much higher than kd and ks. Plastic-like materials have a white specular material colour ks, i.e., the RGB colour channels of ks are almost identical. Metal-like materials have similar RGB colour channels for the diffuse and specular material colours, i.e., for each RGB colour channel, kd is almost identical to ks. Nevertheless, it should be noted that a realistic representation of shiny metals is difficult to achieve with the local Phong lighting model. Realistic effects can be obtained here by using, e.g., environment maps and projecting them onto the object (see Chap. 10). A good list of specific material properties and their implementation with the parameters ka, kd, ks and the specular
reflection exponents n can be found in [5]. The visualisation of the Utah Teapot with the material property ruby in this chapter is based on this. Applying Eq. (9.8) to the three primary colours red, green and blue does not entail much additional computational effort, since only the coefficients Ie (self-lighting), Ia (scattered light), Ij, ka, kd and ks change with the different colours, and these are constants that do not have to be calculated. The other values, in contrast, have to be computed laboriously for each point, i.e., for each vertex or fragment, and for each light source, but they are identical for all three colour channels. One technique that can lead to an acceleration when several light sources are used is deferred shading, which is based on the depth buffer algorithm (see Sect. 8.2.3). The disadvantage of the depth buffer algorithm is that the complex calculations are also performed for those objects that are not visible and are overwritten later in the course of the algorithm by other objects further ahead in the scene. Therefore, in deferred shading, the depth buffer algorithm is first run through with only the depth values being entered into the z-buffer. In addition to the depth value, the normals for each pixel and the material are stored in textures. The z-buffer is also saved in a texture to have access to the original depth values. Only in the second pass, with the z-buffer already filled, are the entries for the colours in the frame buffer updated. Since both the normals and the material are stored in textures, the illumination can be calculated without having to rasterise the geometry again. It should be noted that this process has to be repeated depending on the number of light sources. This means that the complex calculations for illumination only need to be carried out for the visible objects.
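Returning to Eq. (9.8), the following compact Java sketch (not part of the book's example code) evaluates the lighting equation for a single colour channel at a single surface point; the dot products n · lj and rj · v as well as the factors fatt and gcone are assumed to be precomputed and passed in as arrays, since they are identical for all three colour channels.

// Evaluates Eq. (9.8) for one colour channel at one surface point.
// ie: emissive intensity, ia * ka: ambient contribution; per light source j:
// attenuation fAtt[j], cone factor gCone[j], diffuse and specular contribution.
static double lightingEquation(double ie, double ia, double ka,
                               double kd, double ks, double shininess,
                               double[] lightIntensity, double[] fAtt,
                               double[] gCone, double[] nDotL, double[] rDotV) {
    double intensity = ie + ia * ka;
    for (int j = 0; j < lightIntensity.length; j++) {
        double diffuse = kd * Math.max(nDotL[j], 0.0);
        double specular = ks * Math.pow(Math.max(rDotV[j], 0.0), shininess);
        intensity += lightIntensity[j] * fAtt[j] * gCone[j] * (diffuse + specular);
    }
    return Math.min(intensity, 1.0); // clamp to the displayable range
}

The per-channel evaluation only exchanges the constants ie, ia, ka, kd, ks and the light intensities, which is exactly the saving discussed above.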
9.3 The Lighting Model According to Phong in the OpenGL

As explained in the previous Sects. 9.1 and 9.2, the standard lighting model in the OpenGL is composed of the components ambient (ambient light), diffuse (diffuse reflection), specular (specular reflection) and emissive (self-luminous). For the lighting, the following light sources are available: directional lights, point lights and spotlights. Furthermore, material properties must be defined for an object, which determine the effect of the illumination. A simple illuminated scene is shown in Fig. 9.13. The Blinn–Phong lighting model is implemented in the OpenGL as follows. The analysis is separated into the fixed function and the programmable pipeline. First, the fixed function pipeline is considered, in which Blinn–Phong is the standardised procedure. The parameters of a light source are specified in the init method; for this, the method "setLight" from Fig. 9.14 can be used. The source code in Fig. 9.15 defines a white light source at position (0, 2, 6). Ambient, diffuse and specular illumination or reflection are fixed to the colour white. Note that the colours are specified in the range between 0 and 1 in the data type float instead of integer (instead of 0 to 255 with 8 bits, see Sect. 6.3).
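The listing of Fig. 9.14 is not reproduced here. As an orientation, such a helper could look roughly like the following sketch; the method name and parameter layout are assumptions and may differ from the actual code in Fig. 9.14. It relies on a JOGL GL2 context, as the other fixed function examples in this chapter do.

// Hypothetical helper in the style of "setLight": forwards the parameters
// of one light source to the fixed function pipeline.
private void setLight(GL2 gl, int light, float[] position,
                      float[] ambientCol, float[] diffuseCol, float[] specularCol) {
    gl.glEnable(GL2.GL_LIGHTING); // enable fixed function lighting
    gl.glEnable(light);           // e.g. GL2.GL_LIGHT0
    gl.glLightfv(light, GL2.GL_POSITION, position, 0);
    gl.glLightfv(light, GL2.GL_AMBIENT, ambientCol, 0);
    gl.glLightfv(light, GL2.GL_DIFFUSE, diffuseCol, 0);
    gl.glLightfv(light, GL2.GL_SPECULAR, specularCol, 0);
}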
Fig. 9.13 An illuminated scene in the OpenGL
Thus, the parameters of the Blinn–Phong lighting model are assigned as follows: in I = ka · Ia, the intensity Ia is set for each colour channel separately with lightAmbientCol; Id in I = Id · kd · (n · l) is set with lightDiffuseCol; and Is in I = Is · ks · (r · v)ⁿ is set with lightSpecularCol. The parameters ka, kd and ks determine the material properties and are also set individually for each colour channel with matAmbient, matDiffuse and matSpecular. The parameter matShininess is the exponent n of the specular light component. Figure 9.16 shows the source code for the material properties of the treetop from Fig. 9.12. In the programmable pipeline (core profile), the commands of the fixed function pipeline do not exist, and therefore the formulas of the light components must be implemented explicitly. This requires a certain amount of programming effort, but in return allows extensive design options when implementing the lighting. In the fixed function pipeline, the lighting calculations are carried out before the rasterisation. This has the advantage that these calculations can be made very efficiently. On closer inspection, however, one notices distortions, especially in the area of the specular reflection, which are caused by the interpolation of the colour values on the screen. The reason for this is that the colour values for the vertices are calculated according to the lighting before rasterisation, and these colour values are then interpolated during rasterisation. The flat and Gouraud shading procedures presented in Sect. 9.4 work according to this principle. To reduce such distortions, it is better to interpolate the normal vectors instead of the colour values and then use the interpolated normals to calculate the illumination (and from this the colour values). For this reason, the more complex calculation is implemented in the fragment shader of the programmable pipeline. This means that the normals of the vertices are transformed by the viewing pipeline (note that the transposed inverse of the overall matrix of the viewing pipeline has to be used, see Sect. 5.11). These geometric calculations, including the calculation
Fig. 9.14 Illumination according to Blinn-Phong in the fixed function pipeline (Java): This is part of the init method of the OpenGL renderer
of the positions of the vertices with the total matrix of the viewing pipeline, are performed in the vertex shader. The transformed normals are then interpolated in the rasterisation and must be normalised again in the fragment shader after this phase. For each fragment, the lighting calculations are implemented in the fragment shader based on the interpolated normals, and from this the colour values are calculated. This principle corresponds to the Phong shading from Sect. 9.4. Of course, Gouraud shading can also be implemented with shaders, but nowadays the hardware is no longer a performance limitation for Phong shading. For this purpose, the parameters for a light source are specified in the init method. A class "LightSource" can be created for this purpose (see Figs. 9.22
float lightPosition[] = {0.0f, 2.0f, 6.0f, 1.0f};
float lightAmbientCol[] = {1.0f, 1.0f, 1.0f, 1.0f};
float lightDiffuseCol[] = {1.0f, 1.0f, 1.0f, 1.0f};
float lightSpecularCol[] = {1.0f, 1.0f, 1.0f, 1.0f};

gl.glLightfv(gl.GL_LIGHT0, gl.GL_POSITION, lightPosition, 0);
gl.glLightfv(gl.GL_LIGHT0, gl.GL_AMBIENT, lightAmbientCol, 0);
gl.glLightfv(gl.GL_LIGHT0, gl.GL_DIFFUSE, lightDiffuseCol, 0);
gl.glLightfv(gl.GL_LIGHT0, gl.GL_SPECULAR, lightSpecularCol, 0);
Fig. 9.15 Definition of a white light source at position (0, 2, 6) (Java): Part of the init method of the OpenGL renderer
private void setLeafGreenMaterial(GL2 gl) {
    float matAmbient[] = {0.0f, 0.1f, 0.0f, 1.0f};
    float matDiffuse[] = {0.0f, 0.5f, 0.0f, 1.0f};
    float matSpecular[] = {0.3f, 0.3f, 0.3f, 1.0f};
    float matEmission[] = {0.0f, 0.0f, 0.0f, 1.0f};
    float matShininess = 1.0f;

    gl.glMaterialfv(gl.GL_FRONT_AND_BACK, gl.GL_AMBIENT, matAmbient, 0);
    gl.glMaterialfv(gl.GL_FRONT_AND_BACK, gl.GL_DIFFUSE, matDiffuse, 0);
    gl.glMaterialfv(gl.GL_FRONT_AND_BACK, gl.GL_SPECULAR, matSpecular, 0);
    gl.glMaterialfv(gl.GL_FRONT_AND_BACK, gl.GL_EMISSION, matEmission, 0);
    gl.glMaterialf(GL.GL_FRONT_AND_BACK, gl.GL_SHININESS, matShininess);
}
Fig. 9.16 Definition of a material for lighting (Java): This is a part of the init method of the OpenGL renderer
float[] lightPosition = {0.0f, 3.0f, 3.0f, 1.0f};
float[] lightAmbientColor = {1.0f, 1.0f, 1.0f, 1.0f};
float[] lightDiffuseColor = {1.0f, 1.0f, 1.0f, 1.0f};
float[] lightSpecularColor = {1.0f, 1.0f, 1.0f, 1.0f};
light0 = new LightSource(lightPosition, lightAmbientColor,
        lightDiffuseColor, lightSpecularColor);
Fig. 9.17 Definition of a white light source for illumination (Java)
and 9.23). For example, the source code of Fig. 9.17 defines a white light source at the position (0, 3, 3). Furthermore, material properties are defined in the init method for each object, as shown in Fig. 9.18.
float[] matEmission = {0.0f, 0.0f, 0.0f, 1.0f};
float[] matAmbient = {0.0f, 0.0f, 0.0f, 1.0f};
float[] matDiffuse = {0.8f, 0.0f, 0.0f, 1.0f};
float[] matSpecular = {0.8f, 0.8f, 0.8f, 1.0f};
float matShininess = 200.0f;
material0 = new Material(matEmission, matAmbient, matDiffuse,
        matSpecular, matShininess);
Fig. 9.18 Definition of a red material for the illumination in the programmable pipeline (Java)
// transfer parameters of light source
gl.glUniform4fv(2, 1, light0.getPosition(), 0);
gl.glUniform4fv(3, 1, light0.getAmbient(), 0);
gl.glUniform4fv(4, 1, light0.getDiffuse(), 0);
gl.glUniform4fv(5, 1, light0.getSpecular(), 0);

// transfer material parameters
gl.glUniform4fv(6, 1, material0.getEmission(), 0);
gl.glUniform4fv(7, 1, material0.getAmbient(), 0);
gl.glUniform4fv(8, 1, material0.getDiffuse(), 0);
gl.glUniform4fv(9, 1, material0.getSpecular(), 0);
gl.glUniform1f(10, material0.getShininess());
Fig. 9.19 Transfer of the parameters for the illumination of the programmable pipeline to the shader program (Java)
In the display method, for each object, the light source and material parameters are transferred to the vertex and fragment shaders. It is best to do this directly after passing the model-view matrix ("mvMatrix") to the shaders. Figure 9.19 shows this situation. Since the normals are interpolated in the rasteriser for each fragment, the transposed inverse of the overall matrix of the viewing pipeline is required; it can be obtained with the call glGetMvitMatrixf of the PMVMatrix class (PMVTool). To avoid having to recalculate this matrix for each fragment, the resulting matrix is transferred to the vertex shader as a uniform variable.

// Transfer the transpose inverse model-view matrix
// to multiply with normals
gl.glUniformMatrix4fv(11, 1, false, this.pmvMatrix.glGetMvitMatrixf());
In the vertex shader, this matrix is provided as follows:

layout (location = 11) uniform mat4 nMatrix;
In the vertex shader, the transformed normals are calculated and passed to the rasteriser as out variables.

// Calculate normal in view space
// matrix has to be the transpose inverse of mvMatrix
vs_out.N = vec3(nMatrix * vec4(vNormal, 0.0f));
Then, after the normals have been normalised again in the fragment shader, the illumination calculations according to Blinn–Phong are performed.
// Normalise the incoming N, L and V vectors
vec3 N = normalize(fs_in.N);
vec3 L = normalize(fs_in.L);
vec3 V = normalize(fs_in.V);
vec3 H = normalize(L + V);

// Compute the diffuse and specular components for each fragment
vec3 diffuse = max(dot(N, L), 0.0) * vec3(materialDiffuse)
        * vec3(lightSourceDiffuse);
vec3 specular = pow(max(dot(N, H), 0.0), materialShininess)
        * vec3(materialSpecular) * vec3(lightSourceSpecular);

// Write final colour to the framebuffer
FragColor = vec4(emissive + ambient + diffuse + specular, 1.0);

// Write only the emissive colour to the framebuffer
// FragColor = vec4(emissive, 1.0);

// Write only the ambient colour to the framebuffer
// FragColor = vec4(ambient, 1.0);

// Write only the diffuse colour to the framebuffer
// FragColor = vec4(diffuse, 1.0);

// Write only the specular colour to the framebuffer
// FragColor = vec4(specular, 1.0);
If a texture is additionally incorporated (techniques for this are explained in Chap. 10), the final colour values can be calculated as follows:
// Write final colour to the framebuffer
FragColor = (vec4(emissive + ambient + diffuse, 1.0)
        * texture(tex, fs_in.vUV)) + vec4(specular, 1.0);
Figures 9.20 and 9.21 show the implementation in the vertex and fragment shaders. If several light sources of different types are to be displayed, it is desirable to send the parameters of the light sources to the shaders bundled per light source. This means that a procedure is needed that passes array structures to the vertex or fragment shader. For this, the LightSource class is modified to allow access via a uniform array of structs. Essentially, the arrays defined in the class LightSource only need to be accessible from outside via get and set methods. In addition, the parameters for a headlamp (spotlight) illumination are added. Figures 9.22 and 9.23 show the extended LightSource class.
// position and color of vertex as input vertex attribute
layout (location = 0) in vec3 vPosition;
layout (location = 1) in vec3 vInColor;
layout (location = 2) in vec3 vNormal;

// Projection and model view matrix as input uniform variables
layout (location = 0) uniform mat4 pMatrix;
layout (location = 1) uniform mat4 mvMatrix;
layout (location = 2) uniform vec4 lightPosition;
layout (location = 11) uniform mat4 nMatrix;

// Outputs from vertex shader
out VS_OUT {
    vec3 N;
    vec3 L;
    vec3 V;
} vs_out;

void main(void) {
    // Calculate view space coordinate
    vec4 P = mvMatrix * vec4(vPosition, 1.0);

    // Calculate normal in view space
    // normals do not change with translation
    // therefore nMatrix is sufficient as 3x3 matrix
    vs_out.N = mat3(nMatrix) * vNormal;

    // Calculate light vector
    vs_out.L = lightPosition.xyz - P.xyz;

    // Calculate view vector
    vs_out.V = -P.xyz;

    // Calculate the clip space position of each vertex
    gl_Position = pMatrix * P;
}
Fig. 9.20 Implementation of the illumination model according to Blinn–Phong in the vertex shader (GLSL)
It should be noted that the shaders do not perform an attenuation inversely proportional to the distance, even though the light source is a point light source. For the sake of clarity, this attenuation is neglected in this implementation. In the following, the Utah Teapot is shown as an object illuminated by several light sources. The method initObject1 from Figs. 9.24, 9.25 and 9.26 initialises the Utah teapot. This method is called in the init method as soon as all shaders and the program are activated. In contrast to the previous implementation of a point light source, where the locations of the in-variables and uniforms in the shader are set in the init method, a further variant is presented below. If the locations are not set in advance, they are automatically assigned by the shader. This is explained in the following for the uniforms. With the method gl.glGetUniformLocation,
// Parameters of light source as uniform variables from application
layout (location = 3) uniform vec4 lightSourceAmbient;
layout (location = 4) uniform vec4 lightSourceDiffuse;
layout (location = 5) uniform vec4 lightSourceSpecular;
// Material parameters as uniform variables
layout (location = 6) uniform vec4 materialEmission;
layout (location = 7) uniform vec4 materialAmbient;
layout (location = 8) uniform vec4 materialDiffuse;
layout (location = 9) uniform vec4 materialSpecular;
layout (location = 10) uniform float materialShininess;

in vec4 vColor;
out vec4 FragColor;
// Input from vertex shader
in VS_OUT {
    vec3 N;
    vec3 L;
    vec3 V;
} fs_in;

void main(void) {
    vec3 emissive = vec3(materialEmission);
    vec3 ambient = vec3(materialAmbient) * vec3(lightSourceAmbient);
    vec3 diffuseAlbedo = vec3(materialDiffuse) * vec3(lightSourceDiffuse);
    vec3 specularAlbedo = vec3(materialSpecular) * vec3(lightSourceSpecular);

    // Normalize the incoming N, L and V vectors
    vec3 N = normalize(fs_in.N);
    vec3 L = normalize(fs_in.L);
    vec3 V = normalize(fs_in.V);
    vec3 H = normalize(L + V);

    // Compute the diffuse and specular components for each
    // fragment
    vec3 diffuse = max(dot(N, L), 0.0) * diffuseAlbedo;
    vec3 specular = pow(max(dot(N, H), 0.0), materialShininess) * specularAlbedo;

    // Write final color to the framebuffer
    FragColor = vec4(emissive + ambient + diffuse + specular, 1.0);
}
Fig. 9.21 Implementation of the illumination model according to Blinn–Phong in the Fragment Shader (GLSL)
the locations can be queried. This must be done after linking the program. Since these locations are also required in the display method for addressing the uniforms on the CPU and GPU side, they should be stored in an array, here, e.g., lightsUnits for the parameters of light source i. For example, lightsUnits[i].ambient then returns the location of the parameter of the ambient light component of light source i from the shader's point of view. Then
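The class LightSourceUnits used here and in Fig. 9.24 is essentially a container for the queried locations. Its listing is not reproduced in this chapter; a minimal sketch could look as follows, where the field names are chosen to match the accesses in Figs. 9.24 and 9.27 and everything else is an assumption.

// Minimal container for the uniform locations belonging to one light source,
// filled with glGetUniformLocation in initObject1 (Fig. 9.24) and reused
// in displayObject1 (Fig. 9.27).
public class LightSourceUnits {
    public int position;
    public int ambient;
    public int diffuse;
    public int specular;
    public int attenuation;
    public int spotC;
    public int spotExp;
    public int spotDir;
}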
public class LightSource {
    private float[] position = new float[4];
    private float[] ambient = new float[3];
    private float[] diffuse = new float[3];
    private float[] specular = new float[3];
    private float[] attenuation = new float[4];
    private float spotCutOff;
    private float spotExponent;
    private float[] spotDirection = new float[3];

    public LightSource() {
    }

    // position imitates GL_POSITION in glLight
    // position: fourth coordinate component is
    //   0.0f if directional light source (infinite)
    //   1.0f if point or spot light source (finite)
    // ambient imitates GL_AMBIENT in glLight
    // diffuse imitates GL_DIFFUSE in glLight
    // specular imitates GL_SPECULAR in glLight
    // SPOT LIGHTS
    // spotCutOff imitates GL_SPOT_CUTOFF in glLight
    // spotExponent imitates GL_SPOT_EXPONENT in glLight
    // spotDirection imitates GL_SPOT_DIRECTION in glLight

    public LightSource(float[] position, float[] ambient,
                       float[] diffuse, float[] specular, float[] attenuation) {
        this.position = position;
        this.ambient = ambient;
        this.diffuse = diffuse;
        this.specular = specular;
        this.attenuation = attenuation;
        this.spotCutOff = -1;   // cos(180), i.e., no spot
        this.spotExponent = 0;  // exponent zero, i.e., no cone attenuation
        this.spotDirection = new float[]{0.0f, 0.0f, -1.0f};
    }

    public LightSource(float[] position, float[] ambient,
                       float[] diffuse, float[] specular, float[] attenuation,
                       float spotCutOff, float spotExponent, float[] spotDirection) {
        this.position = position;
        this.ambient = ambient;
        this.diffuse = diffuse;
        this.specular = specular;
        this.attenuation = attenuation;
        this.spotCutOff = spotCutOff;
        this.spotExponent = spotExponent;
        this.spotDirection = spotDirection;
    }

    public float[] getPosition() {
        return position;
    }

    public void setPosition(float[] position) {
        this.position = position;
    }

Fig. 9.22 LightSource class part 1 (Java)
    public float[] getAmbient() {
        return ambient;
    }

    public void setAmbient(float[] ambient) {
        this.ambient = ambient;
    }

    public float[] getDiffuse() {
        return diffuse;
    }

    public void setDiffuse(float[] diffuse) {
        this.diffuse = diffuse;
    }

    public float[] getSpecular() {
        return specular;
    }

    public void setSpecular(float[] specular) {
        this.specular = specular;
    }

    public float[] getAttenuation() {
        return attenuation;
    }

    public void setAttenuation(float[] attenuation) {
        this.attenuation = attenuation;
    }

    public float getSpotCutOff() {
        return spotCutOff;
    }

    public void setSpotCutOff(float spotCutOff) {
        this.spotCutOff = spotCutOff;
    }

    public float getSpotExponent() {
        return spotExponent;
    }

    public void setSpotExponent(float spotExponent) {
        this.spotExponent = spotExponent;
    }

    public float[] getSpotDirection() {
        return spotDirection;
    }

    public void setSpotDirection(float[] spotDirection) {
        this.spotDirection = spotDirection;
    }
}

Fig. 9.23 LightSource class part 2 (Java)
/**
 * Initializes the GPU for drawing object 2
 * @param gl OpenGL context
 */
private void initObject1(GL3 gl) {
    // BEGIN: Prepare teapot for drawing (object 1)
    shaderProgram1 = new ShaderProgram(gl);
    shaderProgram1.loadShaderAndCreateProgram(shaderPath,
            vertexShader1FileName, fragmentShader1FileName);

    PMatrixUnit = gl.glGetUniformLocation(
            shaderProgram1.getShaderProgramID(), "pMatrix");
    MVMatrixUnit = gl.glGetUniformLocation(
            shaderProgram1.getShaderProgramID(), "mvMatrix");

    // Get locations of light sources
    for (int i = 0; i < lightsUnits.length; i++) {
        lightsUnits[i] = new LightSourceUnits();
        lightsUnits[i].position = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].position");
        lightsUnits[i].ambient = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].ambient");
        lightsUnits[i].diffuse = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].diffuse");
        lightsUnits[i].specular = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].specular");
        lightsUnits[i].attenuation = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].attenuation");
        lightsUnits[i].spotC = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].spotC");
        lightsUnits[i].spotExp = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].spotExp");
        lightsUnits[i].spotDir = gl.glGetUniformLocation(
                shaderProgram1.getShaderProgramID(),
                "lights[" + String.valueOf(i) + "].spotDir");
    }

    // Load vertices from our OBJ file
    TriangularMesh object = loadOBJVertexCoordinates(
            "resources/custom/teapot.obj");
Fig. 9.24 Method to initialise an object, in this case, Utah teapot part 1 (Java)
    // float[] color0 = {0.7f, 0.7f, 0.7f};
    this.vertexCoordinates = object.vertices;
    this.vertexIndices = object.indices;

    // Create and activate a vertex array object (VAO)
    vaoName[0] = this.createVertexArray(gl);
    gl.glBindVertexArray(vaoName[0]);
    // activate and initialize vertex buffer object (VBO)
    vboName[0] = this.createBuffer(gl);
    gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboName[0]);
    // floats use 4 bytes in Java
    gl.glBufferData(GL.GL_ARRAY_BUFFER,
            this.vertexCoordinates.length * Float.BYTES,
            FloatBuffer.wrap(this.vertexCoordinates), GL.GL_STATIC_DRAW);

    // activate and initialize index buffer object (IBO)
    iboName[0] = this.createBuffer(gl);
    gl.glBindBuffer(GL.GL_ELEMENT_ARRAY_BUFFER, iboName[0]);
    // integers use 4 bytes in Java
    gl.glBufferData(GL.GL_ELEMENT_ARRAY_BUFFER,
            this.vertexIndices.length * Integer.BYTES,
            IntBuffer.wrap(this.vertexIndices), GL.GL_STATIC_DRAW);

    // Activate and order vertex buffer object data
    // for the vertex shader
    // The vertex buffer contains:
    // position (3), color (3), normals (3)
    // Defining input for vertex shader
    // Pointer for the vertex shader to the position
    // information per vertex
    gl.glEnableVertexAttribArray(0);
    gl.glVertexAttribPointer(0, 3, GL.GL_FLOAT, false, 8 * 4, 0);
    // Pointer for the vertex shader to the normal
    // information per vertex
    gl.glEnableVertexAttribArray(1);
    gl.glVertexAttribPointer(1, 3, GL.GL_FLOAT, false, 8 * 4, 5 * Float.BYTES);
    // Pointer for the vertex shader to the texture
    // coordinate information per vertex
    gl.glEnableVertexAttribArray(2);
    gl.glVertexAttribPointer(2, 2, GL.GL_FLOAT, false, 8 * 4, 3 * Float.BYTES);

    // Specification of material parameters (blue material)
    float[] matEmission = {0.0f, 0.0f, 0.0f, 1.0f};
    float[] matAmbient = {0.1745f, 0.01175f, 0.01175f, 1.0f};
    float[] matDiffuse = {0.61424f, 0.04136f, 0.04136f, 1.0f};
    float[] matSpecular = {0.727811f, 0.626959f, 0.626959f, 1.0f};
    float matShininess = 256.0f;

    material0 = new Material(matEmission, matAmbient,
            matDiffuse, matSpecular, matShininess);
Fig. 9.25 Method to initialise an object, in this case, Utah teapot part 2 (Java)
    // Load and prepare texture
    Texture texture = null, normalmap = null;
    try {
        File textureFile = new File(texturePath + textureFileName0);
        texture = TextureIO.newTexture(textureFile, true);

        texture.setTexParameteri(gl, gl.GL_TEXTURE_MIN_FILTER, gl.GL_LINEAR);
        texture.setTexParameteri(gl, gl.GL_TEXTURE_MAG_FILTER, gl.GL_LINEAR);
        texture.setTexParameteri(gl, gl.GL_TEXTURE_WRAP_S, gl.GL_MIRRORED_REPEAT);
        texture.setTexParameteri(gl, gl.GL_TEXTURE_WRAP_T, gl.GL_MIRRORED_REPEAT);
    } catch (IOException e) {
        e.printStackTrace();
    }
    if (texture != null && normalmap != null)
        System.out.println("Texture loaded successfully from: " + texturePath);
    else
        System.err.println("Error loading texture.");

    texture.enable(gl);
    // Activate texture in slot 0
    // (might have to go to "display")
    gl.glActiveTexture(GL_TEXTURE0);
    // Use texture as 2D texture
    // (might have to go to "display")
    gl.glBindTexture(GL_TEXTURE_2D, texture.getTextureObject(gl));

    // END: Prepare teapot for drawing
}
Fig. 9.26 Method to initialise an object, in this case, Utah teapot part 3 (Java)
the Utah teapot is loaded and its vertices, normals and texture coordinates, specified by a vertex buffer object (VBO) and a vertex array object (VAO), are transferred to the GPU. Finally, the material properties are initialised and a texture is loaded and assigned (see Chap. 10). The displayObject1 method, shown in Figs. 9.27 and 9.28, is called in the display method. It prepares the Utah teapot for drawing. With glUniform, the uniforms with the previously assigned locations are passed to the shaders. Then, with glDrawElements, the call for drawing the Utah teapot is made. The vertex shader, shown in Fig. 9.29, is almost identical to that of the point light source in Fig. 9.20. The Blinn–Phong illumination model according to Eq. (9.8) is implemented in the fragment shader of Figs. 9.30–9.32. fatt corresponds to the variable atten, and gcone corresponds to the variable spotatten. More details about the illumination model in the OpenGL can be found in [6].
private void displayObject1(GL3 gl) {
    // BEGIN: Draw the second object (object 1)
    // Utah Teapot with arbitrary light sources
    gl.glUseProgram(shaderProgram1.getShaderProgramID());
    // Transfer the PVM Matrix (model view and
    // projection matrix) to the vertex shader
    gl.glUniformMatrix4fv(PMatrixUnit, 1, false,
            this.pmvMatrix.glGetPMatrixf());
    gl.glUniformMatrix4fv(MVMatrixUnit, 1, false,
            this.pmvMatrix.glGetMvMatrixf());
    // transfer material parameters
    gl.glUniform4fv(gl.glGetUniformLocation(shaderProgram1.getShaderProgramID(),
            "materialEmission"), 1, material0.getEmission(), 0);
    gl.glUniform4fv(gl.glGetUniformLocation(shaderProgram1.getShaderProgramID(),
            "materialAmbient"), 1, material0.getAmbient(), 0);
    gl.glUniform4fv(gl.glGetUniformLocation(shaderProgram1.getShaderProgramID(),
            "materialDiffuse"), 1, material0.getDiffuse(), 0);
    gl.glUniform4fv(gl.glGetUniformLocation(shaderProgram1.getShaderProgramID(),
            "materialSpecular"), 1, material0.getSpecular(), 0);
    gl.glUniform1f(gl.glGetUniformLocation(shaderProgram1.getShaderProgramID(),
            "materialShininess"), material0.getShininess());

    // Transfer the transpose inverse model view matrix
    // to multiply with normals
    gl.glUniformMatrix4fv(gl.glGetUniformLocation(
            shaderProgram1.getShaderProgramID(), "nMatrix"),
            1, false, this.pmvMatrix.glGetMvitMatrixf());

    gl.glUniform1i(gl.glGetUniformLocation(
            shaderProgram1.getShaderProgramID(), "numLights"), numberLights);

    for (int i = 0; i < numberLights; i++) {
        // same as lights[i].position in fragment shader
        gl.glUniform4fv(lightsUnits[i].position, 1, lights[i].getPosition(), 0);
        // same as lights[i].ambient in fragment shader
        gl.glUniform3fv(lightsUnits[i].ambient, 1, lights[i].getAmbient(), 0);
        // same as lights[i].diffuse in fragment shader
        gl.glUniform3fv(lightsUnits[i].diffuse, 1, lights[i].getDiffuse(), 0);
        // same as lights[i].specular in fragment shader
        gl.glUniform3fv(lightsUnits[i].specular, 1, lights[i].getSpecular(), 0);
        // same as lights[i].attenuation in fragment shader
        gl.glUniform4fv(lightsUnits[i].attenuation, 1, lights[i].getAttenuation(), 0);
        // same as lights[i].spotC in fragment shader
        gl.glUniform1f(lightsUnits[i].spotC, lights[i].getSpotCutOff());
Fig. 9.27 Method displayObject1 for the preparatory visualisation of the Utah teapot part 1 (Java)
        // same as lights[i].spotExp in fragment shader
        gl.glUniform1f(lightsUnits[i].spotExp, lights[i].getSpotExponent());
        // same as lights[i].spotDir in fragment shader
        gl.glUniform3fv(lightsUnits[i].spotDir, 1, lights[i].getSpotDirection(), 0);
    }

    // Draws the elements in the order defined
    // by the index buffer object (IBO)
    gl.glDrawElements(GL_TRIANGLES,        // mode
            this.vertexIndices.length,     // count
            GL_UNSIGNED_INT,               // type
            0);                            // element array buffer offset
}
Fig. 9.28 Method displayObject1 for the preparatory visualisation of the Utah teapot part 2 (Java)
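The array lights whose entries are transferred in Fig. 9.27 is filled in the init method. The concrete values used for the screenshots are not given here; a possible setup with one white point light and one headlamp could look like the following sketch, in which all numbers are merely illustrative.

// Illustrative setup of two light sources for the shaders of Figs. 9.29-9.32.
int numberLights = 2;
LightSource[] lights = new LightSource[numberLights];

// Point light (w = 1), attenuation vector (c1, c2, c3, maximum value)
lights[0] = new LightSource(
        new float[] {0.0f, 3.0f, 3.0f, 1.0f},   // position
        new float[] {0.1f, 0.1f, 0.1f},         // ambient
        new float[] {1.0f, 1.0f, 1.0f},         // diffuse
        new float[] {1.0f, 1.0f, 1.0f},         // specular
        new float[] {1.0f, 0.1f, 0.0f, 1.0f});  // attenuation

// Headlamp (spotlight): cutoff stored as the cosine of the cone half angle
lights[1] = new LightSource(
        new float[] {0.0f, 4.0f, 0.0f, 1.0f},   // position above the object
        new float[] {0.0f, 0.0f, 0.0f},         // ambient
        new float[] {1.0f, 1.0f, 0.8f},         // diffuse
        new float[] {1.0f, 1.0f, 0.8f},         // specular
        new float[] {1.0f, 0.0f, 0.0f, 1.0f},   // attenuation
        (float) Math.cos(Math.toRadians(20.0)), // spotCutOff
        4.0f,                                   // spotExponent
        new float[] {0.0f, -1.0f, 0.0f});       // spotDirection (pointing down)

The fourth component of the position distinguishes directional lights (0) from point and spot lights (1), as described in the comments of the LightSource class.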
9.4 Shading

In the calculations for the reflection of light from a surface in Sect. 9.2, it was assumed that the normal vector is known at every point on the surface. To correctly colour a fragment in the projection plane, it is necessary to determine not only which object surfaces are visible there, but also where the ray passing through the corresponding pixel hits the surface. For surfaces modelled with cubic freeform surfaces, this would mean solving a system of equations whose unknowns occur to the third power, which would require enormous computational effort. Therefore, when calculating the reflection, one does not use the freeform surfaces themselves, but their approximation by plane polygons. In the simplest case, one ignores the fact that the normal vectors should actually be calculated for the original curved surface and instead determines the normal vector of the respective polygon.

With constant shading (flat shading), the colour of a polygon is calculated at only one point, i.e., one normal vector and one point are used per polygon for shading. The polygon is shaded evenly with the colour thus determined. This procedure would be correct if the following conditions were met:

• The light source is infinitely far away so that n · l is constant.
• The observer is infinitely far away, so that n · v is constant.
• The polygon represents the actual surface of the object and is not just an approximation of a curved surface.
#version 430 core

uniform int numLights;

// position, color, normal and texture coordinates of vertex
// as input vertex attribute
layout (location = 0) in vec3 vPosition;
layout (location = 1) in vec3 vNormal;
layout (location = 2) in vec2 vInUV;

// Projection, model view matrix and
// transposed inverse model view matrix
// for normal transformation
// as input uniform variables
uniform mat4 pMatrix;
uniform mat4 mvMatrix;
uniform mat4 nMatrix;

// Outputs from vertex shader
out VS_OUT {
    vec4 P;
    vec3 N;
    vec2 vUV;
} vs_out;

void main(void) {
    // Calculate view space coordinate
    vec4 P = mvMatrix * vec4(vPosition, 1.0);
    vs_out.P = P;

    // Calculate normal in view space
    // Matrix has to be the transpose inverse of mvMatrix
    // for transformation of normals!
    // The fourth normal coordinate component is
    // 0.0f in homogeneous coordinates
    // N is NOT equal to vec3(nMatrix) * vNormal
    vs_out.N = vec3(nMatrix * vec4(vNormal, 0.0f));

    // Transfer texture coordinates to fragment shader
    vs_out.vUV = vInUV;

    // Calculate the clip space position of each vertex
    gl_Position = pMatrix * P;
}
Fig. 9.29 Vertex shader for multiple light sources (GLSL)
Under these assumptions, shading can be calculated relatively easily and quickly, but it leads to unrealistic representations. Figure 9.33 shows the same sphere as in Fig. 4.11, also with different tessellations. In Fig. 9.33, however, a constant shading was used. Even with the very fine approximation by triangles on the far right of the figure, the facet structure on the surface of the sphere is still clearly visible, whereas in
#version 430 core

// attenuation (c1, c2, c3, maximum value)
// for directional light set attenuation vector (1, 0, 0, 1)
// -> attenuation = 1
// often (0, 1, 0, 1)
struct LightSource {
    vec4 position;
    vec3 ambient;
    vec3 diffuse;
    vec3 specular;
    vec4 attenuation;
    float spotC, spotExp;
    vec3 spotDir;
};
// we expect maximum 8 light sources with x, y, z, w components;
const int maxNumLights = 8;

// parameters of light source as uniform variables from application
uniform LightSource lights[maxNumLights];
uniform int numLights;

// Material parameters as uniform variables
uniform vec4 materialEmission;
uniform vec4 materialAmbient;
uniform vec4 materialDiffuse;
uniform vec4 materialSpecular;
uniform float materialShininess;
// predefined type for texture usage
layout (binding = 0) uniform sampler2D tex;

// Write color to the final framebuffer
// Color per pixel to be visualized
out vec4 FragColor;

// Input from vertex shader
in VS_OUT {
    vec4 P;
    vec3 N;
    vec2 vUV;
} fs_in;

void main(void) {
    // emissive material color
    vec3 emissive = vec3(materialEmission);
    vec4 color = vec4(emissive, 1.0);
    // To combine with texture declare global:
    // vec3 ambient, diffuse, specular, atten;

    // Normalize the incoming N vectors (normals)
    vec3 N = normalize(fs_in.N);
Fig. 9.30 Fragment shader for multiple light sources part 1 (GLSL)
    // Calculate view vectors V and normalize these
    // TODO: V = -P.xyz and normalize
    // Use cartesian coordinates for points P as follows:
    // (P.x/P.w, P.y/P.w, P.z/P.w) if P in homogeneous coordinates
    // P.w is still 1.0f after view space transformation
    // therefore use -P.xyz instead of -P.xyz/P.w
    vec3 V = normalize(-fs_in.P.xyz);

    // for all light sources
    for (int i = 0; i < numLights; i++) {
        // Calculate the ambient, diffuse
        // and specular light source factor
        vec3 calcAmbient = vec3(materialAmbient) * lights[i].ambient;
        vec3 diffuseAlbedo = vec3(materialDiffuse) * lights[i].diffuse;
        vec3 specularAlbedo = vec3(materialSpecular) * lights[i].specular;

        // Normalize the incoming L vector
        // but first we need the distance for attenuation
        // fourth coordinate component w of light source position:
        // directional light source (infinite): w = 0
        // point or spot light source (finite): w = 1
        // if (lights[i].position[3] != 0.0f)
        //     then L = lights[i].position + fs_in.V;
        //     else L = lights[i].position;
        // Multiplication is faster than if-else
        // not normalized V = -P.xyz, see above
        vec3 L = lights[i].position.xyz - lights[i].position[3] * fs_in.P.xyz;
        float distance = length(L);
        vec3 LV = normalize(L);

        // spot light: cone attenuation
        float spotatten = dot(-LV, lights[i].spotDir);

        // If the angle between the direction of the light and the
        // direction from the light to the vertex being lighted is
        // greater than the spot cutoff angle, the light is
        // completely masked.
        // https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/glLight.xml
        // checked 19.04.2019
        if (spotatten < lights[i].spotC)
            spotatten = 0.0;
        else
            spotatten = pow(spotatten, lights[i].spotExp);

        // Calculate the attenuation
        // Attenuation only for point and spot light sources
        float atten;
        if (lights[i].position[3] == 1.0f) {
            float iatten = lights[i].attenuation[0]
                    + lights[i].attenuation[1] * distance
                    + lights[i].attenuation[2] * distance * distance;
            atten = spotatten * min(1 / iatten, lights[i].attenuation[3]);
        } else
            atten = 1.0f;
Fig. 9.31 Fragment shader for multiple light sources part 2 (GLSL)
        // Normalize H vector (Blinn-Phong Lighting)
        vec3 H = normalize(LV + V);

        // Compute the diffuse and specular components
        // for each fragment
        vec3 calcDiffuse = max(dot(N, LV), 0.0) * diffuseAlbedo;
        vec3 calcSpecular = pow(max(dot(N, H), 0.0), materialShininess)
                * specularAlbedo;

        // To combine with texture
        // ambient += calcAmbient;
        // diffuse += calcDiffuse * atten;
        // specular += calcSpecular * atten;

        // Final Blinn-Phong Lighting
        color += vec4(calcAmbient + (calcDiffuse + calcSpecular) * atten, 1.0f);
    }

    // Write final color to the framebuffer (no texture)
    FragColor = color;

    // Write final color to the framebuffer
    // when using texture and specular light
    // FragColor = texture(tex, fs_in.vUV) + vec4(specular, 1.0f);

    // Write final color to the framebuffer when using texture
    // and Blinn-Phong lighting
    // Preprocessing: Firstly, all used variables need to be
    // declared global.
    // Secondly, summed up one by one.
    // FragColor = vec4(emissive + ambient + diffuse, 1.0)
    //         * texture(tex, fs_in.vUV) + vec4(specular, 1.0f);
}
Fig. 9.32 Fragment shader for multiple light sources part 3 (GLSL)
Fig. 9.33 A sphere, shown with constant shading and different tessellation
Fig. 4.11 the facet structure is almost no longer visually perceptible even at medium resolution. The same can be observed in Fig. 9.34. In order to make the facets disappear with constant shading, an enormous increase in resolution is required. This is partly due to the fact that human vision automatically performs a pre-processing that enhances edges, i.e., contrasts, so that even small kinks are detected. Instead of constant shading, interpolation is therefore used in shading. The normal vectors at the vertices of the polygon are needed for this. In this way, a triangle approximating a part of a curved surface is assigned three normal vectors, all of
Fig. 9.34 The Utah teapot, shown with constant shading and different tessellation
which do not have to coincide with the normal vector belonging to the triangle plane. If the triangles are not automatically calculated to approximate a curved surface but were created individually, normal vectors can be generated at the vertices of a triangle by averaging, component by component, the normal vectors of all triangular surfaces that meet at this vertex. This technique has already been described in Sect. 4.7. Assuming that triangles approximate the surface of an object, Gouraud shading [4] determines the shading at each corner of the triangle using the different normal vectors in the vertices. Shading of the points inside the triangle is done by colour interpolation using the colours of the three vertices. This results in a linear progression of the colour intensity over the triangle. Figure 9.35 shows the colour intensity as a function over the triangle area. Figure 9.37 shows the implementation with different degrees of tessellation for the Utah teapot. Figure 9.38 shows the same model with flat and Gouraud shading with specular reflection. The efficient calculation of the intensities within the triangle is done by a scan-line method. For this purpose, for a scan line ys, first the intensities Ia and Ib are determined on the two edges intersecting the scan line. This is done by interpolating
Fig. 9.35 The colour intensity as a function over the triangular area in Gouraud shading
Fig. 9.36 Scan-line method for calculating the Gouraud shading
the intensities of the two corners accordingly. The intensity changes linearly along the scan line from the start value Ia to the end value Ib. Figure 9.36 illustrates this procedure. The intensities are determined with the following formulas:

Ia = I1 − (I1 − I2) · (y1 − ys)/(y1 − y2)
Ib = I1 − (I1 − I3) · (y1 − ys)/(y1 − y3)
Ip = Ib − (Ib − Ia) · (xb − xp)/(xb − xa).
The intensities to be calculated are integer colour values between 0 and 255. As a rule, the difference in intensity on the triangle will not be very large, so that the slope of the linear intensity curve on the scan line will almost always be less than one.
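The following Java sketch (not from the book's code base) applies the three interpolation formulas directly; the variable names follow Fig. 9.36.

// Gouraud shading along one scan line (notation as in Fig. 9.36):
// i1, i2, i3 are the intensities at the triangle corners with
// y-coordinates y1, y2, y3; ys is the scan line, xa and xb are the
// intersections of the scan line with the two edges, xp the pixel position.
static double gouraudIntensity(double i1, double i2, double i3,
                               double y1, double y2, double y3, double ys,
                               double xa, double xb, double xp) {
    double ia = i1 - (i1 - i2) * (y1 - ys) / (y1 - y2); // along edge 1-2
    double ib = i1 - (i1 - i3) * (y1 - ys) / (y1 - y3); // along edge 1-3
    return ib - (ib - ia) * (xb - xp) / (xb - xa);      // along the scan line
}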
Fig. 9.37 The Utah Teapot with Gouraud shading of different tessellations
Since the slope is usually less than one, the midpoint algorithm can be used to determine the discrete intensity values along the scan line. The facet effect is significantly reduced by Gouraud shading. Nevertheless, due to the linear interpolation, the maximum intensity within a triangle can only be attained at the corners, so that it can still happen that the corners of the triangles stand out a bit (Fig. 9.38).

Phong shading [8] is similar to Gouraud shading, but instead of the intensities, the normal vectors are interpolated. Thus, the maximum intensity can also occur inside the triangle, depending on the angle at which light falls on the triangle. Figure 9.39 shows a curved surface and a triangle that approximates a part of the surface.
Fig. 9.38 The Utah Teapot with flat and Gouraud shading for specular reflection
The normal vectors at the corners of the triangle are the normal vectors of the curved surface at these three points. Within the triangle, the normal vectors are calculated as a convex combination of these three normal vectors. Gouraud and Phong shading provide only approximations, albeit very good ones, of the actual shading of a curved surface. It would be correct to determine for each triangle point the corresponding normal vector on the original curved surface and thus determine the intensity at that point. However, this would mean that when displaying a scene, the information about the triangles including the normal vectors at the corner points would
Fig. 9.39 Interpolated normal vectors in Phong shading
not be sufficient and that the curved surfaces would always have to be accessed directly when calculating the scene, which would lead to a very high calculation effort.
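As a small illustration of this convex combination (not from the book), the following Java sketch interpolates the three corner normals of a triangle with barycentric weights and renormalises the result, which is what Phong shading does per fragment.

// Interpolated normal for Phong shading: convex combination of the corner
// normals n1, n2, n3 with barycentric weights w1 + w2 + w3 = 1.
static double[] interpolateNormal(double[] n1, double[] n2, double[] n3,
                                  double w1, double w2, double w3) {
    double[] n = new double[3];
    for (int i = 0; i < 3; i++) {
        n[i] = w1 * n1[i] + w2 * n2[i] + w3 * n3[i];
    }
    // renormalise, since the convex combination generally shortens the vector
    double len = Math.sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    for (int i = 0; i < 3; i++) {
        n[i] /= len;
    }
    return n;
}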
9.5 Shading in the OpenGL

Flat and Gouraud shading are implemented before rasterisation, Phong shading after rasterisation. In OpenGL, rasterisation always interpolates. In the case of flat shading, the same colour is assigned per primitive to each vertex of the primitive. The source vertex for this colour is called the provoking vertex (see Sect. 2.5.2). This directly implies that flat and Gouraud shading can be performed both in the fixed function and in the programmable pipeline. Phong shading, on the other hand, can only be implemented in the programmable pipeline. The source code of the lighting calculation in the fragment shader shown in Sect. 9.3 already corresponds to the implementation of Phong shading. In the fixed function pipeline, flat or Gouraud shading can be selected with the functions gl.glShadeModel(gl.GL_FLAT) or gl.glShadeModel(gl.GL_SMOOTH). In the programmable pipeline, the correct choice of an interpolation qualifier (flat for flat shading and smooth for Gouraud shading) defines flat or Gouraud shading. All that is required is that the respective qualifier is placed before the in or out variable of the corresponding normal (or colour). In the vertex shader, this means concretely, using the example of the colour:

// Flat-Shading
flat out vec4 vColor;

// Calculate linear interpolation in window space
// without perspective
noperspective out vec4 vColor;

// Calculate linear interpolation with perspective
// Gouraud-Shading
smooth out vec4 vColor;
and in the fragment shader:

// Flat-Shading
flat in vec4 vColor;

// Calculate linear interpolation in window space
// without perspective
noperspective in vec4 vColor;

// Calculate linear interpolation with perspective
// Gouraud-Shading
smooth in vec4 vColor;
9.6 Shadows

An important aspect that has not yet been considered in the shading of the previous sections is shadows. Chapter 8 describes methods to determine which objects are visible to the viewer of a scene. In the previous considerations on shading visible surfaces, the angle between the normal vector and the vector pointing in the direction of the light source, or rather their scalar product, can be used to decide whether the light source illuminates the object from behind; in this case, it has no effect on the representation of the surface. It was not taken into account, however, whether a light source whose light points in the direction of an object actually illuminates this object, or whether another object might be an obstacle on the way from the light source to the object, so that the object is in shadow with respect to the light source. Figure 9.40 shows how a shadow is created on an object when it is illuminated from above by a light source and another object is located in between.
Fig. 9.40 Shadow on an object
Shadow does not mean blackening of the surface, but only that the light of the corresponding light source does not contribute to the illumination of the object, or contributes at most in the form of the separately modelled scattered light. In terms of the illumination model according to Phong, these would be the diffuse and specular light components. Determining whether the light of a light source reaches the surface of an object is equivalent to determining whether the object is visible from the light source. The question of the visibility of an object has already been dealt with in Sect. 8.2.3, but from the point of view of the observer. This relationship between visibility and shadow is exploited by the two-pass depth buffer algorithm (z-buffer algorithm). In the first pass, the depth buffer algorithm (see Sect. 8.2.3) is executed with the light source as the observer's point of view. With directional light, this means a parallel projection in the opposite direction to the light rays. With a point light source or a spotlight, a perspective projection with the light source in the projection centre is applied. For the depth buffer algorithm, this projection must be transformed to a parallel projection onto the xy-plane. In this first pass of the depth buffer algorithm, only the z-buffer ZL and no colour buffer is used. The second pass corresponds to the normal depth buffer algorithm from the observer's point of view with the following modification. A transformation TV is required, which reduces the perspective projection with the viewer in the projection centre to a parallel projection onto the xy-plane. If, during this second pass, an object or a point on its surface is entered into the z-buffer ZV for the viewer, it is checked before calculating the value for the colour buffer whether the point lies in the shadow of another object. If the point has the coordinates (xV, yV, zV) in the parallel projection onto the xy-plane, the transformation

(xL, yL, zL)ᵀ = TL · TV⁻¹ · (xV, yV, zV)ᵀ

yields the coordinates of the point as seen from the light source. Here, TV⁻¹ is the inverse transformation, i.e., the inverse matrix, of TV. The value zL is then compared with the entry in the z-buffer ZL of the light source at the position (xL, yL). If there is a smaller value in the z-buffer ZL at this position, there is obviously an object that is closer to the light source than the point under consideration; thus, the point lies in the shadow of this light source. Otherwise, the light of the light source reaches the point. When entering the point in the colour buffer FV for the viewer, it can then be taken into account whether the light source contributes to the illumination of the point or not. If the point is in shadow, only the scattered light of this light source would be included in its colouration. If there are several light sources, the first pass of the two-pass depth buffer algorithm must be carried out separately for each light source. In the second pass, after entering a value in the z-buffer ZV of the observer, a check is made for each light source to see whether it illuminates the point; this is then taken into account accordingly when determining the value for the colour buffer. The following algorithm describes this procedure for calculating shadows using the z-buffer algorithm from Sect. 8.2.3:
Step 1: Compute the z-buffer (smallest z-value) z_depth for each pixel as seen from the light source L.
Step 2: Render the image from the viewer's perspective using the modified z-buffer algorithm. For each visible pixel P(x, y, z), perform the transformation into the point of view of the light source L, obtaining (x', y', z').
    If z' > z_depth(x', y'): P is in the shadow of L.
    If z' = z_depth(x', y'): illuminate P with L.
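The per-pixel test in the second pass can be sketched in a few lines of Java. The sketch below assumes that the combined transformation T_L · T_V^{-1} is available as a 4x4 matrix in row-major order and that the z-buffer Z_L of the light source has been filled in the first pass; the small bias against self-shadowing and all names are illustrative assumptions, not code from the book.

    public class ShadowTest {

        static boolean inShadow(float[] toLight, float[][] lightDepth,
                                float xV, float yV, float zV) {
            // Transform the point into the parallel projection of the light source.
            float xL = toLight[0] * xV + toLight[1] * yV + toLight[2]  * zV + toLight[3];
            float yL = toLight[4] * xV + toLight[5] * yV + toLight[6]  * zV + toLight[7];
            float zL = toLight[8] * xV + toLight[9] * yV + toLight[10] * zV + toLight[11];

            int ix = Math.round(xL);
            int iy = Math.round(yL);
            float bias = 1e-3f;  // tolerance against self-shadowing due to rounding

            // A smaller stored depth means that another object is closer to the light source.
            return zL > lightDepth[iy][ix] + bias;
        }
    }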
9.7 Opacity and Transparency Transparency means that one can partially see through a fragment of an object or surface, such as a tinted glass pane (see Chap. 6 on colour and the alpha channel in RGBA). Transparency of a surface means here that only a part of the light from the fragments of an object behind it is transmitted, but no further distortion takes place, unlike, for instance, a frosted glass pane that lets light pass but behind which no contours can be recognised. This frosted-glass effect is called translucency; it is not considered here, nor is transparency in combination with refraction. Opacity is the opposite of transparency. Strictly speaking, opacity is defined via the alpha channel: an alpha value of 0 means transparent, whereas an alpha value of 1 means opaque. OpenGL uses the term opacity in this context. To explain how opacity can be modelled, a surface F_2 is considered, which lies behind a transparent surface F_1. For the model of interpolated or filtered transparency, an opacity coefficient k_opaque ∈ [0, 1] must be selected; the proportion (1 − k_opaque) of the light from the surface behind passes through F_1. For k_opaque = 1, the surface F_1 is completely opaque. For k_opaque = 0, the surface F_1 is fully transparent. The colour intensity I_P of a point P on the transparent surface F_1 is then calculated as follows:

I_P = k_opaque · I_1 + (1 − k_opaque) · I_2.   (9.9)
Here, I_1 is the intensity of the point that would result from considering the surface F_1 alone if it were opaque. I_2 is the intensity of the point behind it, which would result from considering the surface F_2 alone if F_1 were not present. The transparent surface is assigned a colour with the intensities I_1 for red, green and blue. It should be noted that k is normally used for the absorption coefficient. In this simplified model, scattering is neglected; transmission, strictly speaking, covers both absorption and scattering. Transparent surfaces make the visibility calculations more difficult. Here the depth buffer algorithm shall be considered as an example. If a transparent surface is to be entered into the z-buffer and the colour buffer, the following problems arise:
• Which z-value should be stored in the z-buffer? If the old z-value of the object O behind the transparent surface is kept, an object that lies between O and the transparent surface and is entered later would completely overwrite the colour buffer due to its smaller z-coordinate, even though it lies behind the transparent surface.
Fig. 9.41 50% (left) and 25% (right) screen door
If instead the z-value of the transparent surface is used, such an object would not be entered into the colour buffer at all, although it should at least be visible behind the transparent surface.
• Which value should be transferred to the colour buffer? If the interpolated transparency according to Eq. (9.9) is used, the information about the value I_1 is lost for objects entered later that lie directly behind the transparent surface. The value I_1 alone would not be sufficient either.
Alpha blending does not offer an optimal solution to this problem either. As the usual storage of RGB colours occupies three bytes, and blocks of two or four bytes are easier to manage in the computer, RGB values are often extended by a fourth value, called alpha. This alpha value contains the opacity coefficient k_opaque. But even when using this alpha value, it is not clear to which object behind the transparent surface the alpha blending, i.e., Eq. (9.9), should be applied. To display opacity correctly, all opaque surfaces should therefore be entered first in the depth buffer algorithm. Only then are the transparent surfaces added with alpha blending. Again, this can cause problems if several transparent surfaces lie behind each other. It must be ensured that the transparent surfaces are entered in the correct order, i.e., from back to front. Usually, they are sorted according to their z-coordinates. An alternative solution to this problem is screen door transparency, which is based on a similar principle as the halftone method introduced in Sect. 6.1. The mixing of the intensity of the transparent surface with that of the object behind it is simulated by colouring a corresponding proportion of the fragments. With an opacity coefficient of k_opaque = 0.75, every fourth pixel would be coloured with the colour of the object behind, all other pixels with the colour of the transparent surface. Figure 9.41 illustrates this principle using greatly enlarged fragments. The red colour is assigned to the transparent surface, the green colour to the object behind it. In the left part of the figure, k_opaque = 0.5 was used, in the right part k_opaque = 0.75. Screen door transparency can be implemented with the depth buffer algorithm. The z-value of the surface from which the colouring originates is entered into the z-buffer. With k_opaque = 0.75, three-quarters of the pixels would thus receive the z-value of the front, transparent surface and a quarter the z-value of the surface behind it. An object in front of the transparent surface that is processed later would correctly overwrite everything. An object behind the already entered opaque object would not be entered at all. An object between the transparent surface and the previously entered object will
overwrite exactly the same pixels that were previously assigned to the object behind, i.e., the new object behind the transparent surface again receives the same proportion of pixels. Screen door transparency in combination with the depth buffer algorithm, like the halftone methods, only leads to satisfactory results if the resolution is sufficiently high. With opacity coefficients close to 50%, the results can hardly be distinguished from interpolated transparency. With very weak or very strong opacity, screen door transparency tends to form visible dot patterns.
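In OpenGL, interpolated transparency according to Eq. (9.9) is usually realised with alpha blending: opaque geometry is drawn first with the depth test enabled, then the transparent geometry is drawn sorted from back to front with blending switched on. The following JOGL sketch only illustrates this drawing order; the two draw methods are placeholders for the application's own rendering code.

    import com.jogamp.opengl.GL;
    import com.jogamp.opengl.GL2;

    public class TransparencyPass {

        void renderScene(GL2 gl) {
            // 1. Opaque surfaces: normal depth buffer algorithm.
            gl.glEnable(GL.GL_DEPTH_TEST);
            gl.glDisable(GL.GL_BLEND);
            drawOpaqueObjects(gl);

            // 2. Transparent surfaces: blend with source weight alpha (= k_opaque)
            //    and destination weight 1 - alpha, drawn sorted from back to front.
            gl.glEnable(GL.GL_BLEND);
            gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);
            gl.glDepthMask(false);   // read the z-buffer, but do not write to it
            drawTransparentObjectsBackToFront(gl);
            gl.glDepthMask(true);
            gl.glDisable(GL.GL_BLEND);
        }

        // Placeholders for the application's own drawing code.
        void drawOpaqueObjects(GL2 gl) { /* ... */ }
        void drawTransparentObjectsBackToFront(GL2 gl) { /* ... */ }
    }

Disabling the depth mask while drawing the transparent surfaces is one common way to avoid the z-buffer problems discussed above; it is a pragmatic choice, not the only possible one.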
9.8 Radiosity Model When dealing with reflections from surfaces in Sect. 9.2, it is explained that objects that have their own luminosity are not considered a light source in the scene unless an additional light source is added at the position of the object. This simplification is a special case of the basic assumption in the reflection calculation that the light emitted by an object, by its own luminosity or by reflection, does not contribute to the illumination of other objects. By adding constant scattered light to the scene, an attempt is made to model the effect that objects are not only illuminated by light coming directly from light sources. The result of this simplified modelling is partly very sharp edges and contrasts. For example, if a light-coloured and a dark wall form a corner, the light wall will radiate light onto the dark wall. This effect can be seen especially where the light wall meets the dark one, that is, at the edge between the two walls. With the illumination models for light sources from Sect. 9.1 and for reflection from Sect. 9.2, the interaction of light between individual objects is neglected. This results in a representation of the two walls as shown in Fig. 9.42 on the left, with a very sharp edge between the two walls. In the right part of the figure, it was taken into account that the bright wall radiates light onto the dark one. This makes the dark wall appear slightly brighter near the corner, and the edge between the two walls is less sharp.
Fig. 9.42 Light emission of an object onto another
A simple approach to take this effect into account is to use irradiance mapping. Irradiance maps are not used as textures in the classical sense (i.e., as an image on an object surface), but to calculate the light that other objects radiate onto a given object. To do this, the shading of the scene is first calculated as in Sect. 9.2, without considering the interaction between the objects. Afterwards, the corresponding irradiance map is determined for each object. The colour value resulting from this in each fragment is treated as if it originated from an additional light source and is added to the previously calculated intensities according to the reflection properties of the object surface. The radiosity model [3,7] avoids these recursive calculations. The radiosity B_i is the rate of energy that the surface O_i emits in the form of light. When only diffuse reflection is considered, this rate is composed of the following individual contributions:
• The inherent luminosity E_i of the surface. This term is non-zero only for the light sources within the scene.
• The reflected portion of the light that is emitted from other surfaces onto the surface O_i. For example, if O_i is a surface patch of the dark wall in Fig. 9.42 and O_j is a surface patch of the light-coloured wall, the energy of the light coming from O_j that is reflected by O_i is calculated as ρ_i · B_j · F_ji. Here ρ_i is the (dimensionless) reflection coefficient of the surface O_i, B_j is the radiosity of O_j, which is still to be determined, and F_ji is a (likewise dimensionless) form or configuration factor specifying the proportion of the light emitted from the surface patch O_j that strikes O_i. In F_ji, the shape, the size and the relative orientation of the two surface patches to each other are taken into account. For example, if the two surface patches are perpendicular to each other, less light passes from one patch to the other than when they face each other. The exact determination of the form factors is explained below.
• In the case of transparent surfaces, the energy of light shining through the surface from behind must also be taken into account. For reasons of simplification, transparency is not considered here.
The total energy of light emitted by the surface O_i is the sum of these individual contributions. For n surfaces in the scene, including the light sources, one obtains

B_i = E_i + ρ_i · Σ_{j=1}^{n} B_j · F_ji.   (9.10)
Fig. 9.43 Determination of the form factor with the normals n_i and n_j of the two infinitesimally small areas dA_i and dA_j, which are at a distance r from each other
This results in the following linear system of equations with the unknowns B_i:

⎛ 1 − ρ_1·F_{1,1}    −ρ_1·F_{1,2}    …    −ρ_1·F_{1,n} ⎞   ⎛ B_1 ⎞   ⎛ E_1 ⎞
⎜ −ρ_2·F_{2,1}    1 − ρ_2·F_{2,2}    …    −ρ_2·F_{2,n} ⎟   ⎜ B_2 ⎟   ⎜ E_2 ⎟
⎜        ⋮                 ⋮          ⋱         ⋮      ⎟ · ⎜  ⋮  ⎟ = ⎜  ⋮  ⎟   (9.11)
⎝ −ρ_n·F_{n,1}    −ρ_n·F_{n,2}    …    1 − ρ_n·F_{n,n} ⎠   ⎝ B_n ⎠   ⎝ E_n ⎠
This system of equations must be solved for each of the three primary colours red, green and blue. The number of equations is equal to the number of surface patches considered, i.e., usually the number of triangles in the scene plus the light sources. If the triangles are very large, they have to be subdivided further in advance, otherwise the estimates become inaccurate; this increases the number of surface patches. The number of light sources, however, is negligible in relation to the number of triangles. It is, therefore, a system of equations with hundreds or thousands of equations and unknowns. For typical scenes, it will be a sparsely populated system of equations. This means that for most pairs of surfaces the form factor F_ij = 0 holds, because they are either at an unfavourable angle to each other or too far apart. To calculate the form factor of the surface A_i with respect to the surface A_j, the two surfaces are divided into infinitesimally small surface patches dA_i and dA_j at distance r, and one sums, i.e., integrates, over these surface patches. If the surface patch dA_j is visible from dA_i, one obtains the differential form factor, with the designations from Fig. 9.43,

dF_{di,dj} = cos(θ_i) · cos(θ_j) / (π · r²) · dA_j.

It decreases quadratically with the distance between the two surface patches, corresponding to the attenuation. The angle between the two surfaces also plays an important role.
If the surface patches are facing each other, the form factor is largest, because then the normal vectors of the patches are parallel to the axis connecting the centres of the patches. As the angle increases, the form factor decreases until it reaches zero at an angle of 90°. For angles greater than 90°, the surface patches do not illuminate each other; the cosine would take negative values. To exclude this case, the factor

H_ij = 1 if dA_j is visible from dA_i, and H_ij = 0 otherwise

is introduced, so that

dF_{di,dj} = cos(θ_i) · cos(θ_j) · H_ij / (π · r²) · dA_j

applies for the differential form factor. Through integration, one obtains the form factor from the differential area dA_i to the area A_j:

F_{di,j} = ∫_{A_j} cos(θ_i) · cos(θ_j) / (π · r²) · H_ij dA_j.
By repeated integration, one finally obtains the form factor of the area A_i with respect to the area A_j:

F_{i,j} = (1 / A_i) · ∫_{A_i} ∫_{A_j} cos(θ_i) · cos(θ_j) / (π · r²) · H_ij dA_j dA_i.

For small areas, the form factor can be calculated approximately by placing a hemisphere around the centre of gravity of the area. The form factor for determining how much a surface A_i contributes by its light to the illumination of the considered surface patch results from the projection of A_i onto the hemisphere and the subsequent projection onto the circular disc around the centre of the hemisphere. The proportion of the projected area in the circle corresponds to the form factor. This procedure is shown in Fig. 9.44. The proportion of the dark grey area on the circular disc gives the form factor. A simpler but coarser approximation for the form factor was proposed by Cohen and Greenberg [2]; the hemisphere is replaced by half of a cube (hemicube). Also, for the solution of the system of equations (9.11) to determine the radiosity values B_i, techniques are used that quickly arrive at good approximate solutions. The stepwise (progressive) refinement [1] iteratively updates the values B_i with the help of Eq. (9.10) by evaluating the equations. To do this, all values are first set to B_i = E_i, i.e., all values except those of the light sources are initialised with zero. In the beginning, ΔB_i = E_i is also set for the changes. Then the surface O_{i_0} is selected for which the value ΔB_{i_0} is the highest. In the first step, this is the brightest light source. All B_i are then recalculated by

B_i^(new) = B_i^(old) + ρ_i · F_{i_0,i} · ΔB_{i_0}.   (9.12)

All ΔB_i are also updated using

ΔB_i^(new) = ΔB_i^(old) + ρ_i · F_{i_0,i} · ΔB_{i_0}   if i ≠ i_0,
ΔB_i^(new) = 0                                         if i = i_0.   (9.13)
Fig. 9.44 Determination of the form factor according to Nusselt
The light from the surface or light source Oi0 was thus distributed to the other surfaces. Then the surface with the highest value ΔBi0 is selected again, and the calculations according to (9.12) and (9.13) are carried out. This procedure is repeated until there are only minimal changes or until a maximum number of iteration steps has been reached. The advantage of stepwise refinement is that it can be stopped at any time and—depending on the number of iterations—provides a more or less good approximate solution. The radiosity model leads to much more realistic results in lighting. The radiosity models presented here, which only take diffuse reflection into account, can be performed independently of the position of the observer. For fixed light sources and no or only a few moving objects, the radiosity values can be precalculated and stored as textures in lightmaps (see Chap. 10), which are applied to the objects. The real-time calculations are then limited to the display of the textures, specular reflection, and shading of the moving objects.
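The stepwise refinement described above can be summarised in a few lines of code. The following Java sketch is only an illustration of Eqs. (9.12) and (9.13); it assumes that the reflection coefficients rho, the form factors F and the emissions E have already been computed, and it is not an optimised radiosity solver.

    /** Progressive refinement of the radiosity values B_i (Eqs. 9.12 and 9.13).
     *  rho[i] are reflection coefficients, F[j][i] the form factors F_{j,i},
     *  E[i] the self-emissions; all are assumed to be precomputed. */
    static double[] progressiveRefinement(double[] rho, double[][] F, double[] E,
                                          int maxSteps, double eps) {
        int n = E.length;
        double[] B = E.clone();     // B_i = E_i
        double[] dB = E.clone();    // unshot radiosity, initially Delta B_i = E_i

        for (int step = 0; step < maxSteps; step++) {
            // Select the surface i0 with the largest unshot radiosity.
            int i0 = 0;
            for (int i = 1; i < n; i++) if (dB[i] > dB[i0]) i0 = i;
            if (dB[i0] < eps) break;   // only minimal changes left

            double shoot = dB[i0];
            for (int i = 0; i < n; i++) {
                double gain = rho[i] * F[i0][i] * shoot;
                B[i] += gain;                              // Eq. (9.12)
                dB[i] = (i == i0) ? 0.0 : dB[i] + gain;    // Eq. (9.13)
            }
        }
        return B;
    }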
9.9 Raycasting and Raytracing Beam tracing is another image-space method for determining visibility. A distinction is made between raycasting, i.e., simple beam tracing used for visibility only, and raytracing for lighting effects, which means beam tracing with reflections. As a rule, raytracing is implemented with refraction and with shadows (Whitted raytracing). For each pixel of the window to be displayed on the projection plane, a ray is calculated, and it is determined which object the ray intersects first, thus determining the colour of
Fig. 9.45 Beam tracing
the pixel. Beam tracing is suitable for parallel projections, but can also be used in the case of perspective projection without transforming the perspective projection into a parallel projection. In a parallel projection, the rays run parallel to the projection direction through the pixels. In a perspective projection, the rays follow the connecting lines between the projection centre and the pixels. Figure 9.45 illustrates the raytracing technique. In the figure, the pixels correspond to the centres of the squares on the projection plane. For a perspective projection with the projection centre in the point (x_0, y_0, z_0), the ray to the pixel with the coordinates (x_1, y_1, z_1) can be parameterised as a straight-line equation as follows:

x = x_0 + t · Δx,   y = y_0 + t · Δy,   z = z_0 + t · Δz   (9.14)

with Δx = x_1 − x_0, Δy = y_1 − y_0, Δz = z_1 − z_0.
For values t < 0, the ray (interpreted as a straight line) is located behind the projection centre, for t ∈ [0, 1] between the projection centre and the projection plane, and for t > 1 behind the projection plane. To determine whether and, if so, where the ray intersects a planar polygon, first the intersection point of the ray with the plane Ax + By + Cz + D = 0 in which the polygon lies is determined. Then it is checked whether the point of intersection with the plane lies within the polygon. If the straight-line equation of the ray (9.14) is inserted into the plane equation, the value for t is obtained as

t = − (A·x_0 + B·y_0 + C·z_0 + D) / (A·Δx + B·Δy + C·Δz).
If the equation Ax + By + Cz + D = 0 describes a plane, the denominator can only become zero if the ray is parallel to the plane. In this case, the plane is irrelevant for the considered pixel through which the ray has passed.
Fig. 9.46 Projection of a polygon to determine whether a point lies within the polygon
Fig. 9.47 Supersampling
To clarify whether the intersection point lies within the polygon, the polygon and the intersection point are projected into one of the coordinate planes by omitting one of the three coordinates. Usually, the coordinate plane most parallel to the polygon is selected for this projection: the coordinate belonging to the largest (in absolute value) coefficient of the normal vector (A, B, C) of the plane is omitted. After the projection, the odd-parity rule (see Sect. 7.5) can be used to determine whether the point lies within the polygon. Figure 9.46 illustrates this procedure. For efficiency, the rays are usually computed in parallel. Therefore, no dependencies between neighbouring pixels should be created in the calculations. Ray tracing can lead to aliasing effects (see Chap. 7) if the far clipping plane is very far away. The background then results from objects hit more or less randomly by rays, so that for very distant objects it can no longer be assumed that neighbouring pixels get the same colour. Supersampling can be used to reduce this effect. This is done by calculating several rays for a pixel, as shown in Fig. 9.47, and using the (possibly weighted) average of the corresponding colours. However, supersampling is computationally expensive because each pixel is individually oversampled.
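As a small illustration of the intersection calculation just described, the following Java method computes the parameter t at which the ray (9.14) meets the plane Ax + By + Cz + D = 0. The method name and the epsilon threshold for "parallel" rays are illustrative choices.

    /** Intersection of the ray (x0,y0,z0) + t*(dx,dy,dz) with the plane
     *  A*x + B*y + C*z + D = 0. Returns Double.NaN if the ray is (nearly)
     *  parallel to the plane. */
    static double intersectPlane(double x0, double y0, double z0,
                                 double dx, double dy, double dz,
                                 double A, double B, double C, double D) {
        double denom = A * dx + B * dy + C * dz;
        if (Math.abs(denom) < 1e-12) {
            return Double.NaN;          // ray parallel to the plane
        }
        return -(A * x0 + B * y0 + C * z0 + D) / denom;
    }

For a perspective projection, only intersections with t > 1 lie behind the projection plane and are therefore candidates for visible scene geometry; whether the intersection point actually lies inside the polygon is then decided with the odd-parity test after the projection described above.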
Fig. 9.48 Recursive Raytracing
The presented raycasting method for visibility calculations is a simple beam tracing method whose principle can also be used for illumination. For this purpose, rays emitted by the observer or, in the case of parallel projection, parallel rays are sent through the pixel grid. A ray is tracked until it hits the nearest object; every object in the three-dimensional scene must be tested, which is computationally very expensive. At the hit point, the usual illumination calculations are first carried out. In Fig. 9.48, the ray first hits a pyramid. At this point, the light source present in the scene is taken into account, which contributes a diffuse reflection. The ray is then followed further in the direction of specular reflection, but traced in the opposite direction, i.e., from the observer and not from the light source. If it hits another object, the specular reflection is again tracked "backward" until a maximum recursion depth is reached, or until a light source or no further object is hit. It should be noted that this procedure presupposes flat (area) light sources or light sources with a spatial extent, since a ray will usually not hit a single point in space. Finally, the light components that the specular reflections on the other objects along the further traced ray contribute to the illumination of the object hit first must be taken into account. This standard procedure was developed by Turner Whitted and uses the light sources exclusively for local illumination. This type of beam tracing is called raytracing, as opposed to simple beam tracing (raycasting).
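The recursive structure just described can be summarised in a short sketch. The following Java fragment is purely schematic: Scene, Ray, Hit, Color and the helper methods shadeLocal and reflect are assumed placeholder types and functions, not part of the book's code framework.

    /** Schematic recursive raytracing in the style of Whitted. */
    Color trace(Scene scene, Ray ray, int depth) {
        Hit hit = scene.nearestIntersection(ray);    // test the ray against all objects
        if (hit == null) {
            return scene.backgroundColor();
        }
        // Local illumination (e.g., Phong) with the light sources of the scene.
        Color color = shadeLocal(scene, hit);

        if (depth < MAX_DEPTH) {
            // Follow the ray "backwards" along the direction of specular reflection.
            Ray reflected = reflect(ray, hit);
            Color indirect = trace(scene, reflected, depth + 1);
            color = color.add(indirect.scale(hit.specularCoefficient()));
        }
        return color;
    }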
9.10 Exercises

Exercise 9.1 In OpenGL, define one spotlight each in the colours red, green and blue. Aim the three spotlights at the front side of a white cube in such a way that a white colouration appears in the middle of the surface and the different colours of the spotlights can be seen at the edges.

Exercise 9.2 Illuminate a surface of a three-dimensional scene. Calculate the specular portion of the illumination for one point of the surface. The point to be illuminated lies at the Cartesian coordinates (1, 1, 3). Assume that you have already mirrored the vector from the point to the light source at the normal and obtained the vector (0, 1, 0). The eyepoint is located at the coordinates (4, 5, 3). Use the formula discussed in this chapter to determine the specular light component I:

I = I_L · k_sr · cos^n(α).

Assume that k_sr = 1, I_L = 1/2 and n = 2 apply. Make a sketch of the situation. Calculate I using the information provided.

Exercise 9.3 A viewer is located at the point (2, 1, 4) and looks at the (y, z)-plane, which is illuminated by a point light source at infinity. The light source lies in the direction of the vector (1, 2, 3). At which point of the plane does the observer see specular reflection?

Exercise 9.4 Model a lamp as an object in OpenGL and assign a light source to it at the same time. Move the lamp with the light source through the scene.

Exercise 9.5 The square of Fig. 9.49 should be coloured with integer grey values. Remember that grey values have identical RGB entries, so the figure simply contains one number each. For the grey value representation of the individual pixels, 8 bits are available. For the corner points of the square, the grey values given in each case have already been calculated by the illumination equation.
a) Triangulate the square and colour it using the flat shading method. Write the calculated colour values in the boxes representing each fragment. Note: You can decide whether to colour the fragments on the diagonal between the two triangles with the colour value of one or the other triangle.

Fig. 9.49 Square for the application of flat or Gouraud shading
b) Triangulate the square and colour it using the Gouraud shading method. Write the calculated colour values in the boxes representing each fragment. Round off non-integer values.

Exercise 9.6 In the radiosity model, should back-face culling be applied before setting up the illumination equation (9.11)?

Exercise 9.7 Let there be the eyepoint (x_0, y_0, z_0)^T with x_0, y_0, z_0 ∈ R and a pixel (x_1, y_1, z_1)^T with x_1, y_1, z_1 ∈ N on the image area. Furthermore, let the raycasting ray be given as the straight line

(x, y, z)^T = (x_0, y_0, z_0)^T + t · (x_1 − x_0, y_1 − y_0, z_1 − z_0)^T

and a sphere by

(x − a)² + (y − b)² + (z − c)² = r²

with the centre (a, b, c) ∈ R³ and the radius r ∈ R.
a) Develop the raycasting calculation rule which can be used to calculate the intersection of the raycasting ray with the sphere surface. Sketch the situation and give the necessary steps for the general case.
b) How is the associated normal vector at the point of intersection obtained?
Let the eyepoint be in the coordinate origin and a billiard ball be given with the centre point (√3, √3, √3) and the radius r = √3. Determine the visible intersection, if any, of the billiard ball surface with the raycasting ray through each of the pixels (x_1, y_1, z_1)^T defined below. Determine the normal vector for each intersection point visible from the eyepoint.
c) Specify the visible intersection and the normal vector for the pixel (x_1, y_1, z_1)^T = (1, 1, 1)^T.
d) Specify the visible intersection and the normal vector for the pixel (x_1, y_1, z_1)^T = (0, 1, 1)^T.
References

1. M. F. Cohen, S. E. Chen, J. R. Wallace and D. P. Greenberg. "A Progressive Refinement Approach to Fast Radiosity Image Generation". In: SIGGRAPH Comput. Graph. 22.4 (1988), pp. 75–84.
2. M. F. Cohen and D. P. Greenberg. "The Hemi-Cube: A Radiosity Solution for Complex Environments". In: SIGGRAPH Comput. Graph. 19.3 (1985), pp. 31–40.
3. C. M. Goral, K. E. Torrance, D. P. Greenberg and B. Battaile. "Modeling the Interaction of Light between Diffuse Surfaces". In: SIGGRAPH Comput. Graph. 18.3 (1984), pp. 213–222.
4. H. Gouraud. "Continuous Shading of Curved Surfaces". In: IEEE Transactions on Computers C-20 (1971), pp. 623–629.
5. Mark J. Kilgard. OpenGL/VRML Materials. Tech. rep. Retrieved 26 Aug 2021. Silicon Graphics Inc, 1994. URL: http://devernay.free.fr/cours/opengl/materials.html.
6. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 3rd edition. Vol. 1. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Fachmedien, 2011.
7. T. Nishita and E. Nakamae. "Continuous Tone Representation of Three-Dimensional Objects Taking Account of Shadows and Interreflection". In: SIGGRAPH Comput. Graph. 19.3 (1985), pp. 23–30.
8. B. T. Phong. "Illumination for Computer Generated Pictures". In: Commun. ACM 18.6 (1975), pp. 311–317.
9. D. R. Warn. "Lighting Controls for Synthetic Images". In: SIGGRAPH Comput. Graph. 17.3 (1983), pp. 13–21.
10 Textures
Textures are images or graphics that are applied to the surface of an object. They are used to define or change the appearance of the surface of an object. Textures can influence both the colour design and the geometry of an object. The surfaces they are applied to can be two-dimensional, most commonly in flat rectangular form, or curved in three dimensions; such curvature of the surface is approximated in OpenGL by a tessellation of planar triangles, see Chap. 4. The aim is to efficiently represent scenes photorealistically with a very high, situation-appropriate level of detail without increasing the complexity of the geometry of the surface. In addition, this technique allows the same geometry to be visualised with different textures, and textures can also be applied to different geometries. In this chapter, only two-dimensional textures are considered, which are two-dimensional raster graphics mapped onto a three-dimensional geometry. In the simplest case, textures serve to use a colour gradient or pattern instead of a single colour. For example, a patterned wallpaper would use a corresponding texture that is placed on the walls. In this case, the texture would be applied several times to fill the walls. A single picture hanging on a wall could also be realised by a texture, but applied only once.
10.1 Texturing Process Textures are often used to represent grains or structures. A woodchip wallpaper gets its appearance from the fact that, in contrast to smooth wallpaper, it has a three-dimensional structure that is fine but can be felt and seen. The same applies to the bark of a tree, a brick-built wall or the drape of a garment. Theoretically, such tiny three-dimensional structures could also be approximated by even smaller polygons of detail. However, the modelling effort and the computational effort for the representation of the scene would be unacceptable. Textures can be any images. Usually, they are images of patterns or grains, for example, a wood grain. If a surface is to be filled with a texture in the form of a rectangular image, the position where the texture is to be placed must be defined.
Fig. 10.1 Filling an area with a texture
The area to be filled need not be rectangular. When drawing, clipping must be performed against the area to be textured. In many cases, a single rectangular texture is not sufficient to fill a larger area. In this case, the area should be filled by using the same texture several times, as with a tile pattern. Also in this case, clipping must be performed against the area to be textured. The positioning of the texture is specified in the form of an anchor. The anchor defines a point from which the rectangular texture is laid out in all directions, like tiles. Figure 10.1 shows a texture in the form of a brick pattern on the left. On the right of the figure, a rectangular area has been filled with this texture. The black circle marks the anchor. Note that interpolation is used to efficiently transfer the correct colour values of the texture to the inside of the polygon. When a texture is used several times to fill a surface, there are sometimes clearly visible transitions at the edges where the copies of the texture are joined together. This effect is also shown in Fig. 10.1, especially at the horizontal seams. To avoid such problems, either a texture must be used where such effects do not occur, or the texture must be modified so that the seams fit together better. How this is achieved depends on the type of texture. For highly geometric textures, such as the brick pattern in Fig. 10.1, the geometric structure may need to be modified. For rather unstructured grains, for example, marble surfaces, simple techniques such as colour interpolation at the edges are sufficient, as described in Sect. 6.4. A texture is, therefore, a mostly rectangular raster graphic image (see Chap. 7), which is applied to a (three-dimensional) object surface, as shown in Fig. 10.2. For a point on the surface of the object, the steps of this texturing process are performed as follows:

1. The object surface coordinates (x′, y′, z′) of the point are calculated first.
2. The (polygon) area relevant for (x′, y′, z′) is determined. From this, surface coordinates (x, y) result relative to this parameterised two-dimensional surface.
3. From now on, the so-called texture mapping is applied:
   a. A projection into the parameter space is made to calculate u and v.
   b. The texture coordinates s and t are derived from u and v, taking the correspondence functions k_x and k_y into account, if necessary.
Fig. 10.2 Using a texture
4. The texture coordinates (s, t) are adjusted to the nearest texel, or a bilinear interpolation of the four adjacent texels is performed if (s, t) does not fall precisely on a texel.
5. The appearance is modified with the extracted texture values of the texel (s, t).

The parameterisation and the subsequent texture mapping will be considered below using the example of a globe, which is a special case. The globe is tessellated along the lines of latitude and longitude. This results in triangular areas around the two poles, as shown in Fig. 10.3, whereas the remaining polygons are (two-dimensional) rectangles. It should be noted that other types of tessellation of a sphere also exist. The radius r of the sphere is constant, and the centre lies in the coordinate origin. The texture of a globe is to be mapped onto this sphere; see Figs. 10.3 and 10.5. The resulting vertices on the tessellated surface of the sphere are described continuously and iteratively as parameterised two-dimensional coordinates so that they do not have to be specified individually. For this purpose, the angles α and β of the latitude and longitude circles of the tessellation are used as follows (see Fig. 10.4): Each point on the surface can be described with suitable angles 0 ≤ α ≤ π and 0 ≤ β ≤ 2π in the form

(x′, y′, z′) = (r · sin(α) · cos(β), r · sin(α) · sin(β), r · cos(α)).

This means that each corner point (resulting from the tessellation) is parameterised in two-dimensional coordinates (x, y) = (β, α) (see also polar or spherical coordinates). In order to map the texture onto this sphere, certain texels can be assigned to the vertices of the sphere. The texture is not repeated in either the x- or the y-direction (k_x = 1 and k_y = 1 for the correspondence functions). The following applies: u = x/x_width · k_x = β/(2π) · 1 = β/(2π), and since u ∈ [0, 1], also s = β/(2π). Analogously, v = y/y_height · k_y = α/π · 1 = α/π = t. This results in the texture coordinates (s, t) of the vertices. To get the texture coordinates of the points within a triangle or a quadrilateral of the sphere, they are interpolated from the texture coordinates (s, t) of the vertices. The procedure is identical to that for the other properties of the vertices during rasterising. As discussed in Chap. 7, vertices are (besides the texture coordinates) assigned specific additional properties, such as position coordinates, colour and normals. In the rasterising
Fig. 10.3 Tessellation of a sphere
Fig. 10.4 Illustration of the angles α and β that are used in the coordinates of the surface points of the sphere
process, the interpolated texture coordinates (s, t) (just like interpolated position coordinates and, if necessary, interpolated colour and normals) are determined for all fragments whose centre is located within such quadrilaterals or triangles. From this, the corresponding texel, i.e., the pixel of the texture, is calculated and used. For quadrilaterals, this interpolation is bilinear, and for triangles barycentric. The interpolation can also be adjusted to take the perspective projection into account. u and v do not necessarily have to lie between 0 and 1, as in this example (for another example, see Fig. 10.12). In the following, therefore, the cases in which u and v do not lie in the interval [0, 1] are also considered. Furthermore, the calculated texture coordinates s and t do not have to correspond precisely to the coordinates of a texel. This means that, in order to obtain a colour value or material properties at (s, t), they must either be taken from the closest texel or interpolated between the colour values or material properties of the four directly adjacent texels. This will be explained later in this section.
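For the globe example, the mapping from the angles α and β to the texture coordinates can be written down directly. The following small Java method is a sketch under the assumptions made above (no repetition, k_x = k_y = 1); the method and class names are illustrative.

    /** Texture coordinates (s, t) of a point on the sphere described by the
     *  angles alpha in [0, pi] and beta in [0, 2*pi], assuming that the
     *  texture is not repeated (k_x = k_y = 1). */
    static float[] sphereTexCoords(double alpha, double beta) {
        float s = (float) (beta / (2.0 * Math.PI));   // s = beta / (2*pi)
        float t = (float) (alpha / Math.PI);          // t = alpha / pi
        return new float[] { s, t };
    }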
Fig. 10.5 Texture NASA's Blue Marble (land surface, ocean colour, sea ice and clouds) [5]: visualised with the angles α and β
If the values for u and v are outside the interval between 0 and 1, there are several possibilities to continue the texture beyond this range of numbers. The texture can be applied in the x- or y-direction according to the following choices:
• repeat, see the example in Fig. 10.6,
• mirrored repeat, see the example in Fig. 10.7,
• continuation of the values at s = 1 or t = 1 for values outside this range (clamp), see the example in Fig. 10.8, or
• assignment of an arbitrary value so that a border is created, see the example in Fig. 10.9.
In Fig. 10.8, the texture is repeated in the y-direction and the continuation (clamp) is used in the x-direction, while in Fig. 10.9 a colour is used as a border in the x-direction. The exact calculation of the responsible texel per fragment is achieved using so-called correspondence functions. In the following, this is illustrated using the repeated application of the texture (repeat). As shown in Fig. 10.10, the texel is calculated for the (parameterised two-dimensional) point (x, y), which is projected into the parameter space with u and v. As shown in Fig. 10.6, the texture should be repeated in the x- and y-directions. In this example, the texture is repeated four times in the x-direction and twice in the y-direction. Thus, the correspondence functions are k_x = 4 in the x-direction and k_y = 2 in the y-direction. In the following, the correct s-, t-values for (x, y) are calculated. A detour via the parameters u and v in the so-called area space is made as follows:
Fig. 10.6 Texture mapping: Repeated mapping of the texture on the (parameterised two-dimensional) surface, twice in the x- and the y-direction (Image source [4])
Fig. 10.7 Texture mapping: Repeated mirrored mapping of the texture on the (parameterised two-dimensional) surface, twice in the x- and the y-direction (Image source [4])
Fig. 10.8 Texture mapping: Repeated mapping of the texture on the (parameterised two-dimensional) surface in the y-direction; in the remaining x-direction, the texture values at s = 1 are continued (Image source [4])
Fig. 10.9 Texture mapping: Repeated mapping of the texture on the (parameterised two-dimensional) surface in the y-direction; in the x-direction, an arbitrary colour value is applied as a border (Image source [4])
Fig. 10.10 Texture mapping: Mapping the texture coordinates (s, t) to the (parameterised two-dimensional) surface coordinates (x, y) (Image source [4])
u = x / x_width · k_x   (10.1)

v = y / y_height · k_y   (10.2)
If the (parameterised two-dimensional) surface has a width of x_width = 400 and a height of y_height = 200, and (x, y) = (320, 160) is the (parameterised two-dimensional) point, then Eqs. (10.1) and (10.2) yield u = 3.2 and v = 1.6. The values s and t can easily be determined from u and v; for the repeat mode, one only needs to look at the decimal places. Thus s = u mod 1 = 0.2 and t = v mod 1 = 0.6. If (s, t) corresponds to the texture coordinates of a texel, its colour or material properties are taken over for the (parameterised two-dimensional) point (x, y). Identifying the corresponding texel is easy as soon as (s, t) coincides exactly with one. However, this is generally not the case.
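The calculation just carried out can be expressed compactly in code. The following Java sketch computes (s, t) from (x, y) for the repeat mode; in OpenGL, the four continuation choices listed above correspond to the texture wrap modes GL_REPEAT, GL_MIRRORED_REPEAT, GL_CLAMP_TO_EDGE and GL_CLAMP_TO_BORDER.

    /** Texture coordinates (s, t) for a surface point (x, y) in repeat mode.
     *  kx and ky are the correspondence functions, width and height the size
     *  of the parameterised surface. */
    static double[] repeatTexCoords(double x, double y, double width, double height,
                                    double kx, double ky) {
        double u = x / width  * kx;     // Eq. (10.1)
        double v = y / height * ky;     // Eq. (10.2)
        double s = u - Math.floor(u);   // u mod 1
        double t = v - Math.floor(v);   // v mod 1
        return new double[] { s, t };
    }

For the numbers above (width 400, height 200, k_x = 4, k_y = 2 and the point (320, 160)), this again yields s = 0.2 and t = 0.6.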
Fig. 10.11 Bilinear interpolation
For texture mapping, it is essential to define which procedure should be performed as soon as the calculated (s, t) is not mapped exactly to a texel but lies somewhere between four texels. Two procedures are available. Either the colour value of the texel closest to (s, t) is chosen (nearest neighbour), or the colour values of the four texels directly adjacent to (s, t) are combined by bilinear interpolation. The calculation of the nearest neighbour is very efficient. In the bilinear interpolation method, the colour values of all four neighbouring texels of (s, t) are interpolated (see Fig. 10.11). Let (s_i, t_j), (s_i, t_{j+1}), (s_{i+1}, t_j) and (s_{i+1}, t_{j+1}) with i, j ∈ N be these four neighbouring texels. In addition, let C(s_k, t_l) be the colour value belonging to the texel (s_k, t_l) with k, l ∈ N. Then C(s, t) is determined by interpolation first in the s- and then in the t-direction, i.e., bilinearly, as follows. In the s-direction, the result is

C(p_1) = (s_{i+1} − s)/(s_{i+1} − s_i) · C(s_i, t_j) + (s − s_i)/(s_{i+1} − s_i) · C(s_{i+1}, t_j)
C(p_2) = (s_{i+1} − s)/(s_{i+1} − s_i) · C(s_i, t_{j+1}) + (s − s_i)/(s_{i+1} − s_i) · C(s_{i+1}, t_{j+1})

and in the t-direction

C(s, t) = (t_{j+1} − t)/(t_{j+1} − t_j) · C(p_1) + (t − t_j)/(t_{j+1} − t_j) · C(p_2).
The second procedure is more complicated than the first, but its advantage should not be underestimated. While the first method produces hard edges, the second method works like a blur filter because of the interpolation, smoothing the transitions into each other.
Fig. 10.12 Texture mapping: Unique size-mapping of the texture onto the (parameterised two-dimensional) surface (Image source [4])
The interpolation thus reduces the staircase effect at edges. In this way, the primary colour of the surface induced by the texture is obtained at the point under consideration; Fig. 10.10 visualises this relationship. This value must then be combined with the lighting (see Chap. 9). Here, one can additionally decide whether the texture should appear glossy or matt. It should be added that, in addition to the individual assignment of texture coordinates, there are also procedures that compute the mapping of a texture automatically. In this case, the object is wrapped with a simple primitive geometry, for example, a cuboid, a sphere or a cylinder. Texture mapping is carried out onto this primitive geometry, which is then mapped suitably onto the object. It is also possible to compute the texture with a mathematical function.
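A direct transcription of the bilinear interpolation above is shown below. The sketch interpolates RGB colour values stored as float arrays; the four texel positions s_i, s_{i+1}, t_j, t_{j+1} and the colours are passed in explicitly, so the method mirrors the formulas one to one.

    /** Bilinear interpolation of the colour C(s, t) from the four neighbouring
     *  texels: c00 = C(si, tj), c10 = C(si1, tj), c01 = C(si, tj1), c11 = C(si1, tj1). */
    static float[] bilinear(float s, float t, float si, float si1, float tj, float tj1,
                            float[] c00, float[] c10, float[] c01, float[] c11) {
        float ws = (s - si) / (si1 - si);     // weight in the s-direction
        float wt = (t - tj) / (tj1 - tj);     // weight in the t-direction
        float[] result = new float[3];
        for (int k = 0; k < 3; k++) {
            float p1 = (1 - ws) * c00[k] + ws * c10[k];   // C(p1): row at t = tj
            float p2 = (1 - ws) * c01[k] + ws * c11[k];   // C(p2): row at t = tj+1
            result[k] = (1 - wt) * p1 + wt * p2;          // interpolation in the t-direction
        }
        return result;
    }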
10.1.1 Mipmap and Level of Detail: Variety in Miniature Perspective projection makes nearby objects appear larger, while more distant objects appear smaller. The same applies to the corresponding polygons. If textures are to be applied, textures with a higher resolution are obviously needed on close polygons, while such a high resolution is of little use for polygons further away. Note that with texture mapping on distant polygons, aliasing can still occur despite the use of bilinear interpolation. The challenge that arises with more distant polygons is that many texels are mapped to the surroundings of a fragment by the perspective projection, and thus choosing only the four colour values of the directly adjacent texels is not sufficient for the calculation of the colour value. In this case, in order to calculate the correct colour value, not only these four but all the texels in the vicinity would have to be taken into account in the interpolation. If, on the other hand, precisely one texture in a given resolution were chosen, many pixels would meet a few texels for close objects (so-called magnification), and many texels would meet a single pixel for more distant objects (so-called minification). This would be a very complex calculation.
Fig. 10.13 Simple example of the different resolutions for mipmapping: On the left, the highest resolution level is visualised; from it, looking from left to right, the three further hierarchy levels are created by interpolation
Instead of calculating it precisely, an approximation method is chosen. At the optimal resolution, one pixel should fall precisely on one texel or, in the case of bilinear interpolation, only the colour values of the directly adjacent texels should be of importance. This is the reason for so-called mipmapping, where mip is derived from multum in parvo (variety in miniature). A mipmap is kept in the form of a Gaussian pyramid, in which different resolutions of the texture are stored. For a polygon, the level that contains the most suitable resolution is chosen. Suitable in this context means that both procedures (colour value determined according to the nearest neighbour or by bilinear interpolation) provide consistent results. Figure 10.13 shows such a (here simplified) representation of different resolutions for the realisation of the level of detail. There are different types of implementation. For example, if one starts with the highest resolution and deletes every second row and column in every step, the resulting resolution is saved to the next hierarchy level of the pyramid. The conversion in Fig. 10.13 is done by dividing the fragments into contiguous groups of four (delimited by broad lines), so-called quads, whose interpolation, here averaging the RGB values, is stored in the corresponding pixel of the next hierarchical level, also called mipmap level. This corresponds to low-pass filtering and reduces aliasing. A mipmap is, therefore, a pyramid of different resolutions, in which the resolution of the next hierarchy level (mipmap level) is exactly half the previous resolution. With this procedure, one can now use a lower resolution for texture mapping of distant polygons. This means that the calculation of the colour values must be carried out on a hierarchy level (mipmap level) whose resolution is just coarse enough that the nearest neighbour or bilinear interpolation delivers meaningful results. This level is found by approximating the largest gradient in the x- or y-direction, i.e., by approximating the first (partial) derivative in the x- and y-directions. This requires only a simple colour value difference in the quads in the
x- or y-direction. The higher the difference is, the lower the resolution level (mipmap level) that must be selected.
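The construction of the mipmap levels by averaging quads of four texels can be sketched as follows. The code assumes a square grey-value texture whose side length is a power of two; in practice, OpenGL can generate such a pyramid automatically, for example with glGenerateMipmap.

    /** Builds a simple mipmap pyramid for a square grey-value texture whose
     *  side length is a power of two. Each level averages 2x2 quads of the
     *  previous level (low-pass filtering). */
    static java.util.List<float[][]> buildMipmap(float[][] level0) {
        java.util.List<float[][]> pyramid = new java.util.ArrayList<>();
        pyramid.add(level0);
        float[][] current = level0;
        while (current.length > 1) {
            int n = current.length / 2;
            float[][] next = new float[n][n];
            for (int y = 0; y < n; y++) {
                for (int x = 0; x < n; x++) {
                    next[y][x] = (current[2 * y][2 * x] + current[2 * y][2 * x + 1]
                                + current[2 * y + 1][2 * x] + current[2 * y + 1][2 * x + 1]) / 4.0f;
                }
            }
            pyramid.add(next);
            current = next;
        }
        return pyramid;
    }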
10.1.2 Applications of Textures: Approximation of Light, Reflection, Shadow, Opacity and Geometry Textures have many applications; in the following, various applications are briefly presented. One can define a background texture, for example, to create a skydome. More complex, more realistic lighting techniques, like the radiosity model presented in Sect. 9.8, are expensive to compute in real time. Under certain circumstances, however, they can be used to precompute at least the diffuse reflection and apply it to the surfaces as a texture, a so-called lightmap, so that afterwards only specular reflection must be taken into account. A similar approach can be implemented for specular highlights and stored as a gloss map. Another similar method can be used for the calculation of shadows, the so-called shadow map. Here, the modified z-buffer algorithm for calculating shadows in relation to a light source is performed in advance. The distances to the light source are written as depth values into the depth buffer. These values identify the fragments that are directly illuminated by the light source. All other fragments of the same pixel, which are further away, must therefore be in the shadow of this light source. Textures can have an alpha channel. When they are drawn, the visibility order must therefore be respected. Alpha mapping takes opacity into account. The following values are assigned: 0 for completely transparent, 255 for opaque, and values in between, depending on their intensity, for partially transparent fragments. When using alpha mapping, the order in which the fragments of a pixel are drawn is of utmost importance to achieve the desired result. Opacity is discussed in detail in the colour chapter (see Chap. 6). Environment or reflection mapping is a method for displaying reflective surfaces, for example, smooth water surfaces or wall mirrors. The viewer is mirrored at the plane defined by the reflecting surface. This mirrored point is used as a new projection centre for a projection onto the mirror plane. The resulting image is applied to the mirror as a texture when the scene is rendered with the original observer's point of view as the projection centre. Figure 10.14 illustrates this procedure, and Fig. 10.15 shows an example. When textures are used to represent relief-like structures, the reliefs appear flat, especially when illuminated by a pronounced light source, because no real three-dimensional information is contained in the texture. If such a texture is applied to a cuboid, for example, the lighting calculation cannot create a convincing three-dimensional illusion by shading. This is because only one normal vector per side surface of the cuboid can be included in the illumination calculation. In fact, there are four normals, which are located in the corners of the cuboid surface (all four are equally oriented). Fine surface structures of the texture additionally fall within a single surface triangle of the rough tessellation of a cuboid side.
Fig. 10.14 Displaying a mirror using reflection mapping
Fig. 10.15 Textured Utah teapot: environment mapping
Note that up to now, the illumination calculation has been based on normals interpolated from the normal vectors of the vertices of a polygon (usually a triangle). Also note that normals adapted to a three-dimensional relief structure (which is visualised by the texture) usually have a different orientation than the interpolated normals of the vertices. The challenge here is that the lighting calculation takes place on a flat (polygon) surface, thus suggesting a flat surface, while the texture visualises a high level of detail of the surface structure. This leads to a striking discrepancy.
To create a better three-dimensional effect of such a texture on the (flat) surface, bump mapping [1] is used. The surface to which the texture is applied remains flat. In addition to the colour information of the texture, the bump map stores information about the deviations of the normal vectors in order to calculate the perturbed normal vectors belonging to the surface of the relief structure. In the bump map, a perturbation value B(s, t) is stored for each texture point (s, t), by which the corresponding point P on the surface to which the texture is applied is shifted in the direction of the normal vector. If the surface is given in parameterised form and P = (x, y) is the point to be modified, with corresponding texture coordinates (s, t), the non-normalised normal vector at (s, t) for the parameterised point P results from the cross product of the partial derivatives with respect to s and t (note that the partial derivatives can be approximated via the colour value differences in the s- and the t-direction):

n = ∂P/∂s × ∂P/∂t.

If B(s, t) is the corresponding bump value at (s, t) for P, the new parameterised point P′, shifted by the perturbation value B(s, t) in the direction of n, is obtained as

P′ = P + B(s, t) · n/‖n‖

on the (parameterised) surface. A good approximation for the new perturbed normal vector n′ at this point P′ is then given by

n′ = (n + d) / ‖n + d‖   with   d = ( ∂B/∂s · (n × ∂P/∂t) − ∂B/∂t · (n × ∂P/∂s) ) / ‖n‖.

Bump mapping thus allows varying normal vectors to be transferred to an otherwise flat or smooth plane; it changes the (surface) normals. Figure 10.16 illustrates how the perturbed normal vectors belonging to a small trough are mapped to a flat surface. Note, however, that no height differences are visible from a side view. Also, shadows cast by the fine structures are not shown. Figure 10.17 shows a box with and without bump mapping.
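The perturbed normal can be computed directly from these formulas. The following self-contained Java sketch assumes that the partial derivatives of the surface point P with respect to s and t, and of the bump map B at (s, t), have already been approximated, for example by difference quotients of neighbouring texels; all names are illustrative.

    /** Perturbed normal n' for bump mapping. pS, pT are the partial derivatives
     *  of the surface point with respect to s and t; bS, bT the (approximated)
     *  partial derivatives of the bump map B at (s, t). */
    static float[] perturbedNormal(float[] pS, float[] pT, float bS, float bT) {
        float[] n = cross(pS, pT);                 // unperturbed normal n
        float[] d = sub(scale(cross(n, pT), bS), scale(cross(n, pS), bT));
        d = scale(d, 1.0f / length(n));
        return normalize(add(n, d));               // n' = (n + d) / |n + d|
    }

    static float[] cross(float[] a, float[] b) {
        return new float[] { a[1] * b[2] - a[2] * b[1],
                             a[2] * b[0] - a[0] * b[2],
                             a[0] * b[1] - a[1] * b[0] };
    }
    static float[] add(float[] a, float[] b) { return new float[] { a[0] + b[0], a[1] + b[1], a[2] + b[2] }; }
    static float[] sub(float[] a, float[] b) { return new float[] { a[0] - b[0], a[1] - b[1], a[2] - b[2] }; }
    static float[] scale(float[] a, float s) { return new float[] { a[0] * s, a[1] * s, a[2] * s }; }
    static float length(float[] a) { return (float) Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]); }
    static float[] normalize(float[] a) { return scale(a, 1.0f / length(a)); }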
Fig. 10.16 Bump mapping
Fig. 10.17 A box without (top) and with (bottom) bump mapping
The new perturbed normals are used for the calculations in the Blinn-Phong lighting model, as already mentioned. This changes the diffuse and specular light components. Figure 10.18 clearly shows this change in the diffuse and specular light portions; their superposition (shown as an addition in this figure) results in the boxes of Fig. 10.17. The illusion of the three-dimensional relief structure is visible on each side of the box. Such effects work well on flat surfaces. A further optimisation of bump mapping is so-called normal mapping. The approach is identical to bump mapping, with the difference that in normal mapping the perturbed normals are saved in a texture, the so-called normal map, and used in the subsequent illumination calculation. This makes access to the new perturbed normals efficient. The procedure is explained in the following. A normal N is usually normalised, which means that −1 ≤ x, y, z ≤ 1 holds for the normal coordinates (with a length of the normal equal to one). The normalised normals are stored per pixel in a texture with RGB colour channels in the following form: r = (x + 1)/2, g = (y + 1)/2 and b = (z + 1)/2. When reading from the normal map, this mapping is undone (i.e., x = 2r − 1, y = 2g − 1 and z = 2b − 1), and these normals are then used in the lighting calculation. The normal map can easily be created from a colour texture with an image processing program such as Photoshop or GIMP, and can then be used as a texture in OpenGL. The colour texture to be used is first converted into a two-dimensional gradient field. Gradients can be interpreted as approximations of the first (partial) derivative in the x- and y-directions, in the form of height differences of neighbouring pixels in the corresponding digital brightness or grey-value image. Approximately, this brightness or grey-value image can thus be understood as a heightmap. The local differences of the brightness or grey values of neighbouring fragments in the x- or y-direction in this heightmap are thus approximately calculated as gradients in the x- or y-direction. From these, tangents in the x- and y-directions can be determined, and the respective normal is calculated with the cross product (see Sect. 4.7). After normalisation, these normals are stored as RGB colour values in a normal map, as described above. Algorithms for creating a normal map are explained in [3] and [6]. Figure 10.20 shows the heightmap and the normal map of the texture visualised in Fig. 10.19. The lighting conditions do not appear consistent, no matter whether one uses bump or normal mapping. This is due to the curved outer surface that is to be simulated, here on the Utah teapot: the interpolated normal vectors at the corner points of a triangle differ, and the normals used for the calculation per fragment are interpolated from them. The calculated perturbed normal vectors of the bump and normal map, however, refer to a flat surface, which does not hold in general. The curvature behaviour must therefore be additionally taken into account in order to create the right illusion on curved surfaces. In order to map the curvature behaviour correctly to the three vectors of the illumination calculation per pixel (light vector L, viewing direction vector V and normal vector N; see Chap. 9), a separate tangent coordinate system is set up for each fragment. For each fragment, a new coordinate system is created, which consists of two tangent vectors and the normal vector, the so-called shading normal, which is
Fig. 10.18 A box and its light components of the Blinn–Phong lighting model without (top) and with (bottom) bump mapping with the result shown in Fig. 10.17
Fig. 10.19 Old stone brick wall as texture
interpolated from the normals of the corresponding vertices of the (polygon) surface. Both tangent vectors can be approximated in any fragment from the previously discussed two gradients in the x- and the y-direction of the heightmap. Note that these coordinate systems do not necessarily have to be orthogonal due to the curvature of the surface. However, one can approximately assume that they are "almost" orthogonal. For this reason, orthogonality is often assumed heuristically. This simplifies some considerations, especially the change of coordinate systems, because the inverse of an orthonormal matrix is its transpose. Nevertheless, care must be taken, because any approximation can also falsify the result if accuracy is expected. These calculations are carried out either in the model coordinate system or in the tangent coordinate system, and for this purpose, either the (interpolated) surface normal or the perturbed normals are transformed. The two vectors L and V of the illumination calculation must then be transformed by a coordinate system transformation (see Sect. 5.13) into the coordinate
Fig. 10.20 Heightmap (top) and normal map (bottom) of the texture from Fig. 10.19
system in which the perturbed normal vector N is present. In this way, they are included in the lighting calculation together with the curvature behaviour. The tangent coordinate system is based on the assumption that the non-perturbed (i.e., interpolated) normal vector n is mapped to (0, 0, 1), i.e., the z-axis of the tangent coordinate system. Each perturbation applied causes a deviation of this vector (0, 0, 1) in the x- or the y-direction. Since it is assumed that the normal vectors are normalised, a deviation can only occur in the hemisphere with 0 ≤ z ≤ 1. Hence it is sufficient to consider the deviation in the x- and the y-direction. This deviation can be calculated exactly from the values read from the bump map or normal map. Since in a normal map the RGB values range from 0 to 255, i.e., normalised from 0 to 1, the normalised RGB colour value 0 corresponds to the normal coordinate −1, and the normalised RGB colour value 1 corresponds to the normal coordinate 1. The normal coordinates in between are linearly interpolated as RGB colour values between 0 and 1. The normal vector (0, 0, 1) is therefore stored as the RGB colour value (0.5, 0.5, 1.0). If the illumination calculation takes place in this tangent coordinate system with the three transformed vectors, it represents the illumination conditions as intensities between 0 and 1. The transformation from the tangent coordinate system into the model coordinate system can be performed by a coordinate system transformation (see Sect. 5.13) and thus by matrix multiplication with the
following matrix. In homogeneous coordinates, the three basis vectors T, B and N are used as columns, and (0, 0, 0) as the coordinate origin:

⎛ Tx  Bx  Nx  0 ⎞
⎜ Ty  By  Ny  0 ⎟
⎜ Tz  Bz  Nz  0 ⎟
⎝ 0   0   0   1 ⎠

with N = (Nx, Ny, Nz)^T the normal vector, T = (Tx, Ty, Tz)^T the corresponding tangent vector and B = (Bx, By, Bz)^T the corresponding bitangent vector. This matrix is called the TBN matrix. The mathematical considerations necessary for this can be found in [3]. Notice that these three basis vectors T, B and N usually differ per fragment. This means that for each fragment, this matrix has to be set up anew, and the three vectors in question have to be transformed accordingly before the lighting calculation can be carried out. Note also that the inverse of the TBN matrix is required to invert the mapping. Since it is an orthonormal matrix, the TBN matrix simply has to be transposed. Based on this fact, the transformation can be carried out not only from the model coordinate system but also from the world coordinate system or the camera coordinate system. From which of the three coordinate systems (model, world and camera) one transfers into these tangential coordinate systems depends on the coordinate system in which the three vectors V, L and N are present; see Figs. 10.21 and 10.22.

While bump or normal mapping changes the normals and not the geometry (but creates this illusion), displacement mapping manipulates the geometry. This can be exploited by giving an object only a rough tessellation and refining the surface in combination with a displacement map. For example, the coarse tessellation of the Utah teapot in Fig. 10.23, which comprises only four per cent of the fine tessellation, could be sufficient to create the same fine geometry when combined with an appropriate displacement map. This displacement map thus contains information about the change of the geometry. With this technique, one only needs different displacement maps to create differently finely structured appearances of such a coarsely tessellated object. This means that applying multiple displacement maps to the same rough geometry requires less memory than creating multiple geometries.

In conclusion, more realism can be achieved by combining texture mapping techniques. Several textures can also be combined per surface; note that the order of combination and the kind of mathematical operations used can play an important role. Besides two-dimensional textures, there are also three-dimensional ones. Texture coordinates are extended by a third dimension r besides s and t. Three-dimensional textures are used, for example, in medical imaging procedures.
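As an illustration of how such a matrix is typically used in a shader (a sketch, not one of the book's listings; the vector names are assumptions), the TBN matrix can be built and applied in GLSL as follows:

// Sketch: building and using the TBN matrix in GLSL
mat3 TBN = mat3(normalize(T), normalize(B), normalize(N));   // columns T, B, N
// transform a perturbed normal from tangent space into the surrounding space
vec3 nPerturbed = normalize(TBN * nTangent);
// or, using the transpose as the inverse, transform the light vector into tangent space
vec3 lTangent = transpose(TBN) * L;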
Fig. 10.21 Possibilities to transform into the tangent space using the TBN matrix
Fig. 10.22 Tangential coordinate systems on a curved surface in three points corresponding to three pixels
Fig. 10.23 Different coarse tessellations of the Utah teapot
10.2 Textures in the OpenGL This section provides an introductory look at texture mapping. The OpenGL specifications are available at [7]. In the fixed-function pipeline, the texture coordinates are set per corner point with the command glTexCoord2f. These are applied per corner point in the following way:
gl.glBegin(GL2.GL_POLYGON);
// vertex 0
gl.glTexCoord2f(0.25f, 0.25f);
gl.glVertex3f(-0.5f, -0.5f, 0.0f);
// vertex 1
gl.glTexCoord2f(0.50f, 0.25f);
gl.glVertex3f(0.5f, -0.5f, 0.0f);
// vertex 2
gl.glTexCoord2f(0.50f, 0.50f);
gl.glVertex3f(0.0f, 0.5f, 0.0f);
// additional vertices
...
gl.glEnd();
In the programmable pipeline, the positions of the vertices, their normals and the texture coordinates of the object must be stored in the buffer of the graphics processor. In the following, the texture coordinates are prepared for the vertex shader during the initialisation of the object with an ID, similar to the other properties of the vertices. In the following example, ID 3 is chosen because the other properties (positions, normals and colours) already occupy the IDs 0 to 2. In the vertex shader, this ID is accessed for each vertex via layout (location = 3), which provides the corresponding texture coordinates:
gl.glEnableVertexAttribArray(3);
gl.glVertexAttribPointer(3, 2, GL.GL_FLOAT, false, 11 * 4, 9 * 4);
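For orientation, the stride of 11 * 4 bytes and the offset of 9 * 4 bytes fit an interleaved layout of eleven floats per vertex. A hedged sketch of how all four attributes might then be declared is shown below; the exact order of the first three attributes is an assumption based on these numbers.

// Sketch: possible attribute setup for an interleaved buffer with eleven floats per vertex
int stride = 11 * 4;                                                // eleven floats of 4 bytes each
gl.glEnableVertexAttribArray(0);
gl.glVertexAttribPointer(0, 3, GL.GL_FLOAT, false, stride, 0);      // position
gl.glEnableVertexAttribArray(1);
gl.glVertexAttribPointer(1, 3, GL.GL_FLOAT, false, stride, 3 * 4);  // colour
gl.glEnableVertexAttribArray(2);
gl.glVertexAttribPointer(2, 3, GL.GL_FLOAT, false, stride, 6 * 4);  // normal
gl.glEnableVertexAttribArray(3);
gl.glVertexAttribPointer(3, 2, GL.GL_FLOAT, false, stride, 9 * 4);  // texture coordinates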
Also, the texture must be loaded from a file and transferred to the buffer. The background is that the normal and texture coordinates are first written into the main memory of the CPU so that the graphics processor can use them for later calculations. The driver decides when to copy from the main memory to the graphics memory.
File textureFile = new File(texturePath + textureFileName);
texture = TextureIO.newTexture(textureFile, true);
The determination of the colour values according to nearest neighbour or bilinear interpolation can be realised with the parameters GL_NEAREST or GL_LINEAR. In OpenGL, GL_TEXTURE_MIN_FILTER or GL_TEXTURE_MAG_FILTER is set to either GL_NEAREST or GL_LINEAR. GL_TEXTURE_MAG_FILTER sets the filter for magnification and GL_TEXTURE_MIN_FILTER for minification. For GL_TEXTURE_MIN_FILTER, the following additional parameters are available, since mipmapping is only useful in this case:

• GL_NEAREST_MIPMAP_NEAREST: GL_NEAREST is applied, and the mipmap level is used which covers approximately the size of the pixel that needs to be textured.
• GL_NEAREST_MIPMAP_LINEAR: GL_NEAREST is applied, and the two mipmap levels are used which cover approximately the size of the pixel that needs to be textured. This results in one texture value per mipmap level. The final texture value is then a weighted average between the two values.
• GL_LINEAR_MIPMAP_NEAREST: GL_LINEAR is applied, and the mipmap level is used which covers approximately the size of the pixel that needs to be textured.
• GL_LINEAR_MIPMAP_LINEAR: GL_LINEAR is applied, and both mipmap levels are used which cover approximately the size of the pixel that needs to be textured. This results in one texture value per mipmap level. The final texture value is a weighted average between the two values.

The filter parameters are set as follows; in this example, bilinear interpolation (GL_LINEAR) is chosen for both minification and magnification:
texture.setTexParameteri(gl, gl.GL_TEXTURE_MIN_FILTER, gl.GL_LINEAR);
texture.setTexParameteri(gl, gl.GL_TEXTURE_MAG_FILTER, gl.GL_LINEAR);
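Note that GL_LINEAR for GL_TEXTURE_MIN_FILTER samples only the base texture level. To actually use the mipmap levels generated by TextureIO.newTexture(textureFile, true), the minification filter must be set to one of the mipmap modes listed above, for example (a hedged variant of the call above):

// trilinear filtering: bilinear interpolation within and between two mipmap levels
texture.setTexParameteri(gl, gl.GL_TEXTURE_MIN_FILTER, gl.GL_LINEAR_MIPMAP_LINEAR);
texture.setTexParameteri(gl, gl.GL_TEXTURE_MAG_FILTER, gl.GL_LINEAR);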
In addition, GL_NEAREST or GL_LINEAR can be set as parameters, i.e., the selection of the colour value of the nearest pixel or the bilinear interpolation of the colour values of the directly adjacent pixels (usually the four directly adjacent neighbouring pixels that have one side in common). After that, rules are defined which deal with the well-known problem of texture mapping as soon as the texture coordinates lie outside the range from 0 to 1 (see Sect. 10.1). This behaviour can be modified using texture.setTexParameteri. Among others, the following parameters are available:

• GL_REPEAT: Repetition,
• GL_MIRRORED_REPEAT: Repeated mirroring,
• GL_CLAMP: Continuation of the values at s = 1 or t = 1 for values outside this range or
• GL_CLAMP_TO_BORDER: Assignment of an arbitrary value so that a border is created.

The following code sets the texture wrapping (for GL_TEXTURE_2D) to repetition (GL_REPEAT) in the s-direction (GL_TEXTURE_WRAP_S); the t-direction is set analogously.
texture.setTexParameteri(gl, gl.GL_TEXTURE_WRAP_S, gl.GL_REPEAT);
texture.setTexParameteri(gl, gl.GL_TEXTURE_WRAP_T, gl.GL_REPEAT);
Thus, the behaviour of the texture during its repetition in the s- and t-directions can be designed independently of each other. The following source code
texture.setTexParameteri(gl, gl.GL_TEXTURE_WRAP_T, gl.GL_CLAMP);
represents another option of repetition in the t-direction. Furthermore, one can define how the texture should be mixed with other textures or colour values. One can also define how, for example, two textures should be merged into each other or how the alpha value should be used for opacity with the
Table 10.1 Calculation of the parameters GL_REPLACE, GL_ADD, GL_BLEND, GL_MODULATE, GL_DECAL, GL_COMBINE

              GL_RGB                             GL_RGBA
GL_REPLACE    Cv = Cs                            Cv = Cs
              Av = Ap                            Av = As
GL_ADD        Cv = Cp + Cs                       Cv = Cp + Cs
              Av = Ap                            Av = Ap · As
GL_BLEND      Cv = Cp · (1 − Cs) + Cc · Cs       Cv = Cp · (1 − Cs) + Cc · Cs
              Av = Ap                            Av = Ap · As
GL_MODULATE   Cv = Cp · Cs                       Cv = Cp · Cs
              Av = Ap                            Av = Ap · As
GL_DECAL      Cv = Cs                            Cv = Cp · (1 − As) + Cs · As
              Av = Ap                            Av = Ap
colour value to be used. This is done by the function glTexEnvf, which can be used in the fixed-function pipeline as well as in the programmable pipeline.
gl.glTexEnvf(GL2.GL_TEXTURE_ENV, GL2.GL_TEXTURE_ENV_MODE, GL2.GL_REPLACE);
The parameters shown in Table 10.1 can be selected. Besides GL_REPLACE, also GL_ADD, GL_BLEND, GL_MODULATE, GL_DECAL or GL_COMBINE can be used. As explained on the OpenGL reference pages of Khronos [2], the formulae shown in Table 10.1 apply. In the table, Cp is the current pixel (fragment) colour, Ap the current alpha value of the fragment, Cs the texture colour, Cc the texture environment colour and As the texture alpha value. Furthermore, Cv is the resulting texture colour and Av the resulting alpha value. The texture can be in RGB format (the alpha value is then set to 1) or combined with an alpha value, which lies between 0 and 1, the so-called RGBA format (see Chap. 6). Finally, texture mapping of type GL_TEXTURE_2D (further options are GL_TEXTURE_1D, GL_TEXTURE_3D and GL_TEXTURE_CUBE_MAP) is enabled with texture.enable(gl), texture unit 0 is activated with gl.glActiveTexture(GL_TEXTURE0), the texture is bound to this texture unit with gl.glBindTexture(GL_TEXTURE_2D, texture.getTextureObject(gl)), and the texture image is loaded into the texture buffer with calls like glTexImage2D. A texture unit is a functional unit (hardware) on the GPU. This concept describes the access to the texture image. One selects such a unit with gl.glActiveTexture(GL_TEXTURE0) and makes it the active unit for future operations. For further effects, one can also use GL_TEXTURE1, GL_TEXTURE2, etc. By binding several texture units at the same time and accessing them via separate samplers, multi-texturing (drawing several textures on one primitive) can be applied, for example.
texture.enable(gl);
gl.glActiveTexture(GL_TEXTURE0);
gl.glBindTexture(GL_TEXTURE_2D, texture.getTextureObject(gl));
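For multi-texturing, a second texture object could be bound to the next texture unit in the same way. The following is a sketch, not one of the book's listings; the name texture2 is an assumption, and the matching sampler in the fragment shader would then use binding = 1.

// Sketch: binding a second (hypothetical) texture to texture unit 1
texture2.enable(gl);
gl.glActiveTexture(GL_TEXTURE1);
gl.glBindTexture(GL_TEXTURE_2D, texture2.getTextureObject(gl));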
In the vertex shader, as mentioned at the beginning of this section, one can access the corresponding texture coordinates per vertex using
layout (location = 3) in vec2 vInUV;
The variable vInUV enters the vertex shader as an in variable and leaves it as an out variable so that it can be read in the fragment shader again as an in variable. Variables intended to be passed from the vertex shader for further processing in the fragment shader are defined as out variables. The name of the out variable can differ from the name of the in variable. Furthermore, the fragment shader needs access to the texture itself. This is done by the predefined type sampler2D for two-dimensional textures:
layout (binding = 0) uniform sampler2D tex;
In fragment shaders, textures are accessed via so-called samplers. Depending on the dimension, there are different variants: sampler1D, sampler2D, sampler3D, samplerCube, etc. Samplers are used as uniform variables. Figures 10.24–10.26 show the corresponding source code. In the following, the implementation of bump mapping with normal maps (normal mapping) in OpenGL is explained. On flat surfaces, the perturbed normal vectors can be extracted from the normal map with the following source code in GLSL:
vec3 normalmap = texture(ntex, fs_in.vUV).xyz;
// transform perturbed normal from [0,1] to [-1,1]
vec3 PN = normalize(2 * normalmap - 1);
These normal vectors only have to be converted from the RGB colour values of the normal map into the range [−1, 1] (see Sect. 10.1.2). If the other vectors for the Blinn–Phong illumination calculation are available in camera coordinates, the perturbed normal vectors must also be transformed from the world coordinate system into the camera coordinate system. For this, the perturbed normal vectors (instead of the interpolated normal vectors) are mapped with the transposed inverse of the total matrix that transforms the world coordinate system into the camera coordinate system. The transformed perturbed normal vectors can then be included with the other vectors in the illumination calculation. To texture the Utah teapot, the tangent coordinate systems must be created in order to achieve the illusion of a curved surface. Since both the viewing vector and the light direction vector are present in the camera coordinate system, the perturbed normal vectors per fragment are transformed from their local tangent coordinate system into the camera coordinate system. For this purpose, the TBN matrix from Sect. 10.1 is applied per fragment. This is shown in the source code of Fig. 10.28, where the lighting calculation is also performed. The method for creating the TBN matrix is shown in Fig. 10.27.
Fig. 10.24 Initialisation of a texture calculation (Java)
The following call is used to compute the perturbed normals that are included in the lighting calculation. The parameters are, in this order, the direction of view vector V, the normal vector N and the texture coordinates vUV:
vec3 PN = perturbed_normal(V, N, fs_in.vUV);
Since displacement mapping changes the geometry in the form of a more complex tessellation, it is not used in the vertex or fragment shader but in the so-called tessellation unit. The tessellation unit contains the tessellation control shader and the tessellation evaluation shader (see Chap. 2). The corresponding commands in OpenGL for three-dimensional textures are similar to the two-dimensional ones. Usually, only the exchange of 2D with 3D is done in the function name.
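To give a rough idea (this is a sketch under assumed names, not one of the book's listings), the displacement could be applied in the tessellation evaluation shader along the interpolated normal as follows:

// Sketch of a displacement step in a tessellation evaluation shader (GLSL)
// uniform sampler2D dispMap;  // displacement (height) map, assumed name
// uniform float dispScale;    // scaling factor for the displacement, assumed name
// position, normal and uv are assumed to be already interpolated for the new vertex
float height = texture(dispMap, uv).r;
vec3 displaced = position + normal * height * dispScale;
gl_Position = pMatrix * mvMatrix * vec4(displaced, 1.0);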
Fig. 10.25 Implementation of a texture calculation in the vertex shader (GLSL)
Fig. 10.26 Implementation of a texture calculation in the fragment shader (GLSL)
Fig. 10.27 Creating the TBN matrix for transformation into tangent space in the fragment shader (GLSL)
Fig. 10.28 The calculation of the perturbed normals from the normal map and their transformation into the tangent space in the fragment shader (GLSL): These normal vectors are further used in the fragment shader in the illumination calculation
10.3 Exercises Exercise 10.1 Use a custom image in jpg format as a texture for several cylinders of different sizes. Write an OpenGL program that implements this.
Exercise 10.2 An image used as a background remains unchanged even if the viewer moves. Place a background image on the front surface of a distant, large cube so that the viewer experiences a change in the background when he moves, at least within specified limits. Write an OpenGL program that implements this.
References

1. J. F. Blinn. "Simulation of Wrinkled Surfaces". In: SIGGRAPH Computer Graphics 12.3 (1978), pp. 286–292.
2. The Khronos Group Inc. "OpenGL 2.1 Reference Pages". Retrieved 03.04.2019. 2018. URL: https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml
3. E. Lengyel. Mathematics for 3D Game Programming and Computer Graphics. Boston: Course Technology, 2012.
4. NASA. "Taken Under the 'Wing' of the Small Magellanic Cloud". Retrieved 08.02.2021, 19:21h. URL: https://www.nasa.gov/image-feature/taken-under-the-wing-of-the-small-magellanic-cloud.
5. NASA. "The Blue Marble: Land Surface, Ocean Color, Sea Ice and Clouds (Texture)". Retrieved 07.03.2019, 22:51h. URL: https://visibleearth.nasa.gov/view.php?id=57735.
6. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 3rd edition. Vol. 1. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Fachmedien, 2011.
7. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core Profile)–October 22, 2019). Retrieved 08.02.2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf
11 Special Effects and Virtual Reality
This chapter contains selected special topics in computer graphics. Since virtual reality (VR) applications are an important application area of computer graphics, factors are explained that can create a high degree of immersion in such applications so that a user feels truly present in a virtual environment. Simulations of fog, particle systems or dynamic surfaces can create realistic effects in computer graphics scenes. For interactive computer graphics, the selection of objects and the detection and handling of collisions are important. This allows users to explore and manipulate three-dimensional virtual worlds. For these topics, this chapter contains the technical basics, supported by OpenGL examples for most topics. Since the sense of hearing contributes greatly to immersion and thus presence, this chapter presents some important findings and technical principles for auralising acoustic scenes. The last part of this chapter contains a summary of important factors for the creation of a visual impression of depth in a scene. The focus is on seeing with both eyes (binocular vision) and the technical reproduction through stereoscopic output techniques. These techniques are used in 3D television and virtual reality headsets.
11.1 Factors for Good Virtual Reality Applications For the simulation of a virtual environment, the stimulation of the sense of sight through images and image sequences is crucial, as this sense is the dominant human sense. (1) The more the observer feels surrounded by the stimulation, (2) the clearer the stimulation is and (3) the more interactions with the scene and the
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-28135-8_11.
objects are possible, the more the observer gets involved in the virtual world. These points are three factors of the so-called immersion. Two other factors of immersion are (4) the number of senses involved, such as sight, hearing, touch, smell, taste, balance, body sensation, temperature sensation and pain sensation, and (5) the degree of correspondence between these sensory modalities. Another important factor for immersion is (6) the consistency of the possible actions enabled by the virtual representation. In other words: "Immersion is the objective degree to which a VR system and application projects stimuli onto the sensory receptors of users in a way that is extensive, matching, surrounding, vivid, interactive, and plot conforming." [19, p. 45 after Slater and Wilbur 1997].

If there is a high enough degree of immersion, the user can feel present in the virtual world, which is the perception of actually "being there" in the simulated world. In order to experience presence, the user himself or herself plays a very important role. Only when he or she willingly engages with the situation in the simulation can true presence arise. Immersion is therefore the important technical prerequisite for experiencing presence, so that the user feels present [19, p. 46–47]. For the sake of completeness, it should be noted that the described monotonic correlation between the degree of immersion and presence applies in principle, but partially breaks down for renderings and animations of human characters. This breakdown is described by the so-called uncanny valley. When artificial human characters or robots in the real world become very similar to real humans, but are not yet similar enough, a sudden rejection occurs in the perceiver. This effect can be seen, for example, when viewing or coming into contact with prosthetic limbs or cadavers. Only when the similarity increases again and the simulation comes very close to human appearance and behaviour does the acceptance increase again [22]. For this reason, early animated films in which real people were animated were not very successful, because the simulation was close, but not close enough, to human appearance and behaviour. Solutions are, for example, the introduction of a higher degree of abstraction, as is the case in cartoons, or very detailed and natural renderings and animations.

Immersion can be increased, for example, by enabling the user to interact with the virtual scene as fully as possible. This includes free movement possibilities in the virtual world. This requires position and orientation sensors (tracking systems) and possibly eye-tracking systems, through which the position, orientation and viewing direction of the user can be recorded. The free navigation in a scene ultimately requires the change of the viewer position, which has already been explained in the context of projections in Sect. 5.8 and is already made possible in most sample programs offered with this book. In these programs, however, the mouse must be used instead of moving freely. However, the main computation techniques do not differ between virtual reality applications and standard computer graphics. The same applies to interactions with objects in the scene (see Sects. 11.7 and 11.8). Even without special input devices, such as flysticks or data gloves, this interaction can be simulated using the mouse. Interaction also means that objects in the scene can be influenced, for example, moved or scaled. The underlying principles can be understood using two-dimensional displays, like computer monitors.
Although commercial applications for designing virtual worlds directly in the virtual three-dimensional world have been available for
the last few years, the design and development of virtual reality applications is still largely done using two-dimensional displays and mouse and keyboard interaction. Furthermore, the immersion of virtual reality applications can be significantly increased by the possibility of stereoscopic viewing by the user (see Sect. 11.12), called stereoscopy. For this purpose, special output devices such as stereoscopic displays using special glasses (colloquially "3D glasses") or head-mounted displays (colloquially "VR glasses") are available. For today's virtual reality applications, the presentation of visual and auditory stimuli is the most advanced compared to the stimulation of other human senses. Moreover, interaction with the scene is almost always possible. Since the sense of hearing is the second most important human sense after the sense of sight, the foundation of the auralisation of virtual acoustic scenes is given in Sect. 11.11. Haptic stimulation occurs rarely or only in a simple form in such applications, for example, as vibration in a hand control (controller). With certain data gloves, tactile stimulation is possible. Some of these devices can be used as input devices for user interaction. In some more advanced applications, the sense of balance is addressed by treadmills or by a moving surface, called a motion platform. All other sensory modalities are very rarely used.

When human senses conflict with each other at a high degree of immersion, the perceptual system can become irritated and react with nausea, dizziness, sweating or even vomiting. This occurs, for example, when flying virtually (and only visually) through a scene while the user is at the same time standing or sitting still in the real physical world. In this case, the information that the senses of sight and balance send to the brain does not reflect the real-world situation. This situation can lead to so-called motion sickness, which causes the symptoms mentioned above. A more general term for these types of effects is VR sickness, which covers other situations in virtual reality applications that can trigger symptoms of illness [19, p. 160]. These include, for example, the delay (latency) between the user's input and the realisation of the effect or change in the virtual world. This latency in interaction is one of the most important triggers of sickness in virtual reality applications. Therefore, it is important that virtual reality applications are realised by powerful and efficient computer graphics systems (software and hardware) that minimise latency.
11.2 Fog When light penetrates a medium, such as smoke, haze, fog or clouds, an interaction of light with the medium occurs through physical effects such as absorption, scattering, reflection, refraction, birefringence, optical activity or photoeffects. Fog consists of fine water droplets formed by condensation of humid and supersaturated air. Unlike clouds, fog is always in contact with the ground. The light is refracted at the surface of the water droplets and reflected by these particles. Since the water droplets have a similar size to the wavelength of visible light, so-called Mie scattering occurs, which triggers the Tyndall effect, in which bundles of light are scattered out of the fog
medium. Only through this do the water droplets in the air become visible in white colour and the light is attenuated by the fog [25, p. 431]. In the context of volume rendering, the path tracing approach is used to visualise such effects, which is an extension of ray tracing (see Sect. 9.9). If the light modelled by rays hits a particle of the medium along a path, then the following four effects can be distinguished according to this approach.

• Through absorption, incident photons of light are absorbed and converted into another form of energy, such as heat.
• By emission, photons emanate from the particle when the medium reaches a certain temperature. This is the case, for example, with a burning fire.
• By scattering away from the particle (out-scattering), scattered photons emanate from the particle.
• By scattering to the particle (in-scattering), photons arrive at a particle that were previously scattered away from another particle.

Extinction is a measure of attenuation and according to this model is composed of absorption and scattering away from the particle. These effects are in principle wavelength dependent, which is already taken into account in complex modelling approaches, see, for example, [20]. In practical interactive applications, however, a simplification is usually made by performing a separate calculation only for the three colours of the RGB colour model [1, p. 589–591]. More detailed information on this topic can be found, for example, in [21] and [26, Chap. 11]. The simulation of all these described effects, for example, in the context of a path tracing method, requires complex calculations. The sufficiently detailed consideration of the dependence on the wavelengths also increases the complexity. These types of detailed representations are more suitable for non-interactive computer graphics. Therefore, two very simple models for the representation of fog are described below, which have been used successfully in interactive computer graphics for a long time (see also [1, p. 600–602]). These models are available in the OpenGL fixed-function pipeline (see Sect. 11.3).

To represent fog, a monotonically decreasing function f_b : [0, ∞) → [0, 1] with f_b(0) = 1 and lim_{d→∞} f_b(d) = 0 is needed to determine the fog factor. If d with d ≥ 0 is the distance of the object from the observer, C_object is the colour intensity of the object and C_fog is the colour intensity of the fog, the colour intensity of the object C lying in the fog is given by the following equation. This equation is analogous to the interpolated transparency.

C = f_b(d) · C_object + (1 − f_b(d)) · C_fog    (11.1)

Since f_b(d) approaches zero with increasing distance d, the colour of the fog dominates at greater distances. This formula can be applied to any colour channel of the RGB colour model. For the fog factor, a function with a linear or exponential decay is usually used. For linear fog the following formula can be used.

f(d) = (d1 − d) / (d1 − d0)    (11.2)
d0 denotes the distance at which a fog effect should occur. d1 is the distance at which the fog dominates the object. In order to maintain this visual restriction outside these limits and to limit it to the interval [0, 1], the following formula can be applied.

f_b(x) = min(max(x, 0), 1)    (11.3)

After calculating f(d) according to Eq. (11.2), the result is inserted into Eq. (11.3) for x. This limitation is also called clamping in computer graphics. It follows that the linear fog factor as a function of distance d is given by the following equation.

f_b(d) = 1                        if d ≤ d0
         (d1 − d) / (d1 − d0)     if d0 < d < d1
         0                        if d1 ≤ d          (11.4)

This means that there is no visual restriction up to the distance d = d0. From distance d = d1 the fog completely dominates, so that from this distance no more objects are visible. Between d0 and d1, the fog effect increases linearly. This simple fog model can be used to generate so-called depth fog or distance fog to highlight within the scene which objects have a greater distance from the observer than other objects. Since distant objects can be obscured by fog, depending on the parameter settings, the sudden appearance and disappearance of background objects can be masked by fog. The more realistic exponential fog is based on an exponential increase in fog by a factor of α > 0 and uses the following function.

f(d) = e^(−α·d)    (11.5)

The greater the value α, the denser the fog. Therefore, α is called the fog density. A stronger exponential decay can be achieved with the following function where the exponent is squared.

f(d) = e^(−(α·d)²)    (11.6)

Together with the blending function, the fog increases more strongly with increasing distance d. Also when using Eqs. (11.5) and (11.6) for the realisation of exponential fog, a limitation of the result to the interval [0, 1] is performed applying Eq. (11.3). Only this result is used in Eq. (11.1) as the limiting fog factor for computing the colour values per colour channel of the object lying in the fog.
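As a brief numerical example of these formulas (values chosen here for illustration): with a fog density of α = 0.12, Eq. (11.5) gives f(10) = e^(−1.2) ≈ 0.30 for a distance of d = 10. According to Eq. (11.1), the object colour then contributes only about 30 per cent at this distance, while the fog colour contributes about 70 per cent.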
11.3 Fog in the OpenGL In the OpenGL fixed-function pipeline, the realisation of fog is implemented according to the formulae from Sect. 11.2. The core profile does not contain any special functions for the representation of fog. Therefore, this section first illustrates the realisation of fog using shaders (see also [31]). Based on this, the parameter settings for fog rendering in the compatibility profile are explained at the end of this section. Figures 11.1 and 11.2 show a vertex shader and a fragment shader for the representation of fog. The user-defined input variables and uniforms specified in these
#version 430 core
// Vertex-Shader for demonstrating fog

// User defined in variables
// Position and color of vertex
layout (location = 0) in vec3 vPosition;
layout (location = 1) in vec3 vColor;

// Definition of uniforms
// Projection and model-view matrix
layout (location = 0) uniform mat4 pMatrix;
layout (location = 1) uniform mat4 mvMatrix;

// User defined out variable
// Color of the vertex
out vec4 color;
// Vertex position in camera/eye coordinates
out vec4 eyecoord_Position;

void main(void) {
    // Position of vertex in camera/eye coordinates
    eyecoord_Position = mvMatrix * vec4(vPosition, 1.0);
    // Model-view-perspective transform
    gl_Position = pMatrix * eyecoord_Position;
    // Color information to the next pipeline stage
    color = vec4(vColor, 1.0);
}
Fig. 11.1 Vertex shader for the realisation of fog (GLSL)
shaders must be defined by the associated OpenGL application (for example, by a JOGL program) and the corresponding data passed through it (see Sects. 2.9 and 2.10 for the basic approach). The majority of the fog calculation in this shader pair takes place in the fragment shader. The crucial part of the fog calculation in the vertex shader is the determination of the vertex position in camera coordinates eyecoord_Position and forwarding this information to the next stage of the pipeline. Through rasterisation, this position is calculated for each fragment and is available under the same name in the fragment shader (see Fig. 11.2). Since the origin of the camera coordinate system is at the position of the camera, the fragment position eyecoord_Position corresponds to the vector between the viewer and the fragment for which (a fraction of) fog is to be calculated. The distance between the observer and the fragment is thus the length of this vector, which is stored in the variable FogFragCoord. This variable corresponds to d according to Sect. 11.2. The determination of the vector length through FogFragCoord = length(eyecoord_Position);
can be simplified by using the absolute value of the z coordinate as follows: FogFragCoord = abs(eyecoord_Position.z);
#version 430 core
// Fragment-Shader for demonstrating fog

// User defined in variables
in vec4 color;               // fragment color
// Fragment position in eye coordinates
in vec4 eyecoord_Position;

// Uniforms for fog parameters
layout (location = 2) uniform int FogMode;
layout (location = 3) uniform vec4 FogColor;
// Parameters for linear fog
layout (location = 4) uniform float FogEnd;
layout (location = 5) uniform float FogScale;
// Parameter for exponential fog
layout (location = 6) uniform float FogDensity;

// User defined out variable, fragment color
out vec4 FragColor;

void main (void) {
    // Distance between camera/eye and fragment
    float FogFragCoord;
    // Exact distance calculation
    FogFragCoord = length(eyecoord_Position);

    // Factor for blending with the fog color
    float fog;
    if (FogMode == 0) {
        // fog calculation with linear formula (linear fog)
        fog = (FogEnd - FogFragCoord) * FogScale;
    } else if (FogMode == 2) {
        // fog calculation with squared exponent of exponential function
        fog = exp(-FogDensity * FogDensity * FogFragCoord * FogFragCoord);
    } else { // (FogMode == 1) or other value
        // fog calculation with exponential formula (exponential fog)
        fog = exp(-FogDensity * FogFragCoord);
    }

    // Clamp to interval [0,1]
    fog = clamp(fog, 0.0, 1.0);

    // Blending fragment color with fog color
    FragColor = vec4(mix(vec3(FogColor), vec3(color), fog), color.a);
}
Fig. 11.2 Fragment shader for the realisation of fog (GLSL)
If the viewer is far enough away from the fragment and the viewer position deviates little from a perpendicular line of sight, the resulting error is hardly noticeable. Using this simplification increases the efficiency for the calculation. A further optimisation can be made by assuming that the distance between the viewer and the fragment is always greater than or equal to zero. In this case fog = clamp(fog, 0.0, 1.0);
can be replaced by
fog = min(fog, 1.0);
The next step in the fragment shader is to determine the fog factor fog for mixing the fragment colour with the fog colour. The parameter FogMode is used to select one of the three calculation modes for the fog factor given in Sect. 11.2 (called f there). The necessary parameter values are passed from the OpenGL application to the fragment shader using uniform variables. When applying the linear formula for the fog factor, the variable FogScale is used. The value for this variable can be determined as follows:
FogScale = 1.0f / (FogEnd - FogStart);
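The uniform variables declared in the fragment shader of Fig. 11.2 (layout locations 2 to 6) might be filled from the JOGL application roughly as follows. This is a sketch, not one of the book's listings; the Java-side variable names are assumptions.

// Sketch: passing the fog parameters to the fragment shader of Fig. 11.2
gl.glUniform1i(2, fogMode);                        // FogMode: 0, 1 or 2
gl.glUniform4f(3, 0.9f, 0.9f, 0.9f, 1.0f);         // FogColor
gl.glUniform1f(4, fogEnd);                         // FogEnd
gl.glUniform1f(5, 1.0f / (fogEnd - fogStart));     // FogScale, pre-calculated
gl.glUniform1f(6, fogDensity);                     // FogDensity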
Since this value does not change after the choice of parameters, it is pre-calculated in the OpenGL application and passed to the fragment shader. The formula for FogScale corresponds to the term 1/(d1 − d0 ) according to Sect. 11.2. In a further step, the function clamp is applied to restrict the fog factor to the interval [0, 1], analogous to the calculation of the value f b according to Eq. (11.3). Finally, the colour of the fragment FragColor is determined by mixing the original fragment colour color with the colour of the fog FogColor by linear interpolation using the function mix. The weighting factor for the fog is the fog factor fog. This operation applies Eq. (11.1) (Sect. 11.2) to each of the three RGB colour components. There is no change in the alpha value of the fragment. This procedure first determines the opaque colour of the fragment of an object, which is followed by blending with the colour of the fog. This is therefore volume rendering with a homogeneous medium. In principle, this operation can also be realised by using the alpha channel and the blending functions, which can be executed as per-fragment operations after the fragment shader. The shader pair specified in Figs. 11.1 and 11.2 performs an accurate but at the same time a complex fog calculation. An increase in efficiency can be achieved by performing the calculation of the distance FogFragCoord and the calculation of the fog factor fog in the vertex shader. This reduces the computational effort if there are fewer vertices than fragments. The rasterisation then generates a fog factor for each fragment, if necessary by interpolation, so that only the blending between the fog colour and the fragment colour has to take place in the fragment shader (see last
command in the fragment shader in Fig. 11.2). This separation of the calculation is useful to increase efficiency for scenes with many small polygons. However, if the scene consists of large polygons, such as in a large landscape, then calculating the fog factor per fragment makes sense. Figure 11.3 shows the result of fog calculation with the shader pair specified in this section. For this example, the bright fog colour FogColor = (0.9, 0.9, 0.9) was chosen, which was also set for the background. For the application of the linear formula (at the top of the figure), the values FogStart = 0 and FogEnd = 18 were set. The exponential fog was calculated using Eq. (11.5) with α = FogDensity = 0.12. The more realistic fog representation by the exponential formula is clearly visible in the figure.

Fig. 11.3 Example scenes with fog created by a renderer in the core profile with the shader pair from Figs. 11.1 and 11.2: Linear fog (FogMode = 0) was used in the upper part. Exponential fog (FogMode = 1) was used in the lower part

As explained at the beginning of this section, calculation and rendering of fog is part of the fixed-function pipeline and thus available in the compatibility profile. In this profile, fog can be enabled by the JOGL command glEnable(gl.GL_FOG). As default values for the fog parameters the fog colour black (0, 0, 0), as fog type
the exponential calculation according to Eq. (11.5) (GL_EXP) and as fog density the value 1 (GL_FOG_DENSITY) are specified. The type of fog calculation according to Eq. (11.6) and a different fog density can be set, for example, as follows in the JOGL source code.
gl.glEnable(gl.GL_FOG);
gl.glFogf(gl.GL_FOG_MODE, gl.GL_EXP2);
gl.glFogf(gl.GL_FOG_DENSITY, 0.2f);
Below is a JOGL example to set the linear fog calculation according to Eq. (11.2) and to set the fog colour and the two fog parameters for the linear calculation.
gl.glEnable(gl.GL_FOG);
gl.glFogf(gl.GL_FOG_MODE, gl.GL_LINEAR);
float[] FogColor = {0.9f, 0.9f, 0.9f};
gl.glFogfv(gl.GL_FOG_COLOR, FogColor, 0);
gl.glFogf(gl.GL_FOG_START, 0.1f);
gl.glFogf(gl.GL_FOG_END, 10f);
The default value for the start distance of the fog effect is 0, and the default value for the end distance is 1. Whether the distance between the fragment and the camera is calculated by the exact determination of the vector length or whether it is approximated by the absolute value of the z component of the difference vector (see above for the discussion of the advantages and disadvantages) is left to the concrete OpenGL implementation for the GPU. The fog calculation in the compatibility profile does not work if a fragment shader is active. Further details on fog calculation and possible parameter settings can be found in the OpenGL specification of the compatibility profile [32].
11.4 Particle Systems Linear and exponential fog models as presented in Sect. 11.2 model a homogeneous fog of constant density. Individual clouds of fog hanging in the air are not covered by these simple models. Such effects and related ones, such as smoke, can be reproduced by particle systems [28,29]. A particle system consists of very many small objects (particles) that are not controlled individually but by a random mechanism with a few parameters. The individual particles are assigned a basic behavioural pattern that varies individually by the random mechanism. A particle system is often defined by the characteristics outlined below. As an example, consider a sparkler whose sparks are modelled by a particle system (see Fig. 11.4).
Fig. 11.4 The sparks of a sparkler as an example of a particle system
• The point of origin of a particle: For the sparkler, this would be a random point on the surface.
• The initial velocity of a particle.
• The direction in which a particle moves: For the sparkler, a random direction away from the surface could be chosen.
• The lifetime of a particle: For the sparkler, this corresponds to how long a spark glows.
• The intensity of particle emission: The time between the generation of two successive particles.
• The appearance of the particles.

In many cases, particles are very small elements, such as the sparks of the sparkler. This does not imply that each particle must necessarily be modelled individually as an object. For example, for a sandstorm, if each grain of sand were represented as a single particle or object, the number of particles would be so large that the computations of the movements and the entire rendering would take far too long. Instead, such particles are often grouped together to form a larger abstract particle. Often a larger particle in the form of a simple surface, such as a rectangle or triangle, is used, on which a matching, possibly semi-transparent texture is placed [14]. For the sparkler, a transparent texture could be used with individual bright points representing sparks. For a sandstorm, a semi-transparent sand pattern could be used as a texture. In this way, the number of particles, which are now larger, and thus the animation effort can be significantly reduced. Other aspects can be included in a particle system. For example, particles may have the ability to split into particles of smaller or equal size. Often a force is taken into account that influences the trajectory of the particles. In the simplest case, this
can be gravity, such as in a fountain modelled as a particle system of water droplets. In this case, the trajectory of a particle as it is created could be calculated from its initial velocity and direction. Applying a hypothetical weight acted upon by the gravitational acceleration would result in a parabolic trajectory for the particles. If dynamic forces, such as non-constant wind, are to be taken into account, the trajectories of the particles must be constantly updated during the animation. For such calculations, physics engines [24] are used, which specifically handle the calculations for physical simulations, especially movements and deformations of objects. Swarm behaviour [30], as used for modelling swarms of birds or fish, is related to particle systems. In a swarm, however, interaction and coordination of the individuals with each other play a major role. The individuals are not perfectly synchronised, but they move at approximately the same speed in approximately the same direction. Collisions between individuals are avoided. Therefore, techniques from the field of artificial intelligence are often used to model swarm behaviour. For clouds, a quite realistic animation model is described in [12]. The sky is first divided into voxels, with each voxel representing a cell of a cellular automaton. Depending on how the state transitions of this automaton are defined, different types of clouds can be simulated. Depending on their current state, each individual cell is assigned to a sphere of a certain colour. The resulting structure is mapped onto a sky texture.
11.5 A Particle System in the OpenGL In this section, the realisation of a simple particle system with the help of shaders in OpenGL under Java is explained using the example of a confetti cannon. An implementation of the corresponding OpenGL application in the programming language C is described in [31, p. 470–475]. The confetti cannon is to eject individual particles (confetti) of random colour, which are rendered as points (GL_POINTS). The particles move at a constant speed in a certain direction and are pulled downwards by the Earth’s gravitational field over time. For this simple system, the following properties must be modelled for each individual particle. • The starting position s0 at which the particle is created (three-dimensional coordinate). • The colour of the particle (RGB colour value and an alpha value). • The constant velocity v with which a particle moves within the three spatial dimensions (three-dimensional velocity vector). • The start time t0 at which the particle is created. The gravitational acceleration (acceleration due to gravity), by which a particle is pulled downwards, is to be taken into account with the approximate value of g = 9.81m/s2 . The lifetime t of a particle is the current simulation time minus the start
time of a particle, i.e., t = t_current − t0. This allows the new position of a particle to be determined using the following formula.

s = s0 + v · t − (1/2) · g · t²    (11.7)

This relationship is in accordance with the laws of motion of mechanics. This formula must be applied to each component of the position vector of a particle. The gravitational acceleration should only act in the y-direction. Therefore, g is zero for the x- and z-components. The downward direction of this acceleration (in the negative y-direction) is already taken into account by the minus sign in the formula above.

Figures 11.5 and 11.6 show a vertex shader and a fragment shader for the realisation of the particle animation for the confetti cannon. The input variables defined in the vertex shader correspond to the properties of a particle given in the list above. Since a particle is to be represented by a point (see above), these properties can be defined by exactly one vertex. This vertex data must be generated only once in the OpenGL application (for example, by the init method of a JOGL program) with random values and passed to the OpenGL. The transformation matrices (pMatrix and mvMatrix) are passed as uniform variables from the OpenGL program once per frame as usual (for example, by the display method of a JOGL program). Sections 2.9 and 2.10 show the basic approach of passing data from a JOGL program to shaders. The generation of the random vertex data is left to the reader as a programming exercise. In addition, the current simulation time currentTime is required for this particle animation in the vertex shader. Figure 11.7 shows part of the display method of a JOGL renderer to update the simulation time and pass it to the vertex shader via the layout position 2 by a uniform. The variable currentTime must be declared in the renderer and initialised with zero in the init method. The last command in the source code section triggers the drawing of the particles as points whose data are already in an OpenGL buffer. For each of these points (on the GPU), an instance of the vertex shader is called.

In the vertex shader (see Fig. 11.5), it is first checked whether the lifetime of the particle in question is greater than or equal to zero, i.e., whether the particle has already been created (born). If it has already been created, the distance the particle has travelled since its creation is determined. The new particle position can be determined according to Eq. (11.7). As stated earlier in this section, the calculation is done component-wise for each dimension of the particle position. The gravitational acceleration is only taken into account for the y-component, since it should only act in this direction. Furthermore, the colour of the particle is passed on to the next stage of the graphics pipeline without any change. If the particle has not yet been created (lifetime less than zero), the particle remains at its initial position. This newly calculated position of the particle is transformed using the transformation matrices and passed on to the next stage of the graphics pipeline. The lifetime of the particle is also passed on.
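The random generation of the particle data, which is left to the reader as an exercise above, could be sketched in Java roughly as follows. The interleaved layout (3 position, 4 colour, 3 velocity and 1 start-time float per particle), the particle count and the value ranges are assumptions for illustration only.

// Sketch: random initial data for the confetti particles (one vertex per particle)
int n = 10000;                       // number of particles (assumption)
float[] data = new float[n * 11];    // 3 position + 4 colour + 3 velocity + 1 start time
java.util.Random rnd = new java.util.Random();
for (int i = 0; i < n; i++) {
    int o = i * 11;
    data[o] = 0f; data[o + 1] = 0f; data[o + 2] = 0f;   // common starting position
    data[o + 3] = rnd.nextFloat();                      // random RGB colour
    data[o + 4] = rnd.nextFloat();
    data[o + 5] = rnd.nextFloat();
    data[o + 6] = 1f;                                   // alpha
    data[o + 7] = 2f + 4f * rnd.nextFloat();            // velocity x
    data[o + 8] = 5f + 7f * rnd.nextFloat();            // velocity y
    data[o + 9] = -2f + 4f * rnd.nextFloat();           // velocity z
    data[o + 10] = 10f * rnd.nextFloat();               // start time in seconds
}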
#version 430 core
// Vertex-Shader for demonstrating a particle system

// User defined in variables
// Position, color, velocity and start time of vertex
layout (location = 0) in vec3 vPosition;
layout (location = 1) in vec4 vInColor;
layout (location = 2) in vec3 vVelocity;
layout (location = 3) in float vStartTime;

// Definition of uniforms
// Projection and model-view matrix
layout (location = 0) uniform mat4 pMatrix;
layout (location = 1) uniform mat4 mvMatrix;
// Current simulation time
layout (location = 2) uniform float currentTime;

// User defined out variables
// Color and lifetime of particle
out vec4 color;
out float lifetime;

void main(void) {
    vec4 newvPosition;
    // Lifetime of particle
    float t = currentTime - vStartTime;

    if (t >= 0.0) { // particle is born
        // Calculate displacement based on particle velocity
        newvPosition = vec4(vPosition + vVelocity * t, 1.0);
        // Subtract the displacement based on gravity
        newvPosition.y -= 4.9 * t * t;
        // Forward color to next pipeline stage
        color = vInColor;
    } else { // particle is not born yet
        // particle remains at initial position
        newvPosition = vec4(vPosition, 1.0);
    }
    // Calculation of the model-view-perspective transform
    // based on the new position
    gl_Position = pMatrix * mvMatrix * newvPosition;
    // Forward lifetime of particle to the next pipeline stage
    lifetime = t;
}
Fig. 11.5 Vertex shader for the realisation of a confetti cannon as a particle system (GLSL)
In the fragment shader (see Fig. 11.6), it is checked whether the particle has not yet been created (lifetime less than zero). If it has not yet been created, the fragment in question is discarded by discard and it is not forwarded to the next stage of the graphics pipeline and thus not included in the framebuffer. This means that this fragment is not displayed. The colour of the fragment is passed on to the next stage of the graphics pipeline without any change.
#version 430 core
// Fragment-Shader for demonstrating a particle system

// User defined in variables
// Fragment color
in vec4 color;
// Particle lifetime
in float lifetime;

// User defined out variable, fragment color
out vec4 FragColor;

void main (void) {
    // Discard particle (do not draw), if particle is not born yet
    if (lifetime < 0.0)
        discard;
    // Forward input fragment color to the next pipeline stage
    FragColor = color;
}
Fig. 11.6 Fragment shader for the realisation of a confetti cannon as a particle system (GLSL)

public void display(GLAutoDrawable drawable) {
    [...] // Source code omitted

    // Update simulation time
    currentTime += 0.01f;
    // Transfer current time via layout position 2
    gl.glUniform1f(2, currentTime);

    gl.glDrawArrays(GL.GL_POINTS, 0, particles.getNoOfParticles());
}
Figure 11.8 shows some frames of the animation of the confetti cannon created by the described particle system. The animation was performed with 30,000 points, all generated at the same starting position with random colours. The vectors (4, 5, 0)T m/s and (6, 12, 5)T m/s served as lower and upper bounds for the randomly generated velocity vectors of the particles. The start times for each particle were randomly chosen from the range of 0 to 10 seconds. This example shows how easy it is to realise a particle system using shaders. The data for the 30,000 particles are generated exactly once and transferred to the OpenGL. The main computations of the animation are done entirely by the GPU. The OpenGL application only determines and transfers some varying data per frame, such as the simulation time and the transformation matrices. Thus, GPU support for particle systems can be realised very efficiently with the help of shaders. By changing the transformation matrices, the scene itself can be manipulated during the particle animation. For example, a rotation or a displacement of the scene is possible.
Fig. 11.8 Some frames of the animation of the confetti cannon as a particle system
The particle system described in this section can be extended in many ways. The spectrum of colours can be narrowed down to, for example, only yellow-red colours, allowing for a spark simulation. The colours could change from red to orange and yellow to white depending on the lifetime of the particles, simulating a glow of the particles. If the lifetime of the particles is also limited, the impression of annealing of the particles is created. The alpha components can also be varied to slowly fade out a particle. The random locations of particle formation can be extended to a surface, for example, the surface of the sparkler considered in Sect. 11.4. As a result, particles appear to emanate from this sparkler. Another possible variation is to give the y-component of the position coordinate a lower limit, so that the particles gather at a certain place on the ground. In addition, the particles could be represented by lines instead of points, which would allow the representation of particles with a tail. Likewise, the particles could consist of small polygonal nets that are provided with (partly transparent) textures in order to simulate more complex particles. Furthermore, in addition to gravitational acceleration, other acceleration or velocity components can be added, which would enable rendering of (non-constant) wind effects, for example.
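One of these extensions, slowly fading a particle out towards the end of its life, could be sketched in the fragment shader of Fig. 11.6 roughly as follows; the maximum lifetime maxLifetime is an assumed additional uniform, not part of the book's listing:

// Sketch: fade the particle out over an assumed maximum lifetime
// uniform float maxLifetime;   // assumed additional uniform
float fade = clamp(1.0 - lifetime / maxLifetime, 0.0, 1.0);
FragColor = vec4(color.rgb, color.a * fade);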
11.6 Dynamic Surfaces In this book, motion of objects is modelled by applying transformations to the objects. These transformations usually describe only the motion of the objects, but not their deformation. For rigid objects such as a vehicle or a crane, this approach is sufficient to describe motion appropriately. If persons or animals move, the skin and the muscles should also move in a suitable way, otherwise the movements will lead to a robot-like impression. Therefore, for surfaces that move in a more flexible manner, more complex models are used. On the other hand, it would be too inefficient and complex to model the movement of the surface by individual descriptions for the movements of its surface polygons. One approach to animate dynamic surfaces is to model the surface in an initial and an end position and, if necessary, in several intermediate positions. The animation is then carried out by an interpolation between the different states of the surface. For this purpose, the method presented in Sect. 5.1.4 for converting the letter D into the letter C and the triangulation-based method from Sect. 6.4 can be adapted to three-dimensional structures. These techniques are based on the principle that two different geometric objects are defined by corresponding sets of points and structures (usually triangles) defined by associated points. Then a step-by-step interpolation by convex combinations between corresponding points is carried out, defining intermediate structures based on the interpolated points. In Sect. 5.1.4, the structures formed based on the points are lines, quadratic or cubic curves. In Sect. 6.4, these structures are triangles. In any case, it is important that there is a one-to-one correspondence between the points of the two objects and that the associated structures in the two objects are defined by corresponding points. In the three-dimensional space, two surfaces can be modelled by triangles. It is important to ensure that the number of points in both objects is the same and there must be a one-to-one correspondence between the two groups of points establishing a one-to-one correspondence between the triangles that define the two surfaces. Figure 11.9 shows the intermediate steps (convex combinations for α = 0, 0.2, 0.4, 0.6, 0.8, 1) in the interpolation of two surfaces that are defined by points and triangles with a one-to-one correspondence. Instead of defining several intermediate states for the dynamic surface, motion can be described in terms of a few important points. For example, in a human arm, the motion is essentially constrained by the bones and the joints. The upper arm bone can rotate more or less freely in the shoulder joint, the forearm can only bend at the elbow joint, but not rotate. These observations alone provide important information for modelling arm motion. If only the movements of the bones are considered, a simplified model with only one bone for the upper and one for the lower arm is sufficient. When the hand of the arm carries out a movement, the bones simply follow the hand's movement under the restrictions that are imposed by the joints. A swaying movement must automatically be carried out in the shoulder joint of the upper arm, as the elbow joint does not allow for rotations. The bones themselves, however, are not visible, so their movement must still be transferred to the surface,
Fig. 11.9 Intermediate steps for the interpolation between surfaces that are defined by points and triangles with a one-to-one correspondence
Fig. 11.10 Representation of a skeleton with a flexible surface (skinning)
in this example to the skin. The position of the arm bones can be clearly determined by three skeletal points—the shoulder, elbow and wrist. If the surface of the arm is modelled by freeform curves or approximated by triangles, each control point or vertex of the freeform curves or triangles can be assigned a weight relative to the three skeleton points. The weights indicate how much a point (vertex) on the surface is influenced by the skeleton points. This weight will usually be greatest for the skeletal point closest to the vertex. Vertices that are approximately in the middle between two skeletal points will each receive a weight of 50% for each of the neighbouring skeletal points. Figure 11.10 shows such a skeleton as it could be used for modelling an arm.
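One simple heuristic for choosing such weights (an assumption for illustration, not a method prescribed here) is to take the inverse distances of a vertex to the skeleton points and normalise them so that they sum to one. A minimal Java sketch:

// Sketch: inverse-distance weights, normalised to form a convex combination.
// skeletonPoints[i] and vertex are three-dimensional coordinates as float[3]; all names are illustrative.
private float[] computeSkinningWeights(float[][] skeletonPoints, float[] vertex) {
    float[] weights = new float[skeletonPoints.length];
    float sum = 0f;
    for (int i = 0; i < skeletonPoints.length; i++) {
        float dx = vertex[0] - skeletonPoints[i][0];
        float dy = vertex[1] - skeletonPoints[i][1];
        float dz = vertex[2] - skeletonPoints[i][2];
        float distance = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
        weights[i] = 1f / (distance + 1e-6f);   // closer skeleton points get larger weights
        sum += weights[i];
    }
    for (int i = 0; i < weights.length; i++) {
        weights[i] /= sum;                      // normalise so that the weights sum to one
    }
    return weights;
}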
The skeleton is shown dashed, the three skeleton points are marked by squares. The greyscale of the vertices of the very coarse triangle grid that is supposed to model the skin indicates how the weights of the vertices are chosen. The vertex at the bottom right is assigned to the right skeleton point with a weight of one. The vertices next to it already have positive weights for both the right and the middle skeleton point. When the skeleton is moved, the skeleton points will undergo different transformations. With three skeleton points s_1, s_2, s_3 moved by the transformations T_1, T_2 and T_3, a vertex point p of the surface with weights w_1^(p), w_2^(p), w_3^(p) to the skeleton points would be transformed according to the following transformation:

T_p = w_1^(p) · T_1 + w_2^(p) · T_2 + w_3^(p) · T_3.

It is assumed that the weights form a convex combination, i.e., w_1^(p), w_2^(p), w_3^(p) ∈ [0, 1] and w_1^(p) + w_2^(p) + w_3^(p) = 1. This approach implies that the surface essentially follows the skeleton like a flexible hull. If a distinct muscle movement is to be modelled, the tensing and relaxing of the muscles results in an additional inherent movement of the surface. In this case, it is recommended to describe the surface of the object in different elementary positions—for example, with the arm stretched and bent—and then apply convex combinations of these elementary positions for the movement of the skin. Instead of such heuristics, mathematical models for the surface movements can also be specified. For example, [37] shows cloth modelling for virtual try-on based on finite element methods.
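Since the transformations T_i are affine, applying the blended transformation T_p to p gives the same result as transforming p by each T_i separately and forming the convex combination of the transformed points. The following minimal Java sketch performs this step; the column-major 4x4 matrix layout (as used by the OpenGL) and all names are assumptions for illustration.

// Sketch: convex combination of the vertex position transformed by each skeleton transformation.
// transforms[i] is the 4x4 matrix T_i in column-major order, weights[i] is w_i^(p),
// vertex is the point p as (x, y, z).
private float[] skinVertex(float[][] transforms, float[] weights, float[] vertex) {
    float[] result = new float[3];
    for (int i = 0; i < transforms.length; i++) {
        float[] m = transforms[i];
        // transform the vertex by T_i (homogeneous coordinate w = 1)
        float x = m[0] * vertex[0] + m[4] * vertex[1] + m[8]  * vertex[2] + m[12];
        float y = m[1] * vertex[0] + m[5] * vertex[1] + m[9]  * vertex[2] + m[13];
        float z = m[2] * vertex[0] + m[6] * vertex[1] + m[10] * vertex[2] + m[14];
        // accumulate w_i^(p) * (T_i p)
        result[0] += weights[i] * x;
        result[1] += weights[i] * y;
        result[2] += weights[i] * z;
    }
    return result;
}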
11.7 Interaction and Object Selection

A simple way for the viewer to interact with the three-dimensional world modelled in the computer is to navigate through the virtual world. When the viewer moves in a virtual reality application, special sensors are used to determine the viewer's position and orientation. From the position of the viewer, which is determined by these sensors or manually by mouse control, a projection of the scene must be calculated using the viewer position as the projection centre. This topic is covered in Sect. 5.8. If the viewer should be able to interact with objects in the scene, suitable techniques are required to choose and select objects for the interaction. In a scene graph, it should be made clear which objects can be selected on which level and what happens when an object is selected. As a simple example, the three-dimensional model in the computer can serve as a learning environment in which the user explores a complex technical object, such as an aircraft. When an object is selected, the corresponding name and function are displayed. If the viewer is to trigger dynamic changes in the scene by selecting certain objects, for example, operating a lever, the corresponding movements must be implemented and executed when the user selects the corresponding object. The selection of an object in a scene is called picking in
computer graphics, especially when the scene is rendered on a two-dimensional screen and the mouse is used for selection. When object picking is carried out with the mouse, the problem of finding out which object has been picked must be solved. The mouse can only specify a two-dimensional point in device coordinates (window coordinates) on the projection plane. This means that the object in the three-dimensional scene to which the projected point belongs must be found. This section describes some of the different selection methods that exist.

One method of object picking is colour picking. With this method, the objects or even the individual polygons of the scene are coloured with a unique colour and stored in an invisible framebuffer. If this coloured scene is a copy of the visible scene, then the two-dimensional device coordinate of the mouse click can be used to read the colour at the same location from the invisible framebuffer. Since the colour is uniquely assigned to an object or polygon, this solves the problem of assigning the mouse click to an object. The main disadvantage of this method is the need to draw the scene twice. However, this can be done with the support of the GPU.

Another method works with the back projection of the window coordinates of the mouse click into the model or world coordinate system. For this purpose, a back transformation of the viewing pipeline (see Sect. 5.36) must be carried out. Since no unique three-dimensional point can be determined from a two-dimensional coordinate, additional assumptions or information must be added. For this purpose, the depth information of a fragment can be read from the depth buffer (z-buffer). Equation (11.14) contains the mathematical steps for such a back transformation, which are explained in more detail with a source code example in Sect. 11.8. In the world coordinate system, it can be determined whether the point is inside an object or not. Alternatively, this comparison can also be made in the model coordinate system. To do this, the point must be transformed into the model coordinate system of each respective object before a comparison can take place.

Instead of back-projecting a single point, the ray casting technique presented in Sect. 9.9 can be used. This technique involves sending a ray (or beam) into the scene to determine which object or objects are hit by the ray. The object closest to the camera is usually selected. The required comparison is usually made in the world coordinate system. For this purpose, the ray must be created in this coordinate system or transferred to this coordinate system, starting from the window coordinate of the mouse click.

Determining whether a point lies within a scene object or is intersected by a ray can become arbitrarily complex depending on the object geometry and accuracy requirements. Determining the intersection points with all the triangles that make up the objects in a scene is usually too complex for realistic scenes. Therefore, in computer graphics, bounding volumes are used to increase efficiency. These bounding volumes enclose (complex) objects with much less complex objects for the purpose of object selection. This allows an efficient intersection calculation or an efficient test whether a point lies within the bounding volume. As a further prerequisite, the object must be enclosed as completely as possible so that clicking on the outer points of the object also leads to the desired object selection.
On the other hand, the object should be wrapped as tightly as possible so that a selection is only made when clicking on the object and not next to it. However, if the geometry of the selected bounding volume is very different from the geometry of the enclosed object, this is not always satisfactorily possible and a compromise between complete and tight enclosure may have to be found. For example, if a very thick and long cuboid is completely enclosed by a bounding sphere, the bounding volume must be significantly larger than the enclosed object. In this case, a wrong selection may occur if the bounding volume intersects an adjacent object that does not intersect the enclosed object. If the bounding sphere is made smaller so that it does not completely enclose the cuboid, then the range in which a wrong selection can occur when clicking next to the object can be reduced. However, a selection by clicking on certain border areas of the enclosed object is then no longer possible. For this example, choosing a different geometry for the bounding volume may make more sense. Some frequently used bounding volumes are explained below.

A bounding sphere can be simply represented by its radius and the three-dimensional coordinate of its centre. Determining whether a point lies within a sphere or whether it is intersected by a ray is thus relatively easy. Furthermore, this volume is invariant to rotations. When translating the enclosed object, the centre point can simply be moved with the object. For a uniform scaling of all dimensions, only a simple adjustment of the radius with the same factor is necessary. Only scaling with different scaling factors for the different coordinate directions requires a more complex adjustment of the size of the bounding volume.

Cuboids are often used as bounding volumes. If the edges of such a cuboid are parallel to the coordinate axes, the bounding volume is called an axis-aligned bounding box (AABB). Two vertices that are diagonally opposite with respect to all axes are sufficient to represent such a cuboid. The calculation of whether a point is inside the volume can be done by six simple comparisons of the coordinates, as shown in the sketch below. Translation and scaling of an AABB do not cause any difficulties. If the enclosed object is rotated, the AABB cannot usually rotate with it due to its property of parallel alignment to the coordinate axes. The object must rotate within the AABB, and the size of this cuboid must be adjusted. The AABB and the enclosed object may be more different from each other after rotation.
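The six coordinate comparisons for an AABB can be sketched as follows; the representation by the two diagonally opposite corners min and max and the method name are assumptions for this sketch.

// Sketch: point-in-AABB test with six coordinate comparisons.
// min and max are the two diagonally opposite corners of the box, point is the tested coordinate.
private boolean aabbContains(float[] min, float[] max, float[] point) {
    return point[0] >= min[0] && point[0] <= max[0]
        && point[1] >= min[1] && point[1] <= max[1]
        && point[2] >= min[2] && point[2] <= max[2];
}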
The solution to this problem of rotating objects within an AABB is an oriented cuboid called an oriented bounding box (OBB). An OBB can be arbitrarily rotated so that more types of objects can be more tightly enclosed than with an AABB. An OBB can be represented by its centre point, three normal vectors describing the directions of the side faces, and the three (half) side lengths of the cuboid. This representation is thus more complex than for the AABB. Likewise, testing whether a ray intersects the OBB or whether a point lies within the OBB is more complex. On the other hand, this complexity has the advantage that an OBB can be transformed arbitrarily along with the enclosed object. [1, p. 959–962] provides methods for intersection calculation between a ray and a cuboid.

A more complex bounding volume is the k-DOP (discrete oriented polytope). Unlike an AABB or an OBB, this volume can have more than six bounding surfaces, potentially enclosing an object more tightly than an AABB or OBB can. Any two of these bounding surfaces of a k-DOP must be oriented parallel to each other. These pairs of bounding surfaces are referred to as slabs. The representation of a k-DOP is given by the normalised normal vectors of the k/2 slabs and two scalar values per slab, indicating the distance of the slabs from the origin and their extent. Thus k must be even. In contrast to cuboidal bounding volumes, the normal vectors of the slabs do not have to be oriented perpendicular to each other. A rectangle in two-dimensional space can be regarded as a special case of a 4-DOP. In three-dimensional space, a cuboid is a 6-DOP with pairwise normal vectors oriented perpendicular to each other. Since a k-DOP is defined as the set of k bounding surfaces that most closely encloses an object, it represents the best hull—in the sense of being closely enclosed—for a given k. [1, p. 945–946] contains a formal definition of a k-DOP.

The most closely enclosing convex bounding volume for a given object is its (discrete) convex hull. In this case, the bounding surfaces are not necessarily parallel to each other and their number is unlimited. This results in a higher representational effort and a higher computational effort for testing whether a ray intersects the bounding volume or a point lies within the bounding volume compared to a k-DOP.

Besides the question of whether a bounding volume represents an object better or worse in principle, the problem to be solved is how the parameters of such a volume are concretely determined in order to enclose a given object as tightly as possible. The easiest way to derive an AABB is from the geometry of the object to be enclosed. For this purpose, the minima and maxima of all vertex coordinates of the object can be used to determine the corner points of the AABB. Since a k-DOP can be seen as an extension of an AABB and as it is defined as the bounding volume that, for a given k, most closely encloses the object, its determination for a concrete object is more complex than for an AABB, but nevertheless relatively simple.

Determining a suitable bounding sphere is more complex than it may seem. A simple algorithm consists of first determining an AABB. Then, the centre of this geometry is used as the centre of the sphere. The diagonal from this centre to one of the vertices of the AABB can be used to determine the radius. However, this often leads to a not very tightly enclosed object. To improve this, the vertex furthest from the centre can be used to determine the radius. Determining the OBB that encloses the object in an optimal shape is the most complex task among the bounding volumes presented in this section. Methods for generating well-fitting bounding volumes for concrete objects, along with further reading, can be found in [1, p. 948–953].

As these considerations on the different types of bounding volumes show, they each have specific advantages and disadvantages that have to be taken into account when implementing a concrete application. The following criteria are relevant for the selection of the optimal bounding volume.

• The complexity of representation.
• The suitability for a tight enclosure of the target geometry (of the object to be enclosed).
• The effort required to calculate a specific bounding volume for an object.
• The need and, if applicable, the effort for changing the bounding volume due to object transformations, for example, for motion animations.
• The complexity of determining whether a point lies within the bounding volume.
• The complexity of the intersection calculation between a ray and the bounding volume.

For simple applications or for applications where a very fast selection of objects is required, but at the same time precision is not the main concern, simple bounding volumes that deviate more than slightly from the object geometry can be sufficient. For the example of a bounding sphere, this sphere can be chosen very small compared to its more cuboid-shaped object, in order to minimise wrong selections of neighbouring objects. This will, in most cases, make the selection of the outer parts of the object impossible, but this may be acceptable. However, modern graphics systems are often capable of realising an accurate and efficient selection of objects in complex hierarchies of bounding volumes.

The remainder of this section contains some considerations on bounding spheres. It is shown how to verify whether a point lies within this kind of bounding volume and how to determine intersection points between a sphere and a ray. A sphere as a bounding volume can be represented by the position vector to its centre c = (c_x, c_y, c_z) and its radius r ∈ ℝ. Furthermore, for any real vector q = (q_x, q_y, q_z), let the Euclidean norm ‖q‖ = √(q_x² + q_y² + q_z²) be given. As known from mathematics, this norm can be used to find the length of a vector or the distance between two points (between the start and end points of the vector). A point p = (p_x, p_y, p_z) lies inside the sphere or on its surface if its distance from the centre of the sphere is less than or equal to its radius r. This can be expressed as an inequality as follows:

‖c − p‖ ≤ r.   (11.8)

Using the equation for the Euclidean norm, the following inequality results:

(c_x − p_x)² + (c_y − p_y)² + (c_z − p_z)² ≤ r².   (11.9)

For reasons of efficiency, the square root can be omitted when determining whether a point is inside the sphere or not. In general terms, Eqs. (11.8) and (11.9) for the sphere with centre c and radius r define all points within a sphere or on its surface. If the ray casting method is used, then the sphere must be intersected with a ray. To determine the intersection points with a sphere, let the ray be represented in the following parametrised form:

s = o + t·d   with t ∈ ℝ, t > 0.   (11.10)
The point o is the starting point of the ray and the vector d is the displacement vector that determines the direction of propagation of the ray. The parameter t expresses how far a point on the ray is from its starting point in the direction of vector d. Let the sphere be given as above by its centre c = (cx , c y , cz ) and its radius r ∈ R.
Using Eq. (11.8), the following approach for calculating the intersection points is obtained:

‖c − s‖ = r.   (11.11)

Since it is sufficient to determine the intersection points on the surface of the sphere, this equation contains an equality sign. Inserting Eq. (11.10) for the ray into Eq. (11.11) and using the definition h := c − o gives the equation ‖h − t·d‖ = r. Using the Euclidean norm (see above) results in a quadratic equation:

(h_x − t·d_x)² + (h_y − t·d_y)² + (h_z − t·d_z)² = r².   (11.12)

The maximum number of two possible solutions to this quadratic equation is consistent with the observation that a ray can intersect the surface of a sphere at most twice. In this case, the ray passes through the sphere. If only one solution exists, then the sphere is touched by the ray at one point. If no (real-valued) solution exists, then the sphere is not hit by the ray. After solving the quadratic equation (11.12), assuming that the displacement vector d of the ray is normalised to length one, i.e., ‖d‖ = 1, the parameter values of the two intersection points are given by

t₁ = a + √b,   t₂ = a − √b   (11.13)

with a := h_x·d_x + h_y·d_y + h_z·d_z and b := a² − (h_x² + h_y² + h_z²) + r².

If b becomes negative, then the root terms yield imaginary parts and the results for t₁ and t₂ are complex numbers. Since in this case there is no real-valued solution to the quadratic equation, the ray does not intersect the sphere at all. Therefore, if only the existence of an intersection point has to be verified, the calculation of b is sufficient: it is only necessary to check whether b is greater than or equal to zero. If real-valued solutions (b ≥ 0) exist, then the intersection point with the smaller of the two parameter values t₁ and t₂ is closest to the starting point o of the ray. According to the definition in Eq. (11.10), the t values must not become smaller than zero; negative t values correspond to intersections that lie behind the observer. Therefore, it has to be verified whether t₁ and t₂ are positive. For t values equal to zero, the observer position (the starting point of the ray) lies exactly on the surface of the sphere, in the intersection point. [1, p. 957–959] presents a more efficient method for determining the intersection points of a ray with a sphere.
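A minimal Java sketch of this test, based on Eqs. (11.10)–(11.13), could look as follows. The method name and the array-based representation of the vectors are assumptions for illustration, and the direction vector is assumed to be normalised to length one.

// Sketch: ray-sphere intersection according to Eq. (11.13).
// origin is o, direction is d (assumed normalised), center is c, radius is r.
// Returns the parameters t1 and t2 of the intersection points, or null if the ray misses the sphere.
private float[] intersectRaySphere(float[] origin, float[] direction,
                                   float[] center, float radius) {
    // h = c - o
    float hx = center[0] - origin[0];
    float hy = center[1] - origin[1];
    float hz = center[2] - origin[2];
    // a = h . d
    float a = hx * direction[0] + hy * direction[1] + hz * direction[2];
    // b = a^2 - |h|^2 + r^2
    float b = a * a - (hx * hx + hy * hy + hz * hz) + radius * radius;
    if (b < 0f) {
        return null;                                 // no real-valued solution: no intersection
    }
    float sqrtB = (float) Math.sqrt(b);
    // only positive t values correspond to intersections in front of the observer
    return new float[] { a + sqrtB, a - sqrtB };     // t1 and t2 from Eq. (11.13)
}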
In a scene with several selectable objects, if only the object with the smallest distance to the starting point of the ray is to be selected, the parameter values t for all intersections with the bounding spheres of the objects must be determined. The object hit by the ray with the smallest t is closest to the starting point o of the ray. If the length of the ray is very long, objects may be incorrectly selected if they are very close to each other and if the geometries of the bounding volumes do not exactly match those of the objects (which is the rule). In this case, the bounding volumes can overlap even though the objects are displayed separately. The more the object geometries deviate from the geometries of the bounding volumes, the more likely this is (see above).

To increase the precision of selection, the value of the depth buffer at the point of the mouse click can be used. This value usually varies between zero and one. Fragments with large depth values are further away from the viewer than fragments with small depth values. If the value read is equal to one, then there is no fragment at the position viewed, but the background. In this case, object selection should be prevented in order to avoid an incorrect selection. Furthermore, if the depth buffer value is less than one, it can be used to shorten the ray and thus potentially create fewer wrong intersections with neighbouring objects. [18, Chap. 7] contains more robust intersection tests.
11.8 Object Selection in the OpenGL

This section presents a JOGL program sketch with the essential elements for the realisation of object selection (picking) in the OpenGL. For this example, let each selectable object in the scene be enclosed by a bounding volume in the form of a sphere. These bounding spheres are represented by their centres and radii, as described in Sect. 11.7. Figure 11.11 shows two frames of an example scene containing cuboid objects, all enclosed by invisible bounding spheres. In the frame on the right, two yellow objects are selected.

If the mouse pointer is moved into the scene and a mouse button is pressed, the mouse coordinates within the window must be determined. For this purpose, the
Fig. 11.11 A scene with selectable objects: In the image on the right, some objects are selected (highlighted in yellow). The scene was created by a JOGL renderer
if (interactionHandler.isMouseClicked()) {
    int clickLocationWindowCoordinatesX =
            interactionHandler.getLastMouseClickLocationX();
    int clickLocationWindowCoordinatesY =
            interactionHandler.getLastMouseClickLocationY();
    identifyClickedObjects(gl, clickLocationWindowCoordinatesX,
            clickLocationWindowCoordinatesY);
}
Fig. 11.12 Parts of the source code of the display method of a JOGL renderer (Java) for reading the mouse click coordinates if the mouse has been clicked since the last frame
JOGL interface MouseListener can be used.1 The following method of this interface is called when a mouse button is pressed.

    public void mouseClicked(MouseEvent event)
To be able to retrieve the mouse click information, the JOGL renderer must implement this interface and this method and register itself as an observer (listener) for these events. When the mouse is clicked, this method is called and the argument of type MouseEvent2 supplies the two-dimensional coordinate of the position of the mouse pointer when clicked. This position can be queried as follows:

    int mouseClickLocationX = event.getX();
    int mouseClickLocationY = event.getY();
Since the JOGL renderer outlined in Sect. 2.7 already implements the interface GLEventListener and the display method is called for each frame, it makes sense to process the mouse clicks by a separate object, which can also store the last clicked mouse coordinate. For this example, this object is called interactionHandler. Since rendering a frame and clicking the mouse takes place asynchronously, the interactionHandler object must also save whether a mouse click has taken place. Under these preconditions, the display method of the renderer, which is called for each frame, can query from this object whether the mouse has been clicked since the last frame. Figure 11.12 shows the relevant source code for this process. If a mouse click has taken place, the (last) coordinate can be read. The last call in this source code section initiates the process of identifying the object in the scene to be selected by the mouse click.
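A possible shape of such an interactionHandler object is sketched below. The query methods correspond to the names used in Fig. 11.12; everything else (the MouseAdapter base class, the fields and the use of a consumable click flag) is an assumption for this sketch. The renderer would create such an object and register it with the output window, for example via glWindow.addMouseListener(interactionHandler).

import com.jogamp.newt.event.MouseAdapter;
import com.jogamp.newt.event.MouseEvent;

// Sketch of an interaction handler that stores the last mouse click for the renderer.
public class InteractionHandler extends MouseAdapter {
    private volatile boolean mouseClicked = false;
    private volatile int lastMouseClickLocationX;
    private volatile int lastMouseClickLocationY;

    @Override
    public void mouseClicked(MouseEvent event) {
        lastMouseClickLocationX = event.getX();
        lastMouseClickLocationY = event.getY();
        mouseClicked = true;               // remember the click until the next frame processes it
    }

    public boolean isMouseClicked() {
        if (mouseClicked) {
            mouseClicked = false;          // consume the click
            return true;
        }
        return false;
    }

    public int getLastMouseClickLocationX() { return lastMouseClickLocationX; }
    public int getLastMouseClickLocationY() { return lastMouseClickLocationY; }
}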
1 In this example, the JOGL interface MouseListener from the JOGL package com.jogamp.newt.event is used to identify mouse clicks in the GLWindow. A similar interface with the same name is available for other window systems, such as AWT or Swing.
2 For this example, the JOGL class MouseEvent from the JOGL package com.jogamp.newt.event is used.
private float[] windowCoordinatesToObjectPosition(
        GL3 gl, float x, float y, PMVMatrix pmvMatrix) {
    // Read current viewport
    int[] viewport = new int[4];
    gl.glGetIntegerv(GL.GL_VIEWPORT, viewport, 0);
    int viewportOffset = 0;

    // Transform orientation and origin of y-coordinate
    float xWin = x;
    int viewportHeight = viewport[3];
    float yWin = (float) viewportHeight - y;

    // Read value from depth buffer
    FloatBuffer depthValueBuffer = FloatBuffer.allocate(1);
    gl.glReadPixels((int) xWin, (int) yWin, 1, 1,
            gl.GL_DEPTH_COMPONENT, GL.GL_FLOAT, depthValueBuffer);
    // Convert result to Java array and variable
    float[] depthValueArray = new float[1];
    depthValueArray = depthValueBuffer.duplicate().array();
    float depthValue = depthValueArray[0];

    // Transform point back from window coordinate system
    // into world coordinate system
    float[] objectPosition = new float[3];
    boolean pointSuccess = pmvMatrix.gluUnProject(xWin, yWin, depthValue,
            viewport, viewportOffset, objectPosition, 0);
    return objectPosition;
}
Fig. 11.13 Submethod of the display method of a JOGL renderer (Java) for transforming a point back from the window coordinate system to the world coordinate system using a z-value from the depth buffer
In the next step, the transformation of the window coordinates back into the world coordinate system along the viewing pipeline, as mentioned in Sect. 11.7, must take place. Figure 11.13 shows a method that performs this reverse transformation. First, the position and dimensions of the currently used projection plane (viewport) are read. Section 5.1.2 contains explanations about the view window. After this operation, the array viewport contains—with ascending indices—the x-coordinate, the y-coordinate, the width and the height of the viewport. The x- and y-coordinates represent the position of the lower left corner of the viewport. For the back transformation, the depth value (z-value) of the clicked fragment must be read from the depth buffer. First, however, the origin of the y-coordinates must be transformed from the upper edge of the window to its lower edge. This changes the y-coordinate representation for the Java window to the representation for the OpenGL. The command glReadPixels reads the content of the depth buffer. Since the result is delivered in the efficiently implemented FloatBuffer data structure, the subsequent conversion into a Java array and the Java variable
depthValue is required. With the mouse click coordinates, the z-value from the depth buffer and the properties of the viewport, the reverse transformation can be comfortably performed using the method gluUnProject of the JOGL class PMVMatrix. If the transformation was successful, the result is available in world coordinates in the array objectPosition. The determined three-dimensional coordinate refers to the fragment whose z-value was read from the depth buffer, and the result variable is named objectPosition. The back transformation of a two-dimensional coordinate—along the viewing pipeline—into a three-dimensional world coordinate is naturally not possible without additional information such as this z-component. The calculation rule for the method gluUnProject is given by the following equation:3

(xObj, yObj, zObj, w)ᵀ = (mvMatrix)⁻¹ · (pMatrix)⁻¹ · ( 2·(xWin − viewport[0])/viewport[2] − 1,  2·(yWin − viewport[1])/viewport[3] − 1,  2·depthValue − 1,  1 )ᵀ.   (11.14)

In this equation, the variable names from Fig. 11.13 are used. The vector on the right describes the transformation back from the window coordinates to the normalised projection coordinate system (NPC). This vector is multiplied by the inverse of the projection matrix and the inverse of the model-view matrix. Section 5.10 provides more details on these matrices. Usually, the model-view matrix will not contain the transformation of the model coordinates of an object into the world coordinates. Thus, the result vector from Fig. 11.13 contains the point of the mouse click in world coordinates, related to the depth value of the fragment read from the depth buffer. If the model coordinates are required, the inverse of the transformation matrix from model coordinates to world coordinates must then be applied to each object.

If the point of the mouse click is available in world coordinates, it must be tested for each clickable object whether this coordinate lies within the bounding volume of the respective object. Figure 11.14 shows the Java source code of the method contains of the class BoundingSphere. By calling this method with a three-dimensional coordinate as argument, it can be determined whether this coordinate of a point lies within the bounding sphere represented by its centre and radius. This method applies Eq. (11.9).

Instead of applying a back transformation, the ray casting method can be used. In this case, a ray is generated in the world coordinate system on the basis of the clicked window coordinate in order to subsequently test it for intersections with selectable objects in the scene. Figure 11.15 shows the Java source code of a method with this functionality. Except for the call to the last method before the return expression, this method is identical to the method in Fig. 11.13. The gluUnProjectRay
3 See https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/gluUnProject.xml, retrieved 16.3.2019, 13:20h.
public class BoundingSphere {
    // Radius and center of bounding sphere
    private float[] center;
    private float radius;
    // Source code omitted

    public boolean contains(float[] point) {
        float[] tempVector = new float[3];
        // translate point into origin
        tempVector[0] = point[0] - center[0];
        tempVector[1] = point[1] - center[1];
        tempVector[2] = point[2] - center[2];

        // Check if within radius (= within sphere)
        return ((tempVector[0] * tempVector[0])
                + (tempVector[1] * tempVector[1])
                + (tempVector[2] * tempVector[2])) <= (radius * radius);
    }
}

The plane of zero parallax lies at a distance d > 0 from the left camera. For use in the OpenGL, the coordinates of the near clipping plane must be determined. Since this plane is aligned parallel to the axes of the camera coordinate system (axis-aligned), the determination of two opposite corner points is sufficient. For the left camera, the x-coordinates x_{L,L} (left) and x_{L,R} (right) and the y-coordinates y_{L,B} (bottom) and y_{L,T} (top) must be determined.
Fig. 11.23 xz-plane of the projection volume for the left camera of the off-axis method
These values can be used as parameters in the OpenGL command glFrustum (see Sect. 11.12.4). From Fig. 11.23, the following relationship can be taken and converted to l:

tan ϕ = l / z_n = ((w/2) − (s/2)) / d  ⇔  l = (w/2 − s/2) · z_n/d.

Since l is only the distance from the z-axis to the searched coordinate and the x-axis is oriented to the right in the figure, the left coordinate of the near clipping plane x_{L,L} results as follows:

x_{L,L} = −l = (−w/2 + s/2) · z_n/d.

By a similar observation, the right coordinate x_{L,R} can be determined as follows:

x_{L,R} = r = (w/2 + s/2) · z_n/d.

Since the camera positions for the left and right eyes differ only in the horizontal direction (x-coordinates), the pupillary distance s is not used to determine the y-coordinates of the near clipping plane. With this simplification, these coordinates can be calculated in a similar way to the x-coordinates and are given as follows:

y_{L,B} = −(h/2) · z_n/d,   y_{L,T} = (h/2) · z_n/d.
By similar considerations, the x-coordinates and the y-coordinates of the near clipping plane for the right camera can be determined as follows:

x_{R,L} = (−w/2 − s/2) · z_n/d,   x_{R,R} = (w/2 − s/2) · z_n/d,
y_{R,B} = −(h/2) · z_n/d,   y_{R,T} = (h/2) · z_n/d.

In summary, the pupillary distance s and the distance d to the plane of zero parallax can be used to set the depth effect to be achieved for a stereoscopic presentation using the off-axis method. The width w and the height h of the plane in which the zero parallax occurs are also freely selectable. However, an aspect ratio that differs from the aspect ratio of the viewport would lead to a distorted presentation of the scene.
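As a small worked example (using the values that are also used for the off-axis renderings in Sect. 11.12.4: s = 0.02, d = 1, w = 1, h = 0.75 and z_n = 0.1, so that z_n/d = 0.1), the left camera obtains the near clipping plane coordinates x_{L,L} = (−0.5 + 0.01) · 0.1 = −0.049, x_{L,R} = (0.5 + 0.01) · 0.1 = 0.051, y_{L,B} = −0.0375 and y_{L,T} = 0.0375, while the right camera obtains x_{R,L} = −0.051 and x_{R,R} = 0.049 with the same y-coordinates. Both frustums are therefore slightly asymmetric and mirror images of each other.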
11.12.4 Stereoscopy in the OpenGL

Stereoscopic rendering requires the scene to be rendered from two different camera positions (see the explanations in Sects. 11.12.2 and 11.12.3). Figure 11.24 shows a Java method with parameters cameraPosXOffset and centerPosXOffset that cause the camera position to be displaced horizontally and the camera to be pointed at a horizontally displaced point. Calling the gluLookAt command sets the viewer position and the viewing direction by calculating the corresponding viewing transformation matrix (see Sect. 2.8). For the toe-in method (see Sect. 11.12.3), centerPosXOffset = 0f must be set. This places the point to which the camera is pointed (specified by the centre vector) at the origin of the world coordinate system. In this case, only the argument for the horizontal shift of the viewer position (cameraPosXOffset) has to be varied in a call of this submethod. If the left camera is shifted to the left (negative value) and the right camera to the right (positive value), the result is an inward alignment of both cameras, which corresponds to the toe-in method.
private void displaySceneWithOffset(GL3 gl, float cameraPosXOffset,
                                    float centerPosXOffset) {
    // Apply view transform using the PMV-Tool
    pmvMatrix.glLoadIdentity();
    pmvMatrix.gluLookAt(cameraPosXOffset, 0f, 4f,   // eye
                        centerPosXOffset, 0f, 0f,   // center
                        0f, 1.0f, 0f);              // up
    // draw scene
    displayObjects();
}
Fig. 11.24 Submethod of the display method (Java) for rendering the scene from a horizontally shifted camera position and different viewing angle
For the off-axis method, choose centerPosXOffset = cameraPosXOffset. A negative value moves a camera parallel to the left and a positive value moves it parallel to the right. This allows the cameras to be shifted horizontally, aligned in parallel and pointed at the projection plane with zero parallax according to the off-axis method (see Sect. 11.12.3). A horizontal shift of the camera corresponds to a straight head position with the eyes on a horizontal straight line. If information about the head orientation is available, for example, from a head tracker, then positions of the eyes that are not on a horizontal straight line can be taken into account. In this way, for example, a laterally tilted head can be simulated. To realise this, the specified method must be modified accordingly. The objects in the scene are then rendered as in the case of monoscopic output (non-stereoscopic output). The method from the figure can be used for various stereoscopic rendering and output methods.

The simplest way of stereoscopic output when using a two-dimensional screen is to display the image for the left and right eye on the left and right side of the screen respectively, i.e., displaying the images side-by-side (see Sect. 11.12.2). Figure 11.25 shows a Java method which renders the scene into two adjacent viewports according to the toe-in method. Explanations of the viewport can be found in Sect. 5.1.2. By calling the method displaySceneWithOffset (see above), the camera for the left eye is moved by (horizontalCamDistance / 2f) to the left and the camera for the right eye is moved by the same amount to the right. Since the second argument in this call is zero, both cameras are facing inwards. Before these two render passes, the contents of the colour buffer and the depth buffer are cleared by the command glClear. Since there is depth information in the depth buffer after the calculation of the scene for the left eye, the content of the depth buffer must be cleared again before rendering the scene for the right eye. The content of the colour buffer must not be deleted at that point, because the colour information of both scenes is written to different positions of the same colour buffer; deleting the content of the colour buffer at this position in the source code would delete the output for the left eye.

The parameters viewportWidth and viewportHeight contain the width and height of the available area of the current viewport. If the variable drawable is of type GLAutoDrawable, these values can be retrieved by the commands drawable.getSurfaceWidth() and drawable.getSurfaceHeight() (call not shown). The width for the parts of the viewport for the image for the respective eye is set to half of the available width. The height remains unchanged. The command glViewport transmits these sizes to the GPU and thus determines the current viewport, so that the image for the left eye is displayed on the left half and the image for the right eye on the right half of the entire viewport. After the respective switching of the viewport area and before the render call for the scene content, the projection volume is set by calling the method setFrustum. Since the projection volumes for the left and right cameras are identical in the toe-in method, only this one method is required. Figure 11.26 shows the method setFrustum for setting the symmetrical frustum for the left and right cameras of the toe-in method. In this method, the required projection matrix is essentially
private void displaySeparateImagesSideBySide(
        GLAutoDrawable drawable, float horizontalCamDistance, float fovy,
        int viewportWidth, int viewportHeight, float zNear, float zFar) {
    // Retrieve the OpenGL graphics context
    GL3 gl = drawable.getGL().getGL3();
    // Clear color and depth buffer
    gl.glClear(GL3.GL_COLOR_BUFFER_BIT | GL3.GL_DEPTH_BUFFER_BIT);
    // Define viewport dimensions for left and right half
    int halfViewportWidth = viewportWidth / 2;
    int halfViewportHeight = viewportHeight;

    // Set the viewport to the left half of the window
    gl.glViewport(0, 0, halfViewportWidth, halfViewportHeight);
    // Set the frustum and render left scene
    setFrustum(fovy, halfViewportWidth, halfViewportHeight, zNear, zFar);
    displaySceneWithOffset(gl, -horizontalCamDistance / 2f, 0f);

    // Set the viewport to the right half of the window
    gl.glViewport(viewportWidth / 2, 0, halfViewportWidth, halfViewportHeight);
    gl.glClear(GL3.GL_DEPTH_BUFFER_BIT);
    // Set the frustum and render right scene
    setFrustum(fovy, halfViewportWidth, halfViewportHeight, zNear, zFar);
    displaySceneWithOffset(gl, horizontalCamDistance / 2f, 0f);
}
Fig. 11.25 Submethod of the display method (Java) for rendering the scene twice into side-by-side viewports using the toe-in method
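As a usage sketch (an assumption for illustration, not code from the book's renderer), the method from Fig. 11.25 could be invoked from the display method as follows; the variable drawable is the GLAutoDrawable passed to the display method, the camera distance, field of view and clipping distances are the example values reported below for Fig. 11.27, and the viewport size is taken from the drawable as described above.

// Sketch: invoking the side-by-side toe-in rendering from the display method.
displaySeparateImagesSideBySide(drawable, 0.1f, 45f,
        drawable.getSurfaceWidth(), drawable.getSurfaceHeight(),
        0.1f, 200f);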
private void setFrustum(float fovy, int viewportWidth, int viewportHeight,
                        float zNear, float zFar) {
    // Switch the pmv-tool to perspective projection
    pmvMatrix.glMatrixMode(PMVMatrix.GL_PROJECTION);
    // Reset projection matrix to identity
    pmvMatrix.glLoadIdentity();
    // Create projection matrix
    pmvMatrix.gluPerspective(fovy,
            (float) viewportWidth / (float) viewportHeight, zNear, zFar);
    // Switch to model-view transform
    pmvMatrix.glMatrixMode(PMVMatrix.GL_MODELVIEW);
}
Fig. 11.26 Submethod of the display method (Java) for creating a projection matrix that results in a projection volume in the form of a symmetric frustum for the toe-in method. This method can be used in identical form for the left and the right camera
created by gluPerspective so that a correct perspective is generated for the scene. As can be seen from the parameter list of the method in Fig. 11.25, values for the following parameters are required for rendering using the toe-in method.

• horizontalCamDistance (s): Horizontal distance between the left and right camera (interpupillary distance).
• fovy: Field of view (opening angle of the projection volume) in y-direction in degrees.
• viewportWidth: Width of the current viewport in pixels.
• viewportHeight: Height of the current viewport in pixels.
• zNear (z_n): Distance of the near clipping plane to the camera.
• zFar: Distance of the far clipping plane to the camera.

In brackets are the corresponding mathematical symbols as used in Sect. 11.12.3. Figure 11.27 shows a frame of an example scene in which the image for the left eye was output on the left side and the image for the right eye on the right side of the window using the toe-in method. The values horizontalCamDistance = 0.1f, fovy = 45, viewportWidth = 640, viewportHeight = 480, zNear = 0.1f and zFar = 200f were used. Using the parallel viewing technique (see Sect. 11.12.2), a spatial perception is possible. By swapping the images, an output for cross-eyed viewing can be created. Figure 11.28 shows a frame of the same example scene as in Fig. 11.27, which in this case was created using the off-axis method. The image for the left eye is displayed on the left side of the window and the image for the right eye on the right side. The source code for rendering a scene using the off-axis method can be derived by combining the mechanism for switching the partial viewports (see Fig. 11.25) with the off-axis rendering used for the anaglyph images, which is shown in Fig. 11.29 (see below). The values horizontalCamDistance = 0.02f,
Fig. 11.27 Stereoscopic output generated by an OpenGL renderer in side-by-side viewports using the toe-in method: The image for the left eye is on the left side and the image for the right eye is on the right side. Using the parallel viewing technique (see Sect. 11.12.2) creates a spatial visual impression
Fig. 11.28 Stereoscopic output generated by an OpenGL renderer in side-by-side viewports using the off-axis method: The image for the left eye is on the left side and the image for the right eye is on the right side. Using the parallel viewing technique (see Sect. 11.12.2) creates a spatial visual impression
zeroParallaxDistance = 1f, surfaceWidth = 1f, surfaceHeight = 0.75f, zNear = 0.1f and zFar = 200f were used. These examples show how a stereoscopic presentation can work in the case of simple head-mounted displays that have a continuous screen or in which a smartphone is mounted in landscape format in front of both eyes by a simple construction. A corresponding (smartphone) application must essentially ensure that the images for the left and right eyes are correctly displayed side-by-side on the screen in landscape format and at the correct distance from each other. For relaxed viewing, such simple head-mounted displays usually contain (simple) optics. Another application of side-by-side rendering is the transmission of frames of stereoscopic scenes to an output device when only one image can be transmitted over the transmission channel. The receiving device must ensure that the images are separated again and displayed correctly to the viewer.

Another possibility of stereoscopic output is the colour anaglyph technique (see Sect. 11.12), which works with a two-dimensional colour display and special glasses (colour anaglyph glasses). Figure 11.29 shows a Java method for generating a two-dimensional colour anaglyph image. The method is similar to the method for rendering side-by-side images (see Fig. 11.25). Instead of rendering the images for the left and right eye into different viewports, the colour anaglyph technique renders the images into different colour channels. Since the OpenGL uses RGBA colour values, it is very easy to create a colour anaglyph image for the colours red and cyan. The command glColorMask determines into which RGBA colour channels of the colour buffer the respective image is written. In the method in the figure, this colour mask is used in such a way that the image for the left eye is written to the red colour channel and the image for the right eye to the green and blue channels, i.e., cyan.6 After rendering the two partial images, the colour mask is reset. In this method, too, the content of the depth buffer must be deleted between the rendering passes for the left and right eye. The deletion of the colour buffer content is not necessary at this point, since the output is rendered
6 Cyan is the colour that results from additive mixing of the colours green and blue.
private void displayColorAnaglyphImage(
        float horizontalCamDistance, float zeroParallaxDistance,
        float surfaceWidth, float surfaceHeight,
        float zNear, float zFar) {
    // Retrieve the OpenGL graphics context
    GL3 gl = drawable.getGL().getGL3();
    // Clear color and depth buffer
    gl.glClear(GL3.GL_COLOR_BUFFER_BIT | GL3.GL_DEPTH_BUFFER_BIT);

    // Render left eye scene into the red colour channel
    gl.glColorMask(true, false, false, false);
    setLeftFrustum(horizontalCamDistance, zeroParallaxDistance,
            surfaceWidth, surfaceHeight, zNear, zFar);
    // Display scene with straight view of camera
    displaySceneWithOffset(gl, -horizontalCamDistance / 2f,
            -horizontalCamDistance / 2f);

    // Clear depth buffer
    gl.glClear(GL3.GL_DEPTH_BUFFER_BIT);
    // Render right eye scene into the green and blue (cyan) colour channels
    gl.glColorMask(false, true, true, false);
    setRightFrustum(horizontalCamDistance, zeroParallaxDistance,
            surfaceWidth, surfaceHeight, zNear, zFar);
    // Display scene with straight view of camera
    displaySceneWithOffset(gl, horizontalCamDistance / 2f,
            horizontalCamDistance / 2f);

    // Reset color mask
    gl.glColorMask(true, true, true, true);
}
Fig. 11.29 Submethod of the display method (Java) for generating a colour anaglyph image by rendering the scene twice using the off-axis method: The image from the viewer position shifted parallel to the left is output in red (red colour channel). The image from the viewer position shifted parallel to the right is output in cyan (green and blue colour channels)
into different colour channels and thus no existing values of the left channel can be overwritten by values of the right channel. The other difference between the method shown in Fig. 11.29 and the method in Fig. 11.25 is the use of the off-axis method instead of the toe-in method. For this, the method displaySceneWithOffset is called with the argument (-horizontalCamDistance / 2f) for both offsets, so that the position and viewing direction of the left camera are shifted to the left by half of the distance between the cameras. The position and viewing direction of the right camera are shifted to the right by half the distance between the cameras with the second call of this method. This causes the desired parallel alignment of the cameras to the plane for zero parallax and the shift of the camera positions in horizontal directions. In addition, different and asymmetrical projection volumes must be determined and applied for the left
private void setLeftFrustum(
        float horizontalCamDistance, float zeroParallaxDistance,
        float surfaceWidth, float surfaceHeight,
        float zNear, float zFar) {
    // Switch the pmv-tool to perspective projection
    pmvMatrix.glMatrixMode(PMVMatrix.GL_PROJECTION);
    // Reset projection matrix to identity
    pmvMatrix.glLoadIdentity();

    float zNearByZeroParallaxDistance = zNear / zeroParallaxDistance;
    float left = (- surfaceWidth + horizontalCamDistance) / 2f
            * zNearByZeroParallaxDistance;
    float right = (surfaceWidth + horizontalCamDistance) / 2f
            * zNearByZeroParallaxDistance;
    float bottom = - surfaceHeight / 2f * zNearByZeroParallaxDistance;
    float top = surfaceHeight / 2f * zNearByZeroParallaxDistance;
    pmvMatrix.glFrustumf(left, right, bottom, top, zNear, zFar);
    // Switch to model-view transform
    pmvMatrix.glMatrixMode(PMVMatrix.GL_MODELVIEW);
}
Fig. 11.30 Submethod of the display method (Java) for creating a projection matrix leading to a projection volume for the left camera in the form of an asymmetric frustum for the off-axis method
and right cameras. This is done by calling the methods setLeftFrustum and setRightFrustum. Figure 11.30 shows the method setLeftFrustum to set the asymmetric frustum for the left camera. For this purpose, the command glFrustumf is used, which requires the coordinates of two vertices of the near clipping plane: left (x_{L,L}), right (x_{L,R}), bottom (y_{L,B}) and top (y_{L,T}). In addition, the distances from the camera position to the near clipping plane zNear (z_n) and to the far clipping plane zFar are needed as arguments. In brackets behind these variables are the mathematical symbols used in Sect. 11.12.3. The section also contains the explanations of the formulae for calculating the coordinates of the near clipping plane, which are implemented in the method setLeftFrustum. Figure 11.31 shows the method setRightFrustum, which sets the asymmetric frustum for the right camera. In this case, the command glFrustumf requires the coordinates of the near clipping plane of the right camera: left (x_{R,L}), right (x_{R,R}), bottom (y_{R,B}) and top (y_{R,T}). Furthermore, the distances from the camera position to the near clipping plane zNear (z_n) and to the far clipping plane zFar are needed as arguments. The correspondences to the mathematical symbols from Sect. 11.12.3 are given in brackets behind the names of the variables. Section 11.12.3 also contains the explanations of the formulae used in this method for calculating the coordinates of the near clipping plane for the right camera.
private void setRightFrustum(
        float horizontalCamDistance, float zeroParallaxDistance,
        float surfaceWidth, float surfaceHeight,
        float zNear, float zFar) {
    // Switch the pmv-tool to perspective projection
    pmvMatrix.glMatrixMode(PMVMatrix.GL_PROJECTION);
    // Reset projection matrix to identity
    pmvMatrix.glLoadIdentity();

    float zNearByZeroParallaxDistance = zNear / zeroParallaxDistance;
    float left = (- surfaceWidth - horizontalCamDistance) / 2f
            * zNearByZeroParallaxDistance;
    float right = (surfaceWidth - horizontalCamDistance) / 2f
            * zNearByZeroParallaxDistance;
    float top = surfaceHeight / 2f * zNearByZeroParallaxDistance;
    float bottom = - surfaceHeight / 2f * zNearByZeroParallaxDistance;
    pmvMatrix.glFrustumf(left, right, bottom, top, zNear, zFar);
    // Switch to model-view transform
    pmvMatrix.glMatrixMode(PMVMatrix.GL_MODELVIEW);
}
Fig. 11.31 Submethod of the display method (Java) for creating a projection matrix leading to a projection volume for the right camera in the form of an asymmetric frustum for the off-axis method
The parameters of the method from Fig. 11.29 can be summarised as follows:

• horizontalCamDistance (s): Horizontal distance between the left and right camera (interpupillary distance).
• zeroParallaxDistance (d): Distance of the plane in which the zero parallax (no parallax) occurs.
• surfaceWidth (w): Width of the plane in which the zero parallax occurs.
• surfaceHeight (h): Height of the plane in which the zero parallax occurs.
• zNear (z_n): Distance of the near clipping plane from the camera.
• zFar: Distance of the far clipping plane from the camera.

In brackets are the corresponding mathematical symbols as given in Sect. 11.12.3. Figure 11.32 shows an example scene rendered as a colour anaglyph image using the off-axis method. For this, horizontalCamDistance = 0.02f, zeroParallaxDistance = 1f, surfaceWidth = 1f, surfaceHeight = 0.75f, zNear = 0.1f and zFar = 200f were used. The red and cyan colouring of the two partial images is clearly visible. By using colour anaglyph glasses with a red filter on the left and a cyan filter on the right, a spatial visual impression is created when viewing this image. Figure 11.33 shows an example scene rendered as a colour anaglyph image using the toe-in method. The values horizontalCamDistance = 0.1f, fovy = 45, viewportWidth = 640, viewportHeight = 480, zNear = 0.1f
Fig. 11.32 Stereoscopic output generated by an OpenGL renderer as a colour anaglyph image using the off-axis method. Viewing with colour anaglyph glasses with a red filter on the left side and a cyan filter on the right side creates a spatial visual impression
Fig. 11.33 Stereoscopic output generated by an OpenGL renderer as a colour anaglyph image using the toe-in method. Viewing with colour anaglyph glasses with a red filter on the left side and a cyan filter on the right side creates a spatial visual impression
and zFar = 200f were used. The source code for the calculation of a colour anaglyph image according to the toe-in method can be obtained by applying the calculation for the toe-in method for the output to side-by-side viewports (see Fig. 11.25) to the generation of colour anaglyph images (see Fig. 11.29). In principle, rendering of coloured scenes is possible with this technique. However, if colour information differs greatly due to the perspective shift between the left and the right partial image, then colour distortions may become visible. Therefore, this method is not very suitable for professional applications. On the other hand, it is an easy-to-implement and robust technique for stereoscopic presentations. A stereoscopic output without colour distortions is possible, for example, using the polarisation technique, the shutter technique or via a head-mounted display (see Sect. 11.12). The transmission of stereoscopic images to output devices that work with one of these techniques can in principle take place via only one image per frame, which contains side-by-side sub-images for the left and right channels. For this purpose, the method can be used for output into two side-by-side viewports (see Fig. 11.25). If the parallel transmission of two images per frame is possible, then the partial images for the left and right channels can be transmitted separately. Some very high-quality graphics cards support this function through a stereoscopic output mode using several OpenGL colour buffers. In the OpenGL, up to four colour buffers can be assigned to the default frame buffer (quad buffer), which are referred to by GL_FRONT_LEFT, GL_BACK_LEFT, GL_FRONT_RIGHT and GL_BACK_RIGHT.7 The left colour buffers are used for standard output (monoscopic output). For stereoscopic output, the right colour buffers are added. The front and back colour buffers are intended for the so-called double buffering mechanism, in which the content of the front colour buffer (GL_FRONT) is displayed on the output device. The new frame is built-up in the back colour buffer (GL_BACK) while the current frame in the front colour buffer is displayed. Only when the build in the back buffer is complete, the buffer content of the back colour buffer is transferred to the front colour buffer to be displayed. The OpenGL specification does not require a true buffer swap, only the contents of the back colour buffer must be transferred to the front colour buffer. Using this mechanism, it can be avoided that the image buildup process becomes visible and causes visible interferences. This buffer switching is automated in the JOGL system and does not have to be explicitly triggered. For stereoscopic rendering, the two back colour buffers should be written to and their contents transferred to the respective front colour buffers after buffer switching. In this case, all four colour buffers are used. If four colour buffers are available for the default frame buffer in the GPU used, the stereoscopic output mode with all these colour buffers can be activated in a JOGL system with connection to the windowing system via an object of the type GLCapabilities. This object is passed to the constructor when the GLWindow object (the output window) is created (see Sect. 2.7). The following JOGL source code part enables these colour buffers, if available.
7 Alternative names (aliases) exist for the front buffers and the left buffers, see https://www.khronos.org/opengl/wiki/Default_Framebuffer, retrieved 11.4.2022, 21:10h.
private void displaySeparateImagesToStereoBuffers(
        GLAutoDrawable drawable, float horizontalCamDistance, float fovy,
        int viewportWidth, int viewportHeight, float zNear, float zFar) {
    // Retrieve the OpenGL graphics context
    GL3 gl = drawable.getGL().getGL3();

    // Render to left back buffer
    gl.glDrawBuffer(gl.GL_BACK_LEFT);
    gl.glClear(GL3.GL_COLOR_BUFFER_BIT | GL3.GL_DEPTH_BUFFER_BIT);
    setFrustum(fovy, viewportWidth, viewportHeight, zNear, zFar);
    displaySceneWithOffset(gl, -horizontalCamDistance / 2f, 0f);

    // Render to right back buffer
    gl.glDrawBuffer(gl.GL_BACK_RIGHT);
    gl.glClear(GL3.GL_COLOR_BUFFER_BIT | GL3.GL_DEPTH_BUFFER_BIT);
    setFrustum(fovy, viewportWidth, viewportHeight, zNear, zFar);
    displaySceneWithOffset(gl, horizontalCamDistance / 2f, 0f);
}
Fig. 11.34 Submethod of the display method (Java) for stereoscopic rendering to different colour buffers of the default framebuffer using the toe-in method
GLCapabilities glCapabilities = new GLCapabilities(glProfile);
glCapabilities.setStereo(true);
GLWindow glWindow = GLWindow.create(glCapabilities);
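Whether the windowing system actually granted a quad-buffered (stereoscopic) default framebuffer can be checked at runtime. The following lines are a minimal sketch and not part of the book's example programs; they assume that the renderer implements GLEventListener and that the result is stored in a field stereoAvailable, so that the display method can fall back to monoscopic output if necessary.

private boolean stereoAvailable = false;

@Override
public void init(GLAutoDrawable drawable) {
    // The capabilities actually chosen by the windowing system
    // may differ from the requested GLCapabilities
    GLCapabilitiesImmutable chosenCaps = drawable.getChosenGLCapabilities();
    stereoAvailable = chosenCaps.getStereo();
    if (!stereoAvailable) {
        System.out.println("Quad-buffered stereo not available; "
                + "rendering monoscopically into GL_BACK.");
    }
}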
Figure 11.34 shows a method of a JOGL renderer that writes the images for the left and the right eye into the left and the right back colour buffer, respectively, according to the toe-in method. The switch to the respective colour buffer is made with the command glDrawBuffer. Before the scene is rendered for each eye, the contents of the colour buffer and the depth buffer are cleared. Figure 11.35 shows the output of a stereoscopic projector using the shutter technique on a standard projection screen. The stereoscopic images were generated with the toe-in method, using the values horizontalCamDistance = 0.1f, fovy = 45, viewportWidth = 640, viewportHeight = 480, zNear = 0.1f and zFar = 200f. With the shutter technique, the images for the left and right eye are displayed alternately at different times. The photograph in the figure was taken with an exposure time longer than the presentation time of the individual eye images, so the two images are superimposed in the photograph. Since these partial images are rendered for different viewer positions, they are not identical, and the superimposed image appears blurred. When the stereo projection on the screen is viewed through suitable glasses (active shutter glasses), the images for the left and right eye are separated again and presented to the correct eye (see Sect. 11.12), so that a spatial visual impression is created.
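For illustration, the submethod of Fig. 11.34 could be invoked from the display method of the renderer with exactly these parameter values. The following sketch is not taken from the book's example program; the field stereoAvailable and the monoscopic fallback are assumptions.

@Override
public void display(GLAutoDrawable drawable) {
    if (stereoAvailable) {
        // Render the left and right eye images into the left and right
        // back colour buffers (toe-in method, Fig. 11.34)
        displaySeparateImagesToStereoBuffers(drawable,
                0.1f,        // horizontalCamDistance
                45f,         // fovy
                640, 480,    // viewportWidth, viewportHeight
                0.1f, 200f); // zNear, zFar
    } else {
        // Monoscopic fallback: render a single image into GL_BACK (not shown)
    }
    // JOGL performs the buffer switch automatically after display()
}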
Fig. 11.35 Stereoscopic output generated by an OpenGL renderer using the toe-in method: A photograph of the output of a projector using the shutter technique is shown
The source code for rendering this scene according to the off-axis method can be derived by combining the buffer-switching mechanism of Fig. 11.34 with the method for rendering anaglyph images according to the off-axis method (see Fig. 11.29).
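One possible structure for such a derived method is sketched below. This is not the book's implementation: setAsymmetricFrustum is a hypothetical helper standing in for the off-axis frustum calculation of Fig. 11.29, which is not reproduced here; only the buffer switching is taken over from Fig. 11.34.

private void displaySeparateImagesToStereoBuffersOffAxis(
        GLAutoDrawable drawable, float horizontalCamDistance,
        float fovy, int viewportWidth, int viewportHeight,
        float zNear, float zFar) {
    GL3 gl = drawable.getGL().getGL3();

    // Left eye: camera shifted to the left, asymmetric (off-axis) frustum
    gl.glDrawBuffer(gl.GL_BACK_LEFT);
    gl.glClear(GL3.GL_COLOR_BUFFER_BIT | GL3.GL_DEPTH_BUFFER_BIT);
    setAsymmetricFrustum(fovy, viewportWidth, viewportHeight, zNear, zFar,
            -horizontalCamDistance / 2f); // hypothetical off-axis helper
    displaySceneWithOffset(gl, -horizontalCamDistance / 2f, 0f);

    // Right eye: camera shifted to the right, asymmetric (off-axis) frustum
    gl.glDrawBuffer(gl.GL_BACK_RIGHT);
    gl.glClear(GL3.GL_COLOR_BUFFER_BIT | GL3.GL_DEPTH_BUFFER_BIT);
    setAsymmetricFrustum(fovy, viewportWidth, viewportHeight, zNear, zFar,
            horizontalCamDistance / 2f);
    displaySceneWithOffset(gl, horizontalCamDistance / 2f, 0f);
}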
11.13 Exercises

Exercise 11.1 Generate a scene with fog and let an object move into the fog and disappear in it.

Exercise 11.2 Model a light switch in a scene that, when pressed, i.e., when clicked with the mouse, turns an additional light source in the scene on and off.

Exercise 11.3 Specify a bounding volume in the form of a cuboid and in the form of a sphere for the object in Fig. 11.16. Let the cylinder have a radius of one unit and a height of three units. The cone on top of it is two units high.

Exercise 11.4 Write a program in which two cuboids become transparent when they collide. Vary the shape of the bounding volume by alternatively using a sphere, an axis-aligned bounding box (AABB) and an oriented bounding box (OBB). Vary the orientation of the cuboids and contrast the advantages and disadvantages of the different bounding volumes for this application.
Exercise 11.5
(a) The command gluPerspective takes the parameters fovy, aspect, zNear and zFar. Explain the meaning of these parameters and research www.khronos.org/opengl to find the projection matrix resulting from these parameters for creating a symmetric frustum.
(b) The command glFrustum takes the parameters left, right, bottom, top, nearVal and farVal. Explain the meaning of these parameters and research www.khronos.org/opengl to find the projection matrix resulting from these parameters for creating an asymmetric frustum.
(c) Compare the projection matrices from (a) and (b) with each other. Then show how the arguments for the call to glFrustum can be derived from the parameters of gluPerspective. What is the significance of these formulae?

Exercise 11.6 Section 11.12.4 shows parts of the source code for three stereoscopic output methods using the toe-in method or the off-axis method.
(a) Write an OpenGL program that renders a simple scene stereoscopically into side-by-side viewports using the toe-in method (see Fig. 11.25).
(b) Modify the program from (a) so that the output is rendered using the off-axis method.
(c) Figure 11.29 shows part of the source code for the stereoscopic rendering of a colour anaglyph image using the off-axis method. Write an OpenGL program that replicates this rendering. Then modify the program so that the stereoscopic rendering uses the toe-in method.
(d) Figure 11.34 shows part of the source code for stereoscopic rendering to different colour buffers of the default framebuffer according to the toe-in method. Derive a method in pseudocode which renders such stereoscopic output according to the off-axis method.

Exercise 11.7 Write an OpenGL program which renders a cube rotating around a sphere. The cube should also rotate around its own centre of gravity. Render this scene stereoscopically in side-by-side viewports for the left and right eyes. Test the spatial visual impression using the parallel viewing technique. Extend your stereoscopic rendering program to use the colour anaglyph technique and test the spatial visual impression using appropriate colour anaglyph glasses.
References

1. T. Akenine-Möller, E. Haines, N. Hoffman, A. Pesce, M. Iwanicki and S. Hillaire. Real-Time Rendering. 4th edition. Boca Raton, FL: CRC Press, 2018.
2. E. Benjamin, R. Lee and A. Heller. “Is My Decoder Ambisonic?” In: 125th AES Convention. San Francisco, CA, 2008.
3. E. Benjamin, R. Lee and A. Heller. “Localization in Horizontal-Only Ambisonic Systems”. In: 121st AES Convention. San Francisco, 2006.
4. A. J. Berkhout. “A holographic approach to acoustic control”. In: Journal of the Audio Engineering Society 36.12 (Dec. 1988), pp. 977–995.
5. A. Beuthner. “Displays erobern die dritte Dimension”. In: Computer Zeitung 30 (2004), p. 14.
6. J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. Revised edition. Cambridge: MIT Press, 1997.
7. P. Bourke. “Calculating Stereo Pairs”. Retrieved 26.8.2020, 11:30h. URL: http://paulbourke.net/stereographics/stereorender.
8. P. Bourke. “Creating correct stereo pairs from any raytracer”. Retrieved 26.8.2020, 11:30h. 2001. URL: http://paulbourke.net/stereographics/stereorender.
9. F. Brinkmann, M. Dinakaran, R. Pelzer, J. J. Wohlgemuth, F. Seipel, D. Voss, P. Grosche and S. Weinzierl. The HUTUBS head-related transfer function (HRTF) database. Tech. rep. Retrieved 18.10.2020, 21:50h. TU-Berlin, 2019. URL: http://dx.doi.org/10.14279/depositonce-8487.
10. J. Daniel. “Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format”. In: 23rd International Conference: Signal Processing in Audio Recording and Reproduction. Copenhagen, 2003.
11. DIN 33402-2:2005-12: Ergonomie - Körpermaße des Menschen - Teil 2: Werte. Berlin: Beuth, 2005.
12. Y. Dobashi, K. Kaneda, H. Yamashita, T. Okita and T. Nishita. “A Simple Efficient Method for Realistic Animation of Clouds”. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques. USA: ACM Press/Addison-Wesley Publishing Co., 2000, pp. 19–28.
13. R. Dörner, W. Broll, P. Grimm and B. Jung. Virtual und Augmented Reality (VR/AR): Grundlagen und Methoden der Virtuellen und Augmentierten Realität. 2nd edition. Berlin: Springer, 2019.
14. D. S. Ebert, F. K. Musgrave, D. Peachey, K. Perlin and S. Worley. Texturing and Modeling: A Procedural Approach. 3rd edition. San Francisco: Elsevier, 2003.
15. L. O. Figura. “Lebensmittelphysik”. In: Taschenbuch für Lebensmittelchemiker. Ed. by W. Frede. Berlin, Heidelberg: Springer, 2004.
16. C. Geiger. “Helft mir, Obi-Wan Kenobi”. In: iX - Magazin für professionelle Informationstechnik (May 2004), pp. 97–102.
17. E. B. Goldstein. Sensation and Perception. 8th edition. Belmont, CA: Wadsworth, 2010.
18. E. Haines and T. Akenine-Möller, eds. Ray Tracing Gems. Berkeley, CA: Apress, 2019.
19. J. Jerald. The VR Book: Human-Centered Design for Virtual Reality. Morgan & Claypool Publishers/ACM, 2016.
20. P. Kutz, R. Habel, Y. K. Li and J. Novák. “Spectral and Decomposition Tracking for Rendering Heterogeneous Volumes”. In: ACM Trans. Graph. 36.4 (July 2017), Article 111.
21. N. Max. “Optical models for direct volume rendering”. In: IEEE Transactions on Visualization and Computer Graphics 1.2 (June 1995), pp. 99–108.
22. M. Mori, K. F. MacDorman and N. Kageki. “The Uncanny Valley [From the Field]”. In: IEEE Robotics & Automation Magazine 19.2 (June 2012), pp. 98–100.
23. A. Neidhardt and N. Knoop. “Binaural walk-through scenarios with actual self-walking using an HTC Vive”. In: 43. Jahrestagung für Akustik (DAGA). Kiel, 2017, pp. 283–286.
24. G. Palmer. Physics for Game Programmers. Berkeley: Apress, 2005.
25. F. L. Pedrotti, L. S. Pedrotti, W. Bausch and H. Schmidt. Optik für Ingenieure. 3rd edition. Berlin, Heidelberg: Springer, 2005.
26. M. Pharr, W. Jakob and G. Humphreys. Physically Based Rendering: From Theory To Implementation. 3rd edition. Cambridge, MA: Morgan Kaufmann, 2017.
27. A. Plinge, S. Schlecht, O. Thiergart, T. Robotham, O. Rummukainen and E. Habets. “Six-degrees-of-freedom binaural audio reproduction of first-order ambisonics with distance information”. In: 2018 AES International Conference on Audio for Virtual and Augmented Reality. 2018.
28. W. T. Reeves. “Particle Systems - A Technique for Modelling a Class of Fuzzy Objects”. In: ACM Trans. Graph. 2.2 (1983), pp. 91–108.
29. W. T. Reeves and R. Blau. “Approximate and Probabilistic Algorithms for Shading and Rendering Particle Systems”. In: SIGGRAPH Comput. Graph. 19.3 (1985), pp. 313–322.
30. C. W. Reynolds. “Flocks, Herds and Schools: A Distributed Behavioral Model”. In: SIGGRAPH Comput. Graph. 21.4 (1987), pp. 25–34.
31. R. J. Rost and B. Licea-Kane. OpenGL Shading Language. 3rd edition. Upper Saddle River, NJ [et al.]: Addison-Wesley, 2010.
32. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Compatibility Profile), October 22, 2019). Retrieved 8.2.2021. The Khronos Group Inc., 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.
33. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-Wesley, 2016.
34. P. Stade. Perzeptiv motivierte parametrische Synthese binauraler Raumimpulsantworten. Dissertation. Retrieved 17.10.2020. TU-Berlin, 2018. URL: https://depositonce.tu-berlin.de/handle/11303/8119.
35. A. Sullivan. “3-Deep: new displays render images you can almost reach out and touch”. In: IEEE Spectrum 42.4 (2005), pp. 30–35.
36. M. Vorländer. Auralization: Fundamentals of Acoustics and Modelling, Simulation, Algorithms and Acoustic Virtual Reality. Berlin: Springer, 2008.
37. M. Wacker, M. Keckeisen, S. Kimmerle, W. Straßer, V. Luckas, C. Groß, A. Fuhrmann, M. Sattler, R. Sarlette and R. Klein. “Virtual Try-On: Virtuelle Textilien in der Graphischen Datenverarbeitung”. In: Informatik Spektrum 27 (2004), pp. 504–511.
38. F. Zotter and M. Frank. Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement and Virtual Reality. Cham, CH: Springer Open, 2018.
A Web References
Supplementary material online: A web reference to supplementary material can be found in the footer of the first page of each chapter in this book (Supplementary Information). The supplementary material is also available via SpringerLink: https://link.springer.com

OpenGL: Further information on OpenGL is available on the official OpenGL web pages:
Khronos Group: www.khronos.org
OpenGL homepage: www.opengl.org
OpenGL specifications: www.khronos.org/registry/OpenGL/index_gl.php
Here you will also find a web reference to known OpenGL extensions.

Java bindings: The two commonly used Java bindings to OpenGL can be found at the following web references:
Java OpenGL (JOGL): www.jogamp.org
Lightweight Java Game Library (LWJGL): www.lwjgl.org

3D modelling: In addition to numerous professional CAD modelling tools in the design field, the following two programs for modelling and animating three-dimensional objects are very widespread in the areas of computer games development, animation and film creation:
Autodesk Maya (Commercial License): www.autodesk.com
Blender (Free and Open Source License): www.blender.org
Index
A A-buffer algorithm, 273 Abstract Windowing Toolkit (AWT), 34 Accommodation-vergence conflict, 439 Accumulation buffer, 273 Active shutter technique, 437 Adobe RGB colour space, 204 Algorithm of Brons, 227 Aliasing effect, 8, 25, 211, 213, 254 Moiré pattern, 257 Alpha blending, 347 Alpha mapping, 369 Ambient light, 307 Ambisonics, 424, 430 Anaglyph technique, 436 Anchor, 360 Animation, 138, 152 Antialiasing, 25, 211, 254, 259 A-buffer algorithm, 273 accumulation buffer, 273 coverage sample, 276 multisample, 25, 259, 274, 278 supersampling, 270 Approximation, 116 Area sampling, 260 Area subdivision method, 297 Associated data, 22, 25, 215, 250 Atmospheric attenuation, 308 Attribute, 41, 50 Augmented reality (AR), 3 Auralisation, 424 Averaging filter, 270 Axis-aligned bounding box, 409, 419 axis-aligned bounding box, 66
B B-spline, 119 Bézier curve, 118 Bézier point, 118 inner, 119 Bézier surface, 31, 122 Backface culling, 24, 76, 78, 98, 295 Barycentric coordinates, 208, 215, 250 Base primitives, 24, 212 Basic geometric object, 63 area, 64 closed polyline, 64 curve, 64 line, 64 point, 64 polygon, 64 polyline, 64 Bernstein polynomial, 117 Binaural hearing, 426 Binocular cues, 432 Binocular depth cues, 432 Binomial, 271 Bitmask, 234 Black-and-white image, 193 Bounding sphere, 409, 419 Bounding volume, 408, 419 Boxcar filter, 270 Bresenham algorithm for circles, 228 for lines, 224 Brons, algorithm of, 227 Buffer object, 50 frame-, 29 index, 82
462 vertex, 50, 81 Bump mapping, 371 C C for Graphics (Cg), 40 Calibrated colour space, 202 Camera coordinate system, 182 Central processing unit (CPU), 7 Central projection, 8 Centroid interpolation, 275, 281 Centroid sampling, 275, 281 CIE chromaticity diagram, 202 Clipmap, 114 Clipping, 6, 8, 23 three-dimensional, 6, 165 Clipping plane far, 6 near, 6, 441 Clipping volume, 165 Closed polyline, 64 CMY colour model, 198 CMYK colour model, 198 Cohen–Sutherland clipping algorithm, 289 Collision handling, 421 Collison detection, 419 Colorimetric colour space, 202 Colour anaglyph technique, 436 Colour gamut, 202 Colour index mode, 204 Colour model, 196 additive, 197 CMY, 198 CMYK, 198 HLS, 200 HSV, 200 perception-oriented, 201 RGB, 197, 266 sRGB, 27 standard RGB, 27 subtractive, 197 YIQ, 199 YUV, 199 Colour naming system (CNS), 201 Colour picking, 408 Colour space, 196 Adobe RGB, 204 calibrated, 202 CIE L*a*b*, 202 CIELab, 202 CIEXYZ, 202 colorimetric, 202
Index sRGB, 202, 205 standard RGB, 202 Colour sum, 26 Compatibility profile, 20, 43 Compute shader, 32 Computer graphics pipeline, 6 Computer-aided design (CAD), 2 Computer-aided manufacturing (CAM), 2 Constant shading, 23, 25, 42, 254 Constant specular reflection coefficient, 316 Control points, 116 Controllability (of the curve), 116 Convex, 64 Convex combination, 140 Convolution matrix, 271 Core profile, 20, 43 Coverage mask, 274 Coverage sample antialiasing (CSAA), 276 Coverage value, 25, 262 CPU (central processing unit), 7 Cross product, 125 Cross-eyed viewing, 436 CSG scheme, 102 Culling, 28, 75 backface, 24, 76, 78, 98 frontface, 24, 76 hidden face, 6 Cyrus–Beck clipping algorithm, 291 D Data visualisation, 2 Default framebuffer, 29 Deferred shading, 320 Depth algorithm, 297 Depth buffer, 28, 408, 415, 444 Depth buffer algorithm, 269, 282 Depth fog, 393 Device coordinates, 23 Difference of sets, 66 Diffuse reflection, 312 discrete oriented polytope, 409 Displacement mapping, 32, 377 Display list, 59, 81 Distance fog, 393 Dither matrix, 195 Dolby Atmos, 432 Double buffering, 452 Downsampling, 259, 270 E Early fragment test, 28
Index Environment mapping, 30, 369 Eulerian angles, 157 Even–odd rule, 242 Exponential fog, 393 Extinction, 392 Extrinsic view, 158 F Far clipping plane, 6 Far plane, 6 Field of view, 96 Filter kernel, 271 Fixed-function pipeline, 19, 21 Flat shading, 23, 25, 42, 254 Fog, 391 exponential, 393 linear, 392 Folding frequency, 254 Form parameter, 120 Fragment, 8, 24, 215, 262 Fragment shader, 21, 29 Framebuffer, 29 Framebuffer object, 29 Freeform surface, 115, 122 Frontface culling, 24, 76 Frustum, 4, 441 G Gamut, 202 Geometric primitive, 21, 23, 63, 67 Geometric transformation (2D), 129 Geometrical acoustics, 425 Geometrical optics, 424 Geometry shader, 30 Gloss map, 369 Gouraud shading, 25, 254, 339 GPU (Graphics Processing Unit), 7, 17 Graphics output primitive, 63 Graphics pipeline, 6 Graphics Processing Unit (GPU), 7, 17 Graphics processor, 8 Greyscale image, 193 H Halftoning, 194 Head related transfer functions, 427 Head-mounted display, 436, 438 Heightmap, 373 Hidden face culling, 6 High-Level Shading Language (HLSL), 40 Holographic techniques, 438
463 Homogeneous coordinate, 133, 143 Hue, 196 I IBO (index buffer object), 82 Image-space method, 294, 296 Immediate mode, 57 Immersion, 390 Index buffer object (IBO), 82 Information visualisation, 2 Inside test, 249 Intensity (colour), 196 Interaural level difference (ILD), 426 Interaural time difference (ITD), 426 Interpolation, 116 Interpupillary distance, 441 Intersection of sets, 66 Intrinsic view, 158 Irradiance mapping, 349 J Java OpenAL (JOAL), 431 Java OpenGL (JOGL), 9, 15, 19, 33 K k-DOP, 409 Kernel, 271 Knot, 119 L Lenticular display, 438 Level of detail (LOD), 30, 32, 104, 114 Lighting attenuation, 307 directional, 307 parallel incident, 307 phong, 309 Lightmap, 369 Lightmaps, 352 Lightness, 196 Lightweight Java Game Library (LWJGL), 19 Line style, 234 Line thickness, 238 Linear fog, 392 Local lighting, 306 Locality principle (of curves), 116 LOD (level of detail), 30, 32, 104, 114 Low-pass filter, 259 M Magnification, 367
464 Matrix stack, 154 Midpoint algorithm for circles, 228 for lines, 224 Minification, 367 Mipmap, 249, 275, 368 Mipmap level, 368 Mirror reflection, 314 Model coordinate system, 182 Model view matrix, 22 Moiré pattern, 257 Monocular cues, 432 Monocular depth cues, 432 Motion platform, 391 Motion sickness, 391 Moving pen technique, 239 Multisample antialiasing (MSAA), 25, 259, 274, 278 N Near clipping plane, 6, 441 Near plane, 6 Non-uniform rational B-splines (NURBS), 31, 120 Normal mapping, 373 Normal vector, 186 Normalised device coordinates, 23 NURBS (non-uniform rational B-splines, 31, 120 Nyquist frequency, 254 O Object-space method, 296 Octree, 99 Odd parity rule, 242 Off-axis method, 441 Opacity, 346 Opacity coefficient, 346 Open Audio Library (OpenAL), 431 Open Computing Language (OpenCL), 33, 48 Open Graphics Library (OpenGL), 1, 7, 9, 15 OpenGL Extension, 43 OpenGL for Embedded Systems (OpenGL ES), 18 OpenGL graphics pipeline, 21 fixed-function, 19, 21 programmable, 19, 21 OpenGL Shading Language (GLSL), 11, 40, 46 OpenGL Utility Library (GLU), 37 Oriented bounding box, 409, 419
Index Overshading, 273 P Parallax divergent, 440 negative, 440 positive, 440 Parallax barrier, 438 Parallel projection, 8, 168 Parallel viewing, 436 Particle systems, 398 Patch primitive, 31 Path tracing, 392 Perspective division, 23 Perspective projection, 168 Phong shading, 25, 254 Physics engine, 400 Picking, 407, 413 Pixel, 215 Pixel replication, 238 Point, 129 Point sampling, 260 Polarisation technique, 436 Polygon, 64 Polygon mesh, 67 Polyline, 64 Post-filtering, 259 Pre-filtering, 259 Presence, 3, 390 Primitive, 21, 23, 63, 67 Primitive restart, 88 Priority algorithms, 301 Profile, 19 compatibility, 20, 43 core, 20, 43 Programmable pipeline, 19, 21 Projection, 168 Projection center, 168 Projection plane, 168 Provoking vertex, 23, 254 Pupillary distance, 441 Q Quad, 368 Quadtree, 100 Quaternion, 161 R Radiosity model, 349 Raster graphics, 213 Raster-oriented graphics, 213
Index Rasterisation, 8, 24, 211, 213 Ray acoustics, 425 Ray casting, 408 Ray optics, 424 Raytracing, 355 Recursive division algorithms, 297 Reflection mapping, 369 Rendering, 6 Rendering pipeline, 6 Retinal projector, 438 Retinal scan display, 438 RGB colour model, 197, 266 RGBA colour value, 21, 204, 266 Right-handed coordinate system, 143 Rotation, 130, 144, 156 S Sample, 25, 383 Sampling theorem, 254 saturation (colour), 196 Scalar product, 130 Scaling, 130, 144 Scan conversion, 8, 24, 213 Scan line technique, 243 Scene graph, 15, 150 Shader, 9, 11, 17, 19 Shading, 305 constant, 23, 25, 42, 254, 334 flat, 23, 25, 42, 254, 334 Gouraud, 25, 254 normal, 373 Phong, 25, 254, 341 smooth, 254 Shadow, 344 Shadow map, 30, 369 Shear, 130, 131 Shutter technique, 437 Side-by-side, 436, 444 Skeleton, 406 Slab, 410 Smooth shading, 254 Smoothing operator, 205 Smoothness (of curves), 116 Spatial frequency, 255 Spatial partitions, 296 Specular reflection, 314 Specular reflection exponent, 316 SPIR-V, 48 Spotlighting, 308 SRGB colour space (standard RGB colour space), 202
465 Standard Portable Intermediate Representation (SPIR), 48 Standard RGB colour model (sRGB colour model), 27 Standard RGB colour space (sRGB colour space), 202 Stepwise refinement, 351 Stereo imaging, 435 Stereoscope, 436 Stereoscopics, 435 Stereoscopy, 391, 432, 435 Structural algorithm, 226 Subdivision surfaces, 32 Supersampling, 354 Supersampling antialiasing (SSAA), 25, 270 Swarm behaviour, 400 Sweep representation, 102 Swing, 34 Swizzling, 45 Symmetric difference of sets, 66 T TBN matrix, 377 Tessellation, 8, 97 Tessellation control shader, 31 Tessellation evaluation shader, 31 Tessellation unit, 384 Texture, 9, 26, 359 Texture data (texel data), 42, 43 Texture mapping, 42, 360 displacement, 32 environment, 30 Textures, applications shadow map, 30 Toe-in method, 440 Top-left rule, 249 Transform feedback, 31 Transformation group, 150 Transformation matrix, 134 Translation, 130, 132, 143 Translucency, 346 Transparency, 346 filtered, 346 interpolated, 346 Triangulation, 206 True colour, 198 Two-pass depth buffer algorithm, 345 U Uncanny valley, 390 Uniforms, 42, 54
466 Union of sets, 66 Unweighted area sampling, 261 Up vector, 39, 53 Upsampling, 259, 269 V Vanishing point, 172 VAO (vertex array object), 50 VBO (vertex buffer object), 50, 81 Vector, 129 Vector graphics, 211 Vector-oriented graphics, 211 Vertex, 8, 21 Vertex array, 59, 81 Vertex array object (VAO), 50 Vertex attribute, 50 Vertex buffer object (VBO), 50, 81 Vertex buffer objects (VBOs), 50 Vertex shader, 21, 29 Vertex stream, 67 Vertices, 8, 21 View reference coordinate system (VRC), 182 View transformation, 22 Viewing pipeline, 22, 23, 180 Viewport, 23, 136 Viewport transformation, 23
Index Virtual reality (VR), 3 Virtual retinal display, 438 Visibility, 6 Visual analytics, 2 Visual computing, 1 Volume rendering, 392 Voxels, 99 VR sickness, 391 Vulkan, 16, 19, 48 W Warn model, 308 Wave field synthesis, 430 Web Graphics Library (WebGL), 18 Weighted area sampling, 264 Winding order, 24, 75 Window coordinates, 23 World coordinates, 136 Write masking, 45 Z z-buffer, 28 Z-buffer algorithm, 297 Z-fighting, 177 Zero parallax, 440