895 55 3MB
English Pages 500 [592] Year 2018
Lambdas, streams, functional and reactive programming
IN ACTION Raoul-Gabriel Urma Mario Fusco Alan Mycroft
MANNING
Praise for the previous edition, Java 8 in Action, by Raoul-Gabriel Urma, Mario Fusco, and Alan Mycroft. A great and concise guide to what’s new in Java 8, with plenty of examples to get you going in a hurry. —Jason Lee, Oracle The best guide to Java 8 that will ever be written! —William Wheeler, ProData Computer Systems The new Streams API and lambda examples are especially useful. —Steve Rogers, CGTek, Inc. A must-have to get functional with Java 8. —Mayur S. Patil, MIT Academy of Engineering Helpful as a concise, practice-oriented guide to the exciting new features of Java 8. Functional interfaces and spliterators, oh my! —Will Hayworth, Developer, Atlassian
Modern Java in Action LAMBDAS, STREAMS, FUNCTIONAL AND REACTIVE PROGRAMMING RAOUL-GABRIEL URMA, MARIO FUSCO, AND ALAN MYCROFT
MANNING SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department Manning Publications Co. 20 Baldwin Road PO Box 761 Shelter Island, NY 11964 Email: [email protected] ©2019 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps. Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Manning Publications Co. 20 Baldwin Road PO Box 761 Shelter Island, NY 11964
Development editor: Technical development editor: Review editor: Project manager: Copy editors: Proofreader: Technical proofreader: Typesetter: Cover designer:
ISBN: 9781617293566 Printed in the United States of America 1 2 3 4 5 6 7 8 9 10 – DP – 23 22 21 20 19 18
Kevin Harreld Dennis Sellinger Aleksandar Dragosavljevic´ Deirdre Hiam Heidi Ward and Kathy Simpson Carol Shields Jean-François Morin Dennis Dalinnik Marija Tudor
brief contents PART 1
FUNDAMENTALS ............................................................1 1
■
Java 8, 9, 10, and 11: what’s happening? 3
2
■
Passing code with behavior parameterization 26
3
■
Lambda expressions 42
PART 2 FUNCTIONAL-STYLE DATA PROCESSING WITH STREAMS ... 79
PART 3
4
■
Introducing streams 81
5
■
Working with streams 98
6
■
Collecting data with streams 134
7
■
Parallel data processing and performance 172
EFFECTIVE PROGRAMMING WITH STREAMS AND LAMBDAS ............................................................199 8
■
Collection API enhancements 201
9
■
Refactoring, testing, and debugging
10
■
Domain-specific languages using lambdas 239
v
216
vi
PART 4
PART 5
PART 6
BRIEF CONTENTS
EVERYDAY JAVA .........................................................273 11
■
Using Optional as a better alternative to null
12
■
New Date and Time API 297
13
■
Default methods 314
14
■
The Java Module System 333
275
ENHANCED JAVA CONCURRENCY ................................355 15
■
Concepts behind CompletableFuture and reactive programming 357
16
■
CompletableFuture: composable asynchronous programming 387
17
■
Reactive programming 416
FUNCTIONAL PROGRAMMING AND FUTURE JAVA EVOLUTION ...............................................................443 18
■
Thinking functionally 445
19
■
Functional programming techniques 460
20
■
Blending OOP and FP: Comparing Java and Scala 485
21
■
Conclusions and where next for Java 500
contents preface xix acknowledgments xxi about this book xxiii about the authors xxviii about the cover illustration xxx
PART 1 FUNDAMENTALS ..................................................1
1
Java 8, 9, 10, and 11: what’s happening? 3 1.1 1.2
So, what’s the big story? 3 Why is Java still changing? 6 Java’s place in the programming language ecosystem 6 Stream processing 8 Passing code to methods with behavior parameterization 9 Parallelism and shared mutable data 10 Java needs to evolve 11 ■
■
1.3
Functions in Java 12 Methods and lambdas as first-class citizens 12 Passing code: an example 14 From passing methods to lambdas 16 ■
■
1.4
Streams 17 Multithreading is difficult 19
vii
CONTENTS
viii
1.5 1.6
2
Default methods and Java modules 21 Other good ideas from functional programming 23
Passing code with behavior parameterization 26 2.1
Coping with changing requirements 27 First attempt: filtering green apples 28 Second attempt: parameterizing the color 28 Third attempt: filtering with every attribute you can think of 29 ■
■
2.2
Behavior parameterization 30 Fourth attempt: filtering by abstract criteria
2.3
31
Tackling verbosity 35 Anonymous classes 36 Fifth attempt: using an anonymous class 36 Sixth attempt: using a lambda expression 37 Seventh attempt: abstracting over List type 38 ■
■
2.4
Real-world examples 39 Sorting with a Comparator 39 Executing a block of code with Runnable 40 Returning a result using Callable 40 GUI event handling 41 ■
■
3
Lambda expressions 42 3.1 3.2
Lambdas in a nutshell 43 Where and how to use lambdas 46 Functional interface
3.3
46
■
Function descriptor
48
Putting lambdas into practice: the execute-around pattern 50 Step 1: Remember behavior parameterization 51 Step 2: Use a functional interface to pass behaviors 51 Step 3: Execute a behavior! 52 Step 4: Pass lambdas 52 ■
■
■
3.4
Using functional interfaces 53 Predicate 54
3.5
■
Consumer
54
■
Function
55
Type checking, type inference, and restrictions 59 Type checking 59 Same lambda, different functional interfaces 61 Type inference 63 Using local variables 63 ■
■
3.6
Method references 64 In a nutshell
3.7
■
65
■
Constructor references 68
Putting lambdas and method references into practice 70 Step 1: Pass code 71 Step 2: Use an anonymous class 71 Step 3: Use lambda expressions 71 Step 4: Use method references 72 ■
■
CONTENTS
3.8
ix
Useful methods to compose lambda expressions 72 Composing Comparators 73 Composing Functions 74
3.9
Composing Predicates 73
■
Similar ideas from mathematics 76 Integration
76
■
Connecting to Java 8 lambdas 77
PART 2 FUNCTIONAL-STYLE DATA PROCESSING WITH STREAMS ..................................................79
4
Introducing streams 81 4.1 4.2 4.3
What are streams? 82 Getting started with streams 86 Streams vs. collections 88 Traversable only once 90
4.4
■
Stream operations 93 Intermediate operations 94 Working with streams 95
4.5
5
External vs. internal iteration 91 Terminal operations
■
95
Road map 96
Working with streams 98 5.1
Filtering 99 Filtering with a predicate 99
5.2
100
Slicing a stream 100 Slicing using a predicate 101 Skipping elements 103
5.3
Filtering unique elements
■
Mapping
■
Truncating a stream 102
104
Applying a function to each element of a stream 104 Flattening streams 105
5.4
Finding and matching 108 Checking to see if a predicate matches at least one element 108 Checking to see if a predicate matches all elements 109 Finding an element 109 Finding the first element 110 ■
5.5
Reducing
111
Summing the elements
5.6
111
Putting it all into practice
■
Maximum and minimum 113
117
The domain: Traders and Transactions
117
■
Solutions
118
CONTENTS
x
5.7
Numeric streams 121 Primitive stream specializations 121 Numeric ranges 123 Putting numerical streams into practice: Pythagorean triples 123 ■
5.8
Building streams 126 Streams from values 126 Stream from nullable 126 Streams from arrays 127 Streams from files 127 Streams from functions: creating infinite streams! 128 ■
■
5.9
6
Overview
132
Collecting data with streams 134 6.1
Collectors in a nutshell 136 Collectors as advanced reductions collectors 137
6.2
Reducing and summarizing
136
■
Predefined
138
Finding maximum and minimum in a stream of values 138 Summarization 139 Joining Strings 140 Generalized summarization with reduction 141 ■
6.3
■
Grouping 146 Manipulating grouped elements 147 Collecting data in subgroups 150
6.4
Partitioning
Multilevel grouping 149
154
Advantages of partitioning 155 prime and nonprime 156
6.5
■
■
Partitioning numbers into
The Collector interface 159 Making sense of the methods declared by Collector interface Putting them all together 163
6.6
Developing your own collector for better performance 165 Divide only by prime numbers 166 performances 170
7
160
■
Comparing collectors’
Parallel data processing and performance 172 7.1
Parallel streams 173 Turning a sequential stream into a parallel one 174 Measuring stream performance 176 Using parallel streams correctly 180 Using parallel streams effectively 182 ■
■
CONTENTS
7.2
xi
The fork/join framework 184 Working with RecursiveTask 184 Best practices for using the fork/join framework 188 Work stealing 189 ■
■
7.3
Spliterator 190 The splitting process 191 Spliterator 192
Implementing your own
■
PART 3 EFFECTIVE PROGRAMMING WITH STREAMS AND LAMBDAS..................................................199
8
Collection API enhancements 201 8.1
Collection factories 202 List factory 203
8.2
Set factory
204
■
Map factories
204
Working with List and Set 205 removeIf 205
8.3
■
replaceAll
■
206
Working with Map 207 forEach 207 Sorting 208 getOrDefault 208 Compute patterns 209 Remove patterns 210 Replacement patterns 211 Merge 211 ■
■
■
■
8.4
Improved ConcurrentHashMap 213 Reduce and Search 213
9
■
Counting 214
Set views 214
■
Refactoring, testing, and debugging 216 9.1
Refactoring for improved readability and flexibility 217 Improving code readability 217 From anonymous classes to lambda expressions 217 From lambda expressions to method references 219 From imperative data processing to Streams 220 Improving code flexibility 221 ■
■
■
9.2
Refactoring object-oriented design patterns with lambdas 223 Strategy 224 Template method 225 Observer 226 Chain of responsibility 229 Factory 230 ■
■
■
9.3
Testing lambdas 232 Testing the behavior of a visible lambda 232 Focusing on the behavior of the method using a lambda 233 Pulling complex lambdas into separate methods 234 Testing high-order functions 234 ■
■
■
CONTENTS
xii
9.4
Debugging 234 Examining the stack trace
10
235
■
Logging information
236
Domain-specific languages using lambdas 239 10.1
A specific language for your domain 241 Pros and cons of DSLs 242 on the JVM 244
10.2
Different DSL solutions available
■
Small DSLs in modern Java APIs 248 The Stream API seen as a DSL to manipulate collections 249 Collectors as a DSL to aggregate data 250
10.3
Patterns and techniques to create DSLs in Java 252 Method chaining 255 Using nested functions 257 Function sequencing with lambda expressions 259 Putting it all together 261 Using method references in a DSL 263 ■
■
10.4
Real World Java 8 DSL 266 jOOQ 266
■
Cucumber
267
■
Spring Integration 269
PART 4 EVERYDAY JAVA ...............................................273
11
Using Optional as a better alternative to null 275 11.1
How do you model the absence of a value?
276
Reducing NullPointerExceptions with defensive checking 277 Problems with null 278 What are the alternatives to null in other languages? 279 ■
11.2 11.3
Introducing the Optional class 280 Patterns for adopting Optionals 281 Creating Optional objects 281 Extracting and transforming values from Optionals with map 282 Chaining Optional objects with flatMap 283 Manipulating a stream of optionals 287 Default actions and unwrapping an Optional 288 Combining two Optionals 289 Rejecting certain values with filter 290 ■
■
■
■
■
11.4
Practical examples of using Optional
292
Wrapping a potentially null value in an Optional 292 Exceptions vs. Optional 293 Primitive optionals and why you shouldn’t use them 294 Putting it all together 294 ■
■
CONTENTS
12
xiii
New Date and Time API 297 12.1
LocalDate, LocalTime, LocalDateTime, Instant, Duration, and Period 298 Working with LocalDate and LocalTime 299 Combining a date and a time 300 Instant: a date and time for machines 301 Defining a Duration or a Period 301 ■
■
12.2
Manipulating, parsing, and formatting dates 303 Working with TemporalAdjusters 305 date-time objects 308
12.3
■
Printing and parsing
Working with different time zones and calendars 310 Using time zones 310 Fixed offset from UTC/Greenwich Using alternative calendar systems 311 ■
13
311
Default methods 314 13.1
Evolving APIs 317 API version 1 317
13.2 13.3
■
API version 2
Default methods in a nutshell 320 Usage patterns for default methods 322 Optional methods 322
13.4
318
■
Multiple inheritance of behavior 323
Resolution rules 326 Three resolution rules to know 327 Most specific defaultproviding interface wins 327 Conflicts and explicit disambiguation 329 Diamond problem 330 ■
■
■
14
The Java Module System 333 14.1
The driving force: reasoning about software Separation of concerns 334 Java software 335
14.2
Information hiding
334 334
Why the Java Module System was designed 336 Modularity limitations 336 Comparison with OSGi 338
14.3 14.4
■
■
Monolithic JDK 337
Java modules: the big picture 339 Developing an application with the Java Module System 340 Setting up an application 340 Fine-grained and coarse-grained modularization 342 Java Module System basics 342 ■
■
CONTENTS
xiv
14.5
Working with several modules 343 The exports clause 344 Naming 345
14.6 14.7 14.8
■
The requires clause
344
Compiling and packaging 345 Automatic modules 349 Module declaration and clauses 350 requires 350 exports 350 requires transitive 351 exports to 351 open and opens 351 uses and provides 352 ■
■
■
14.9
■
A bigger example and where to learn more 352
PART 5 ENHANCED JAVA CONCURRENCY ......................355
15
Concepts behind CompletableFuture and reactive programming 357 15.1
Evolving Java support for expressing concurrency 360 Threads and higher-level abstractions 361 Executors and thread pools 362 Other abstractions of threads: non-nested with method calls 364 What do you want from threads? 366 ■
■
■
15.2
Synchronous and asynchronous APIs 366 Future-style API 368 Reactive-style API 369 Sleeping (and other blocking operations) considered harmful 370 Reality check 372 How do exceptions work with asynchronous APIs? 372 ■
■
■
15.3 15.4 15.5
The box-and-channel model 373 CompletableFuture and combinators for concurrency 375 Publish-subscribe and reactive programming Example use for summing two flows 380 A simple form of real backpressure 384
15.6 15.7
16
■
378
Backpressure 384
Reactive systems vs. reactive programming Road map 386
385
CompletableFuture: composable asynchronous programming 387 16.1
Simple use of Futures 388 Understanding Futures and their limitations 389 Using CompletableFutures to build an asynchronous application 390 ■
CONTENTS
16.2
xv
Implementing an asynchronous API 391 Converting a synchronous method into an asynchronous one 392 Dealing with errors 394
16.3
Making your code nonblocking
396
Parallelizing requests using a parallel Stream 397 Making asynchronous requests with CompletableFutures 397 Looking for the solution that scales better 399 Using a custom Executor 400 ■
16.4
Pipelining asynchronous tasks 402 Implementing a discount service 403 Using the Discount service 404 Composing synchronous and asynchronous operations 405 Combining two CompletableFutures: dependent and independent 408 Reflecting on Future vs. CompletableFuture 409 Using timeouts effectively 410 ■
■
■
■
■
16.5
Reacting to a CompletableFuture completion 411 Refactoring the best-price-finder application Putting it all together 414
16.6
17
412
Road map 414
Reactive programming 416 17.1
The Reactive Manifesto
417
Reactive at application level 418
17.2
■
Reactive at system level
420
Reactive streams and the Flow API 421 Introducing the Flow class 421 Creating your first reactive application 424 Transforming data with a Processor 429 Why doesn’t Java provide an implementation of the Flow API? 431 ■
■
17.3
Using the reactive library RxJava 431 Creating and using an Observable 433 combining Observables 437
■
Transforming and
PART 6 FUNCTIONAL PROGRAMMING AND FUTURE JAVA EVOLUTION .............................................443
18
Thinking functionally 445 18.1
Implementing and maintaining systems 446 Shared mutable data 446 Declarative programming Why functional programming? 448 ■
447
CONTENTS
xvi
18.2
What’s functional programming?
449
Functional-style Java 450 Referential transparency 452 Object-oriented vs. functional-style programming 452 Functional style in practice 453 ■
18.3
19
Recursion vs. iteration 455
Functional programming techniques 460 19.1
Functions everywhere 461 Higher-order functions
19.2
461
■
Currying
463
Persistent data structures 464 Destructive updates vs. functional 464 Another example with Trees 467 Using a functional approach 468 ■
■
19.3
Lazy evaluation with streams 469 Self-defining stream 470
19.4
Your own lazy list 472
Pattern matching 476 Visitor design pattern
19.5
■
477
■
Pattern matching to the rescue 478
Miscellany 481 Caching or memoization 481 What does “Return the same object” mean? 482 Combinators 483 ■
■
20
Blending OOP and FP: Comparing Java and Scala 485 20.1
Introduction to Scala 486 Hello beer 486 Basic data structures: List, Set, Map, Tuple, Stream, Option 488 ■
20.2
Functions 493 First-class functions in Scala 493 Anonymous functions and closures 494 Currying 496 ■
■
20.3
Classes and traits 497 Less verbosity with Scala classes 497 interfaces 498
21
■
Scala traits vs. Java
Conclusions and where next for Java 500 21.1
Review of Java 8 features 501 Behavior parameterization (lambdas and method references) 501 Streams 502 CompletableFuture 502 Optional 503 Flow API 503 Default methods 504 ■
■
■
21.2 21.3
The Java 9 module system 504 Java 10 local variable type inference
505
CONTENTS
21.4
What’s ahead for Java?
xvii
507
Declaration-site variance 507 Pattern matching Richer forms of generics 508 Deeper support for immutability 510 Value types 511 ■
507
■
■
21.5 21.6 appendix A appendix B appendix C appendix D
Moving Java forward faster The final word 515
514
Miscellaneous language updates 517 Miscellaneous library updates 521 Performing multiple operations in parallel on a stream 529 Lambdas and JVM bytecode 538 index
543
preface Back in 1998 when I was eight years old, I picked up my first book on computing—on JavaScript and HTML. Little did I know that opening that book would transform my life by exposing me to programming languages and the amazing things I could do with them. I was hooked. Every so often, I still find a new programming language feature that revives this excitement because it enables me to write clearer, more concise code in half the time. I hope that the new ideas in Java 8, Java 9, and Java 10, incorporated from functional programming and explored in this book, inspire you in the same way. So, you may wonder, how did this book—and its second edition—come about? Well, back in 2011, Brian Goetz (the Java Language Architect at Oracle) was sharing various proposals to add lambda expressions to Java, with the aim of getting the community involved. These rekindled my excitement, and I started to evangelize the ideas, organizing Java 8 workshops at various developer conferences and giving lectures to students at the University of Cambridge. By April 2013, word had spread, and our publisher at Manning emailed asking whether I was interested in writing a book about lambdas in Java 8. At the time I was a “humble” second-year PhD candidate, and that seemed to be a bad idea because it would interfere with my thesis submission schedule. On the other hand, carpe diem. I thought writing a short book shouldn’t be too much work, right? (It was only later that I realized I was utterly wrong!) So, I sought advice from my PhD supervisor, Professor Alan Mycroft, who, it turned out, was ready to support me in this adventure (even offering to help in such non-PhD work—I’m forever in his debt). A few days later, we
xix
xx
PREFACE
met fellow Java 8 evangelist Mario Fusco, who had vast professional experience and had become well known at major developer conferences for his talks on functional programming. We quickly realized that by combining our energy and diverse backgrounds we could deliver, not just a short book on Java 8 lambdas, but instead a book that, we hope, the Java community will still be reading in five or ten years. We had a unique opportunity to discuss many topics in depth that will benefit Java programmers and open doors to a new universe of ideas: functional programming. Now, it’s 2018, and we find that the first edition amazingly sold 20,000 copies, Java 9 has just been released, Java 10 is about to be released, and time has dulled the memory of many long nights of editing. So, here it is—the second edition Modern Java in Action, covering Java 8, Java 9, and Java 10. We hope you will enjoy it! RAOUL-GABRIEL URMA CAMBRIDGE SPARK
acknowledgments This book would not have been possible without the support of many amazing people: ■
■
■
■
■
Personal friends and people who provided valuable reviews and suggestions on a volunteer basis: Richard Walker, Jan Saganowski, Brian Goetz, Stuart Marks, Cem Redif, Paul Sandoz, Stephen Colebourne, Íñigo Mediavilla, Allahbaksh Asadullah, Tomasz Nurkiewicz, and Michael Müller Our Manning Early Access Program (MEAP) readers who posted comments in the Author Online forum The reviewers from the development process who provided helpful feedback: Antonio Magnaghi, Brent Stains, Franziska Meyer, Furkan Kamachi, Jason Lee, Jörn Dinkla, Lochana Menikarachchi, Mayur Patil, Nikolaos Kaintantzis, Simone Bordet, Steve Rogers, Will Hayworth, and William Wheeler Kevin Harreld, our development editor at Manning, who was very patient in answering all our questions and concerns, provided detailed feedback for each of our draft chapters, and supported us in all possible ways Dennis Selinger and Jean-François Morin, who provided a thorough technical proofread of the manuscript shortly before it went to production; and Al Scherer, who provided technical expertise during development
RAOUL-GABRIEL URMA
First and foremost, I’d like to thank my parents for their endless love and support in my life. This little dream of writing a book has now come true! Next, I would like to express my eternal gratitude to Alan Mycroft, my PhD supervisor and coauthor, for his
xxi
ACKNOWLEDGMENTS
xxii
trust and support. I’d also like to thank my coauthor Mario Fusco for sharing this fun journey. Finally, I’d like to thank friends who have provided mentorship, useful advice, and encouragement in my life: Sophia Drossopoulou, Aidan Roche, Alex Buckley, Haadi Jabado, and Jaspar Robertson. You guys rock! MARIO FUSCO
I’d like to especially thank my wife, Marilena, whose boundless patience allowed me to stay focused on the book, and our daughter, Sofia, because the infinite chaos she can produce allowed me to get creatively distracted from the book. As you’ll discover reading the book, Sofia also taught us, like only a two-year-old baby girl can, the difference between internal and external iteration. I’d like to also thank Raoul-Gabriel Urma and Alan Mycroft, with whom I shared the (big) joys and the (small) pains of this writing experience. ALAN MYCROFT
I’d like to thank my wife, Hilary, and the rest of my family for enduring the many hours that “just a bit more work to do on the book” consumed. I also thank my colleagues and students over the years for teaching me how to teach, Mario and Raoul for being such efficient coauthors, and particularly Raoul for his skill at being so pleasantly demanding when requiring “the next bit of text by Friday.”
about this book Put simply, the new features in Java 8 along with the (less-obvious) changes in Java 9 are the biggest change to Java in the 21 years since Java 1.0 was released. Nothing has been taken away, so all your existing Java code continues to work—but the new features provide powerful new idioms and design patterns to help you write clearer, more concise code. At first you might think (as with all new features), “Why are they changing my language again?” But then, after a bit of practice, comes the revelation that you’ve just used the features to write shorter, clearer code in half the time you expected—and you realize you could never go back to “old Java” again. The second edition of this book, Modern Java in Action: Lambdas, Streams, Functional and Reactive Programming, is written to get you over that initial hump of “sounds good in principle, but it’s all a bit new and unfamiliar” and into coding like a native. “Perhaps,” you might think, “but lambdas, functional programming—aren’t those the sort of things that bearded sandal-wearing academics talk about in their ivory towers?” They might be, but Java 8 has incorporated just the right balance of ideas into Java to gain many of their advantages in a way that’s comprehensible to ordinary Java programmers. And this book tells the story from the ordinary-programmer viewpoint, with an occasional “how this arose” for perspective. “Lambdas—that sounds Greek to me!” Yes, it does, but it’s a great idea for enabling you to write concise Java programs. Many of you are familiar with event handlers and callbacks, where you register an object containing a method to be used when some event happens. Lambdas make this sort of idea much more widely usable in Java. Put simply, lambdas and their friends, method references, provide the ability to concisely
xxiii
xxiv
ABOUT THIS BOOK
pass code or methods as arguments to be executed in the middle of doing something else. You’ll see in this book how this idea occurs more frequently than you might think: from simply parameterizing a sort method with code to do the comparison to expressing complex queries on collections of data using the new Streams API. “Streams—what are they?” They’re a great new Java 8 addition. They behave like collections but have several advantages that enable new styles of programming. First, if you’ve ever programmed using a database-query language such as SQL, you’ll recognize that it enables queries to be written in a few lines that would take many lines in Java. Java 8 streams support this concise database-queries style of programming— but with Java syntax and none of the need to know about databases! Second, streams are designed so that not all their data needs to be in memory (or even computed) at once. Thus, you can process streams that are too big to fit in your computer memory. But Java 8 can optimize operations on streams in a way that Java can’t do for collections—for example, it can group together several operations on the same stream so that the data is traversed only once instead of expensively traversing it multiple times. Even better, Java can automatically parallelize stream operations for you (unlike collections). “And functional-style programming, what’s that?” It’s another style of programming, just like object-oriented programming, but centered on using functions as values, just as we mentioned previously when discussing lambdas. What’s great about Java 8 is that it incorporates many of the best ideas from functional programming into the familiar Java syntax. The fine design choices enable you to see functional-style programming in Java 8 as an additional set of design patterns and idioms to enable you to write clearer, more concise code in less time. Think of it as having a wider range of weapons in your programming armory. Oh yes, in addition to these features that lean on big conceptual additions to Java, we also explain the many other useful Java 8 features and updates such as default methods, the new Optional class, CompletableFuture, and the new Date and Time API. And there are the Java 9 additions: a new module system, support for reactive programming via the Flow API, and various other enhancements. But hey, this is an overview, and it’s time now for us to leave you to read the book.
How this book is organized: a roadmap Modern Java in Action is divided into six parts: “Fundamentals,” “Functional-style data processing with streams,” “Effective programming with streams and lambdas,” “Everyday Java,” “Enhanced Java concurrency,” and “Functional programming and future Java evolution.” While we strongly recommend that you read the chapters in the first two parts first (and in order because many of the concepts presented build on previous chapters), the remaining four parts can be read reasonably independently. Most chapters include several quizzes to help you work through the material. The first part of the book provides the fundamentals to help you get started with the new Java ideas introduced in Java 8. By the end of this first part, you’ll have a full
ABOUT THIS BOOK
xxv
understanding of what lambda expressions are, and you’ll be able to write code that’s both concise and flexible enough to easily adapt to changing requirements. ■
■
■
In chapter 1, we summarize the main changes to Java (lambda expressions, method references, streams, and default methods) and set the scene for the book. In chapter 2, you’ll learn about behavior parameterization, a softwaredevelopment pattern that Java 8 relies heavily on and is the motivation for lambda expressions. Chapter 3 gives a full explanation, with code examples and quizzes at every step, of the concepts of lambda expressions and method references.
The second part of this book is a deep exploration of the new Streams API, which lets you write powerful code that processes a collection of data in a declarative way. By the end of this second part, you’ll have a full understanding of what streams are and how you can use them in your codebase to process a collection of data concisely and efficiently. ■
■
■
■
Chapter 4 introduces the concept of a stream and explains how it compares with a collection. Chapter 5 investigates in detail the stream operations available to express sophisticated data-processing queries. You’ll look at many patterns such as filtering, slicing, finding, matching, mapping, and reducing. Chapter 6 covers collectors—a feature of the Streams API that lets you express even more complex data-processing queries. In chapter 7, you’ll learn about how streams can automatically run in parallel and leverage your multicore architectures. In addition, you’ll learn about various pitfalls to avoid when using parallel streams correctly and effectively.
The third part of this book explores various Java 8 and Java 9 topics that will make you more effective at using Java and will enhance your codebase with modern idioms. Because it is oriented toward more-advanced programming ideas we have arranged, nothing later in the book depends on the techniques described here. ■
■
■
Chapter 8 is a new chapter for the second edition and explores the Collection API Enhancements of Java 8 and Java 9. It covers using collection factories and learning new idiomatic patterns to work with List and Set collections along with idiomatic patterns involving Map. Chapter 9 explores how you can improve your existing code using new Java 8 features and a few recipes. In addition, it explores vital software-development techniques such as design patterns, refactoring, testing, and debugging. Chapter 10 is also new for the second edition. It explores the idea of basing an API on a domain-specific language (DSL). This is not only a powerful way of designing APIs but one which is both becoming increasingly popular and is already appearing in the Java classes such as Comparators, Stream, and Collectors.
ABOUT THIS BOOK
xxvi
The fourth part of this book explores various new features in Java 8 and Java 9 centered around making it easier and more reliable to code your projects. We start with two APIs introduced in Java 8. ■
■
■
■
Chapter 11 covers the java.util.Optional class, which allows you to both design better APIs and reduce null pointer exceptions. Chapter 12 explores the Date and Time API, which greatly improves the previous error-prone APIs for working with dates and time. In chapter 13, you’ll learn what default methods are, how you can use them to evolve APIs in a compatible way, some practical usage patterns, and rules for using default methods effectively. Chapter 14 is new for this second edition and explores the Java Module System—a major enhancement in Java 9 that enables huge systems to be modularized in a documented and enforceable way, rather than being “just a haphazard collection of packages.”
The fifth part of this book explores the more advanced ways of structuring concurrent programs in Java—beyond the ideas of easy-to-use parallel processing for streams introduced in chapters 6 and 7. Chapter 15 is new to this second edition and covers the “big-picture” idea of asynchronous APIs—including the ideas of Futures and the Publish-Subscribe protocol behind Reactive Programming and encapsulated in the Java 9 Flow API. ■
■
Chapter 16 explores CompletableFuture, which lets you express complex asynchronous computations in a declarative way—paralleling the design of the Streams API. Chapter 17 is again new to this second edition and explores the Java 9 Flow API in detail, focusing on practical reactive programming code.
In the sixth and final part of this book, we draw back a little with a tutorial introduction to writing effective functional-style programs in Java, along with a comparison of Java 8 features with those of Scala. ■
■
■
Chapter 18 gives a full tutorial on functional programming, introduces some of its terminology, and explains how to write functional-style programs in Java. Chapter 19 covers more advanced functional programming techniques including higher-order functions, currying persistent data structures, lazy lists, and pattern matching. You can view this chapter as a mix of practical techniques to apply in your codebase as well as academic information that will make you a more knowledgeable programmer. Chapter 20 follows by discussing how Java 8 features compare to features in the Scala language—a language that, like Java, is implemented on top of the JVM and that has evolved quickly to threaten some aspects of Java’s niche in the programming language ecosystem.
ABOUT THIS BOOK
■
xxvii
In chapter 21, we review the journey of learning about Java 8 and the gentle push toward functional-style programming. In addition, we speculate on what future enhancements and great new features may be in Java’s pipeline beyond Java 8, Java 9, and the small additions in Java 10.
Finally, there are four appendixes, which cover a number of other topics related to Java 8. Appendix A summarizes minor Java 8 language features that we didn’t discuss in the book. Appendix B gives an overview of other main additions to the Java library that you may find useful. Appendix C is a continuation of part 2 and looks at advanced uses of streams. Appendix D explores how the Java compiler implements lambda expressions behind the scenes.
About the code All source code in listings or in text is in a fixed-width font like this to separate it from ordinary text. Code annotations accompany many of the listings, highlighting important concepts. Source code for all the working examples in the book and instructions to run them are available on a GitHub repository and as a download via the book’s website. Both links to the source code may be found at www.manning.com/books/modernjava-in-action.
Book forum Purchase of Modern Java in Action includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the authors and from other users. To access the forum, go to https://forums.manning.com/forums/modern-java-in-action. You can also learn more about Manning’s forums and the rules of conduct at https://forums .manning.com/forums/about. Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
about the authors RAOUL-GABRIEL URMA is CEO and co-founder of Cambridge Spark, a leading learning community for data scientists and developers in the UK. Raoul was nominated a Java Champion in 2017. He has worked for Google, eBay, Oracle, and Goldman Sachs. Raoul completed a PhD in Computer Science at the University of Cambridge. In addition, he holds a MEng in Computer Science from Imperial College London and graduated with first-class honors, having won several prizes for technical innovation. Raoul has delivered over 100 technical talks at international conferences. MARIO FUSCO is a senior software engineer at Red Hat working on the core development of Drools, the JBoss rule engine. He has vast experience as a Java developer, having been involved in (and often leading) many enterprise-level projects in several industries ranging from media companies to the financial sector. Among his interests are functional programming and domain-specific languages. By leveraging these two passions, he created the open source library lambdaj with the goal of providing an internal Java DSL for manipulating collections and allowing a bit of functional programming in Java.
xxviii
ABOUT THE AUTHORS
xxix
ALAN MYCROFT is professor of computing in the Computer Laboratory of Cambridge University, where he has been a faculty member since 1984. He’s also a fellow at Robinson College, a co-founder of the European Association for Programming Languages and Systems, and a co-founder and trustee of the Raspberry Pi Foundation. He has degrees in Mathematics (Cambridge) and Computer Science (Edinburgh). He’s the author of about 100 research papers and has supervised more than 20 PhD theses. His research centers on programming languages and their semantics, optimization, and implementation. He maintains strong industrial links, having worked at AT&T Laboratories and Intel Research during academic leave, as well as spinning out Codemist Ltd., which built the original ARM C compiler under the Norcroft name.
about the cover illustration The figure on the cover of Java in Action is captioned “Habit of a Mandarin of War in Chinese Tartary in 1700.” The Mandarin’s habit is ornately decorated, and he is carrying a sword and a bow and quiver on his back. If you look carefully at his belt, you will find a lambda buckle (added by our designer as a wink at one of the topics of this book). The illustration is taken from Thomas Jefferys’ A Collection of the Dresses of Different Nations, Ancient and Modern, London, published between 1757 and 1772. The title page states that these are hand-colored copperplate engravings, heightened with gum Arabic. Thomas Jefferys (1719–1771) was called “Geographer to King George III.” He was an English cartographer who was the leading map supplier of his day. He engraved and printed maps for government and other official bodies and produced a wide range of commercial maps and atlases, especially of North America. His work as a mapmaker sparked an interest in local dress customs of the lands he surveyed and mapped; they are brilliantly displayed in this four-volume collection. Fascination with faraway lands and travel for pleasure were relatively new phenomena in the eighteenth century, and collections such as this one were popular, introducing both the tourist as well as the armchair traveler to the inhabitants of other countries. The diversity of the drawings in Jefferys’ volumes speaks vividly of the uniqueness and individuality of the world’s nations centuries ago. Dress codes have changed, and the diversity by region and country, so rich at one time, has faded away. It is now often hard to tell the inhabitant of one continent from another. Perhaps, trying to view it optimistically, we have traded a cultural and visual
xxx
ABOUT THE COVER ILLUSTRATION
xxxi
diversity for a more varied personal life—or a more varied and interesting intellectual and technical life. At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of national costumes three centuries ago, brought back to life by Jefferys’ pictures.
Part 1 Fundamentals
T
his first part of the book provides the fundamentals to help you get started with the new Java ideas introduced in Java 8. By the end of this first part, you’ll have a full understanding of what lambda expressions are, and you’ll be able to write code that’s both concise and flexible enough to easily adapt to changing requirements. In chapter 1, we summarize the main changes to Java (lambda expressions, method references, streams, and default methods) and set the scene for the book. In chapter 2, you’ll learn about behavior parameterization, a software development pattern that Java 8 relies heavily on and is the motivation for lambda expressions. Chapter 3 gives a full explanation, with code examples and quizzes at every step, of the concepts of lambda expressions and method references.
Java 8, 9, 10, and 11: what’s happening?
This chapter covers Why Java keeps changing Changing computing background Pressures for Java to evolve Introducing new core features of Java 8 and 9
Since the release of Java Development Kit (JDK 1.0) in 1996, Java has won a large following of students, project managers, and programmers who are active users. It’s an expressive language and continues to be used for projects both large and small. Its evolution (via the addition of new features) from Java 1.1 (1997) to Java 7 (2011) has been well managed. Java 8 was released in March 2014, Java 9 in September 2017, Java 10 in March 2018, and Java 11 planned for September 2018. The question is this: Why should you care about these changes?
1.1
So, what’s the big story? We argue that the changes to Java 8 were in many ways more profound than any other changes to Java in its history (Java 9 adds important, but less-profound, productivity changes, as you’ll see later in this chapter, while Java 10 makes much
3
4
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
smaller adjustments to type inference). The good news is that the changes enable you to write programs more easily. For example, instead of writing verbose code (to sort a list of apples in inventory based on their weight) like Collections.sort(inventory, new Comparator() { public int compare(Apple a1, Apple a2){ return a1.getWeight().compareTo(a2.getWeight()); } });
in Java 8 you can write more concise code that reads a lot closer to the problem statement, like the following: inventory.sort(comparing(Apple::getWeight));
The first Java 8 code of the book!
It reads “sort inventory comparing apple weight.” Don’t worry about this code for now. This book will explain what it does and how you can write similar code. There’s also a hardware influence: commodity CPUs have become multicore—the processor in your laptop or desktop machine probably contains four or more CPU cores. But the vast majority of existing Java programs use only one of these cores and leave the other three idle (or spend a small fraction of their processing power running part of the operating system or a virus checker). Prior to Java 8, experts might tell you that you have to use threads to use these cores. The problem is that working with threads is difficult and error-prone. Java has followed an evolutionary path of continually trying to make concurrency easier and less error-prone. Java 1.0 had threads and locks and even a memory model—the best practice at the time—but these primitives proved too difficult to use reliably in nonspecialist project teams. Java 5 added industrial-strength building blocks like thread pools and concurrent collections. Java 7 added the fork/join framework, making parallelism more practical but still difficult. Java 8 gave us a new, simpler way of thinking about parallelism. But you still have to follow some rules, which you’ll learn in this book. As you’ll see later in this book, Java 9 adds a further structuring method for concurrency—reactive programming. Although this has more-specialist use, it standardizes a means of exploiting the RxJava and Akka reactive streams toolkits that are becoming popular for highly concurrent systems. From the previous two desiderata (more concise code and simpler use of multicore processors) springs the whole consistent edifice captured by Java 8. We start by giving you a quick taste of these ideas (hopefully enough to intrigue you, but short enough to summarize them): The Streams API Techniques for passing code to methods Default methods in interfaces
So, what’s the big story?
5
Java 8 provides a new API (called Streams) that supports many parallel operations to process data and resembles the way you might think in database query languages— you express what you want in a higher-level manner, and the implementation (here the Streams library) chooses the best low-level execution mechanism. As a result, it avoids the need for you to write code that uses synchronized, which is not only highly error-prone but also more expensive than you may realize on multicore CPUs.1 From a slightly revisionist viewpoint, the addition of Streams in Java 8 can be seen as a direct cause of the two other additions to Java 8: concise techniques to pass code to methods (method references, lambdas) and default methods in interfaces. But thinking of passing code to methods as a mere consequence of Streams downplays its range of uses within Java 8. It gives you a new concise way to express behavior parameterization. Suppose you want to write two methods that differ in only a few lines of code. You can now simply pass the code of the parts that differ as an argument (this programming technique is shorter, clearer, and less error-prone than the common tendency to use copy and paste). Experts will here note that behavior parameterization could, prior to Java 8, be encoded using anonymous classes—but we’ll let the example at the beginning of this chapter, which shows increased code conciseness with Java 8, speak for itself in terms of clarity. The Java 8 feature of passing code to methods (and being able to return it and incorporate it into data structures) also provides access to a range of additional techniques that are commonly referred to as functional-style programming. In a nutshell, such code, called functions in the functional programming community, can be passed around and combined in a way to produce powerful programming idioms that you’ll see in Java guise throughout this book. The meat of this chapter begins with a high-level discussion on why languages evolve, continues with sections on the core features of Java 8, and then introduces the ideas of functional-style programming that the new features simplify using and that new computer architectures favor. In essence, section 1.2 discusses the evolution process and the concepts, which Java was previously lacking, to exploit multicore parallelism in an easy way. Section 1.3 explains why passing code to methods in Java 8 is such a powerful new programming idiom, and section 1.4 does the same for Streams—the new Java 8 way of representing sequenced data and indicating whether these can be processed in parallel. Section 1.5 explains how the new Java 8 feature of default methods enables interfaces and their libraries to evolve with less fuss and less recompilation; it also explains the modules addition to Java 9, which enables components of large Java systems to be specified more clearly than “just a JAR file of packages.” Finally, section 1.6 looks ahead at the ideas of functional-style programming in Java and other languages sharing the JVM. In summary, this chapter introduces ideas that are successively elaborated in the rest of the book. Enjoy the ride!
1
Multicore CPUs have separate caches (fast memory) attached to each processor core. Locking requires these to be synchronized, requiring relatively slow cache-coherency-protocol inter-core communication.
6
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
1.2
Why is Java still changing? With the 1960s came the quest for the perfect programming language. Peter Landin, a famous computer scientist of his day, noted in 1966 in a landmark article2 that there had already been 700 programming languages and speculated on what the next 700 would be like—including arguments for functional-style programming similar to that in Java 8. Many thousands of programming languages later, academics have concluded that programming languages behave like ecosystems: new languages appear, and old languages are supplanted unless they evolve. We all hope for a perfect universal language, but in reality certain languages are better fitted for certain niches. For example, C and C++ remain popular for building operating systems and various other embedded systems because of their small runtime footprint and in spite of their lack of programming safety. This lack of safety can lead to programs crashing unpredictably and exposing security holes for viruses and the like; indeed, type-safe languages such as Java and C# have supplanted C and C++ in various applications when the additional runtime footprint is acceptable. Prior occupancy of a niche tends to discourage competitors. Changing to a new language and tool chain is often too painful for just a single feature, but newcomers will eventually displace existing languages, unless they evolve fast enough to keep up. (Older readers are often able to quote a range of such languages in which they’ve previously coded but whose popularity has since waned—Ada, Algol, COBOL, Pascal, Delphi, and SNOBOL, to name but a few.) You’re a Java programmer, and Java has been successful at colonizing (and displacing competitor languages in) a large ecosystem niche of programming tasks for nearly 20 years. Let’s examine some reasons for that.
1.2.1
Java’s place in the programming language ecosystem Java started well. Right from the start, it was a well-designed object-oriented language with many useful libraries. It also supported small-scale concurrency from day one with its integrated support for threads and locks (and with its early prescient acknowledgment, in the form of a hardware-neutral memory model, that concurrent threads on multicore processors can have unexpected behaviors in addition to those that happen on single-core processors). Also, the decision to compile Java to JVM bytecode (a virtual machine code that soon every browser supported) meant that it became the language of choice for internet applet programs (do you remember applets?). Indeed, there’s a danger that the Java Virtual Machine (JVM) and its bytecode will be seen as more important than the Java language itself and that, for certain applications, Java might be replaced by one of its competing languages such as Scala, Groovy, or Kotlin, which also run on the JVM. Various recent updates to the JVM (for example, the new invokedynamic bytecode in JDK7) aim to help such competitor languages
2
P. J. Landin, “The Next 700 Programming Languages,” CACM 9(3):157–65, March 1966.
7
Why is Java still changing?
run smoothly on the JVM—and to interoperate with Java. Java has also been successful at colonizing various aspects of embedded computing (everything from smart cards, toasters, and set-top boxes to car-braking systems).
How did Java get into a general programming niche? Object orientation became fashionable in the 1990s for two reasons: its encapsulation discipline resulted in fewer software engineering issues than those of C; and as a mental model it easily captured the WIMP programming model of Windows 95 and up. This can be summarized as follows: everything is an object; and a mouse click sends an event message to a handler (invokes the clicked method in a Mouse object). The write-once, run-anywhere model of Java and the ability of early browsers to (safely) execute Java code applets gave it a niche in universities, whose graduates then populated industry. There was initial resistance to the additional run cost of Java over C/C++, but machines got faster, and programmer time became more and more important. Microsoft’s C# further validated the Java-style objectoriented model.
But the climate is changing for the programming language ecosystem; programmers are increasingly dealing with so-called big data (data sets of terabytes and up) and wishing to exploit multicore computers or computing clusters effectively to process it. And this means using parallel processing—something Java wasn’t previously friendly to. You may have come across ideas from other programming niches (for example, Google’s map-reduce or the relative ease of data manipulation using database query languages such as SQL) that help you work with large volumes of data and multicore CPUs. Figure 1.1 summarizes the language ecosystem pictorially: think of the landscape as the space of programming problems and the dominant vegetation for a particular bit of ground as the favorite language for that program. Climate change is the idea that new hardware or new programming influences (for example, “Why can’t I program in an SQL-like style?”) mean that different languages become the language
Climate change (multicore processors, new programmer influences)
Scala JavaScript
C#/F# etc.
Java
C/C++
Figure 1.1 Programminglanguage ecosystem and climate change
8
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
of choice for new projects, just like increasing regional temperatures mean grapes now thrive in higher latitudes. But there’s hysteresis—many an old farmer will keep raising traditional crops. In summary, new languages are appearing and becoming increasingly popular because they’ve adapted quickly to the climate change. The main benefit of the Java 8 additions for a programmer is that they provide more programming tools and concepts to solve new or existing programming problems more quickly or, more importantly, in a more concise, more easily maintainable way. Although the concepts are new to Java, they’ve proved powerful in niche researchlike languages. In the following sections, we’ll highlight and develop the ideas behind three such programming concepts that have driven the development of the Java 8 features to exploit parallelism and write more concise code in general. We’ll introduce them in a slightly different order from the rest of the book to enable a Unix-based analogy and to expose the “need this because of that” dependencies in Java 8’s new parallelism for multicore.
Another climate-change factor for Java One climate-change factor involves how large systems are designed. Nowadays, it’s common for a large system to incorporate large component subsystems from elsewhere, and perhaps these are built on top of other components from other vendors. Worse still, these components and their interfaces also tend to evolve. Java 8 and Java 9 have addressed these aspects by providing default methods and modules to facilitate this design style.
The next three sections examine the three programming concepts that drove the design of Java 8.
1.2.2
Stream processing The first programming concept is stream processing. For introductory purposes, a stream is a sequence of data items that are conceptually produced one at a time. A program might read items from an input stream one by one and similarly write items to an output stream. The output stream of one program could well be the input stream of another. One practical example is in Unix or Linux, where many programs operate by reading data from standard input (stdin in Unix and C, System.in in Java), operating on it, and then writing their results to standard output (stdout in Unix and C, System.out in Java). First, a little background: Unix cat creates a stream by concatenating two files, tr translates the characters in a stream, sort sorts lines in a stream, and tail -3 gives the last three lines in a stream. The Unix command line allows such programs to be linked together with pipes (|), giving examples such as cat file1 file2
|
tr "[A-Z]"
"[a-z]"
|
sort
|
tail -3
9
Why is Java still changing?
which (supposing file1 and file2 contain a single word per line) prints the three words from the files that appear latest in dictionary order, after first translating them to lowercase. We say that sort takes a stream of lines3 as input and produces another stream of lines as output (the latter being sorted), as illustrated in figure 1.2. Note that in Unix these commands (cat, tr, sort, and tail) are executed concurrently, so that sort can be processing the first few lines before cat or tr has finished. A more mechanical analogy is a car-manufacturing assembly line where a stream of cars is queued between processing stations that each take a car, modify it, and pass it on to the next station for further processing; processing at separate stations is typically concurrent even though the assembly line is physically a sequence. file 1 file 2
cat Figure 1.2
"[A-Z]"
"[a-z]"
tr
-3
sort
tail
Unix commands operating on streams
Java 8 adds a Streams API (note the uppercase S) in java.util.stream based on this idea; Stream is a sequence of items of type T. You can think of it as a fancy iterator for now. The Streams API has many methods that can be chained to form a complex pipeline just like Unix commands were chained in the previous example. The key motivation for this is that you can now program in Java 8 at a higher level of abstraction, structuring your thoughts of turning a stream of this into a stream of that (similar to how you think when writing database queries) rather than one item at a time. Another advantage is that Java 8 can transparently run your pipeline of Stream operations on several CPU cores on disjoint parts of the input—this is parallelism almost for free instead of hard work using Threads. We cover the Java 8 Streams API in detail in chapters 4–7.
1.2.3
Passing code to methods with behavior parameterization The second programming concept added to Java 8 is the ability to pass a piece of code to an API. This sounds awfully abstract. In the Unix example, you might want to tell the sort command to use a custom ordering. Although the sort command supports command-line parameters to perform various predefined kinds of sorting such as reverse order, these are limited. For example, let’s say you have a collection of invoice IDs with a format similar to 2013UK0001, 2014US0002, and so on. The first four digits represent the year, the next two letters a country code, and last four digits the ID of a client. You may want to sort
3
Purists will say a “stream of characters,” but it’s conceptually simpler to think that sort reorders lines.
10
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
these invoice IDs by year or perhaps using the customer ID or even the country code. What you want is the ability to tell the sort command to take as an argument an ordering defined by the user: a separate piece of code passed to the sort command. Now, as a direct parallel in Java, you want to tell a sort method to compare using a customized order. You could write a method compareUsingCustomerId to compare two invoice IDs, but, prior to Java 8, you couldn’t pass this method to another method! You could create a Comparator object to pass to the sort method as we showed at the start of this chapter, but this is verbose and obfuscates the idea of simply reusing an existing piece of behavior. Java 8 adds the ability to pass methods (your code) as arguments to other methods. Figure 1.3, based on figure 1.2, illustrates this idea. We also refer to this conceptually as behavior parameterization. Why is this important? The Streams API is built on the idea of passing code to parameterize the behavior of its operations, just as you passed compareUsingCustomerId to parameterize the behavior of sort. public int compareUsingCustomerId(String inv1, String inv2){ .... }
sort Figure 1.3
Passing method compareUsingCustomerId as an argument to sort
We summarize how this works in section 1.3 of this chapter, but leave full details to chapters 2 and 3. Chapters 18 and 19 look at more advanced things you can do using this feature, with techniques from the functional programming community.
1.2.4
Parallelism and shared mutable data The third programming concept is rather more implicit and arises from the phrase “parallelism almost for free” in our previous discussion on stream processing. What do you have to give up? You may have to make some small changes in the way you code the behavior passed to stream methods. At first, these changes might feel a little uncomfortable, but once you get used to them, you’ll love them. You must provide behavior that is safe to execute concurrently on different pieces of the input. Typically this means writing code that doesn’t access shared mutable data to do its job. Sometimes these are referred to as pure functions or side-effect-free functions or stateless functions, and we’ll discuss these in detail in chapters 18 and 19. The previous parallelism arises only by assuming that multiple copies of your piece of code can work independently. If there’s a shared variable or object, which is written to, then things no longer work. What if two processes want to modify the shared variable at the same
Why is Java still changing?
11
time? (Section 1.4 gives a more detailed explanation with a diagram.) You’ll find more about this style throughout the book. Java 8 streams exploit parallelism more easily than Java’s existing Threads API, so although it’s possible to use synchronized to break the no-shared-mutable-data rule, it’s fighting the system in that it’s abusing an abstraction optimized around that rule. Using synchronized across multiple processing cores is often far more expensive than you expect, because synchronization forces code to execute sequentially, which works against the goal of parallelism. Two of these points (no shared mutable data and the ability to pass methods and functions—code—to other methods) are the cornerstones of what’s generally described as the paradigm of functional programming, which you’ll see in detail in chapters 18 and 19. In contrast, in the imperative programming paradigm you typically describe a program in terms of a sequence of statements that mutate state. The no-shared-mutabledata requirement means that a method is perfectly described solely by the way it transforms arguments to results; in other words, it behaves as a mathematical function and has no (visible) side effects.
1.2.5
Java needs to evolve You’ve seen evolution in Java before. For example, the introduction of generics and using List instead of just List may initially have been irritating. But you’re now familiar with this style and the benefits it brings (catching more errors at compile time and making code easier to read, because you now know what something is a list of). Other changes have made common things easier to express (for example, using a for-each loop instead of exposing the boilerplate use of an Iterator). The main changes in Java 8 reflect a move away from classical object orientation, which often focuses on mutating existing values, and toward the functional-style programming spectrum in which what you want to do in broad-brush terms (for example, create a value representing all transport routes from A to B for less than a given price) is considered prime and separated from how you can achieve this (for example, scan a data structure modifying certain components). Note that classical object-oriented programming and functional programming, as extremes, might appear to be in conflict. But the idea is to get the best from both programming paradigms, so you have a better chance of having the right tool for the job. We discuss this in detail in sections 1.3 and 1.4. A takeaway line might be this: languages need to evolve to track changing hardware or programmer expectations (if you need convincing, consider that COBOL was once one of the most important languages commercially). To endure, Java has to evolve by adding new features. This evolution will be pointless unless the new features are used, so in using Java 8 you’re protecting your way of life as a Java programmer. On top of that, we have a feeling you’ll love using Java 8’s new features. Ask anyone who’s used Java 8 whether they’re willing to go back! Additionally, the new Java 8
12
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
features might, in the ecosystem analogy, enable Java to conquer programming-task territory currently occupied by other languages, so Java 8 programmers will be even more in demand. We now introduce the new concepts in Java 8, one by one, pointing out the chapters that cover these concepts in more detail.
1.3
Functions in Java The word function in programming languages is commonly used as a synonym for method, particularly a static method; this is in addition to it being used for mathematical function, one without side effects. Fortunately, as you’ll see, when Java 8 refers to functions these usages nearly coincide. Java 8 adds functions as new forms of value. These facilitate the use of streams, covered in section 1.4, which Java 8 provides to exploit parallel programming on multicore processors. We start by showing that functions as values are useful in themselves. Think about the possible values manipulated by Java programs. First, there are primitive values such as 42 (of type int) and 3.14 (of type double). Second, values can be objects (more strictly, references to objects). The only way to get one of these is by using new, perhaps via a factory method or a library function; object references point to instances of a class. Examples include "abc" (of type String), new Integer(1111) (of type Integer), and the result new HashMap(100) of explicitly calling a constructor for HashMap. Even arrays are objects. What’s the problem? To help answer this, we’ll note that the whole point of a programming language is to manipulate values, which, following historical programming-language tradition, are therefore called first-class values (or citizens, in the terminology borrowed from the 1960s civil rights movement in the United States). Other structures in our programming languages, which perhaps help us express the structure of values but which can’t be passed around during program execution, are second-class citizens. Values as listed previously are first-class Java citizens, but various other Java concepts, such as methods and classes, exemplify second-class citizens. Methods are fine when used to define classes, which in turn may be instantiated to produce values, but neither are values themselves. Does this matter? Yes, it turns out that being able to pass methods around at runtime, and hence making them first-class citizens, is useful in programming, so the Java 8 designers added the ability to express this directly in Java. Incidentally, you might wonder whether making other second-class citizens such as classes into firstclass-citizen values might also be a good idea. Various languages such as Smalltalk and JavaScript have explored this route.
1.3.1
Methods and lambdas as first-class citizens Experiments in other languages, such as Scala and Groovy, have determined that allowing concepts like methods to be used as first-class values made programming easier by adding to the toolset available to programmers. And once programmers become familiar with a powerful feature, they become reluctant to use languages
Functions in Java
13
without it! The designers of Java 8 decided to allow methods to be values—to make it easier for you to program. Moreover, the Java 8 feature of methods as values forms the basis of various other Java 8 features (such as Streams). The first new Java 8 feature we introduce is that of method references. Suppose you want to filter all the hidden files in a directory. You need to start writing a method that, given a File, will tell you whether it’s hidden. Fortunately, there’s such a method in the File class called isHidden. It can be viewed as a function that takes a File and returns a boolean. But to use it for filtering, you need to wrap it into a FileFilter object that you then pass to the File.listFiles method, as follows: File[] hiddenFiles = new File(".").listFiles(new FileFilter() { public boolean accept(File file) { return file.isHidden(); Filtering hidden files! } });
Yuck! That’s horrible. Although it’s only three significant lines, it’s three opaque lines—we all remember saying “Do I really have to do it this way?” on first encounter. You already have the method isHidden that you could use. Why do you have to wrap it up in a verbose FileFilter class and then instantiate it? Because that’s what you had to do prior to Java 8. Now, you can rewrite that code as follows: File[] hiddenFiles = new File(".").listFiles(File::isHidden);
Wow! Isn’t that cool? You already have the function isHidden available, so you pass it to the listFiles method using the Java 8 method reference :: syntax (meaning “use this method as a value”); note that we’ve also slipped into using the word function for methods. We’ll explain later how the mechanics work. One advantage is that your code now reads closer to the problem statement. Here’s a taste of what’s coming: methods are no longer second-class values. Analogous to using an object reference when you pass an object around (and object references are created by new), in Java 8 when you write File::isHidden, you create a method reference, which can similarly be passed around. This concept is discussed in detail in chapter 3. Given that methods contain code (the executable body of a method), using method references enables passing code around as in figure 1.3. Figure 1.4 illustrates the concept. You’ll also see a concrete example (selecting apples from an inventory) in the next section. LAMBDAS: ANONYMOUS FUNCTIONS
As well as allowing (named) methods to be first-class values, Java 8 allows a richer idea of functions as values, including lambdas 4 (or anonymous functions). For example, you can now write (int x) -> x + 1 to mean “the function that, when called with argument
4
Originally named after the Greek letter (lambda). Although the symbol isn’t used in Java, its name lives on.
14
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
Old way of filtering hidden files File[] hiddenFiles = new File(".").listFiles(new FileFilter() { public boolean accept(File file) { return file.isHidden(); } });
FileFilter Object
Filtering files with the isHidden method requires wrapping the method inside a FileFilter object before passing it to the File.listFiles method.
isHidden Method File.listFiles
File -> boolean
Java 8 style File[] hiddenFiles = new File(".").listFiles(File::isHidden)
In Java 8 you can pass the isHidden function to the listFiles method using the method reference :: syntax.
File::isHidden syntax File::isHidden Figure 1.4
File.listFiles
Passing the method reference File::isHidden to the method listFiles
x, returns the value x + 1.” You might wonder why this is necessary, because you could define a method add1 inside a class MyMathsUtils and then write MyMathsUtils::add1! Yes, you could, but the new lambda syntax is more concise for cases where you don’t have a convenient method and class available. Chapter 3 explores lambdas in detail. Programs using these concepts are said to be written in functional-programming style; this phrase means “writing programs that pass functions around as first-class values.”
1.3.2
Passing code: an example Let’s look at an example of how this helps you write programs (discussed in more detail in chapter 2). All the code for the examples is available on a GitHub repository and as a download via the book’s website. Both links may be found at www.manning .com/books/modern-java-in-action. Suppose you have a class Apple with a method getColor and a variable inventory holding a list of Apples; then you might wish to select all the green apples (here using a Color enum type that includes values GREEN
Functions in Java
15
and RED) and return them in a list. The word filter is commonly used to express this concept. Before Java 8, you might write a method filterGreenApples: public static List filterGreenApples(List inventory) { List result = new ArrayList(); The result list for (Apple apple: inventory){ accumulates the result; if (GREEN.equals(apple.getColor())) { it starts as empty, and result.add(apple); then green apples are } The highlighted added one by one. } text selects only green apples. return result; }
But next, somebody would like the list of heavy apples (say over 150 g), and so, with a heavy heart, you’d write the following method to achieve this (perhaps even using copy and paste): public static List filterHeavyApples(List inventory) { List result = new ArrayList(); for (Apple apple: inventory){ if (apple.getWeight() > 150) { Here the highlighted result.add(apple); text selects only } heavy apples. } return result; }
We all know the dangers of copy and paste for software engineering (updates and bug fixes to one variant but not the other), and hey, these two methods vary only in one line: the highlighted condition inside the if construct. If the difference between the two method calls in the highlighted code had been what weight range was acceptable, then you could have passed lower and upper acceptable weights as arguments to filter—perhaps (150, 1000) to select heavy apples (over 150 g) or (0, 80) to select light apples (under 80 g). But as we mentioned previously, Java 8 makes it possible to pass the code of the condition as an argument, avoiding code duplication of the filter method. You can now write this: public static boolean isGreenApple(Apple apple) { return GREEN.equals(apple.getColor()); } public static boolean isHeavyApple(Apple apple) { Included for clarity return apple.getWeight() > 150; (normally imported } from java.util.function) public interface Predicate{ boolean test(T t); A method is passed as } a Predicate parameter static List filterApples(List inventory, named p (see the Predicate p) { sidebar “What’s a List result = new ArrayList(); Predicate?”). for (Apple apple: inventory){
16
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening? if (p.test(apple)) { result.add(apple); } } return result;
Does the apple match the condition represented by p?
}
And to use this, you call either filterApples(inventory, Apple::isGreenApple);
or filterApples(inventory, Apple::isHeavyApple);
We explain how this works in detail in the next two chapters. The key idea to take away for now is that you can pass around a method in Java 8.
What’s a Predicate? The previous code passed a method Apple::isGreenApple (which takes an Apple for argument and returns a boolean) to filterApples, which expected a Predicate parameter. The word predicate is often used in mathematics to mean something function-like that takes a value for an argument and returns true or false. As you’ll see later, Java 8 would also allow you to write Function— more familiar to readers who learned about functions but not predicates at school— but using Predicate is more standard (and slightly more efficient because it avoids boxing a boolean into a Boolean).
1.3.3
From passing methods to lambdas Passing methods as values is clearly useful, but it’s annoying having to write a definition for short methods such as isHeavyApple and isGreenApple when they’re used perhaps only once or twice. But Java 8 has solved this, too. It introduces a new notation (anonymous functions, or lambdas) that enables you to write just filterApples(inventory, (Apple a) -> GREEN.equals(a.getColor()) );
or filterApples(inventory, (Apple a) -> a.getWeight() > 150 );
or even filterApples(inventory, (Apple a) -> a.getWeight() < 80 || RED.equals(a.getColor()) );
You don’t even need to write a method definition that’s used only once; the code is crisper and clearer because you don’t need to search to find the code you’re passing.
17
Streams
But if such a lambda exceeds a few lines in length (so that its behavior isn’t instantly clear), you should instead use a method reference to a method with a descriptive name instead of using an anonymous lambda. Code clarity should be your guide. The Java 8 designers could almost have stopped here, and perhaps they would have done so before multicore CPUs. Functional-style programming as presented so far turns out to be powerful, as you’ll see. Java might then have been rounded off by adding filter and a few friends as generic library methods, such as static Collection filter(Collection c, Predicate p);
You wouldn’t even have to write methods like filterApples because, for example, the previous call filterApples(inventory, (Apple a) -> a.getWeight() > 150 );
could be written as a call to the library method filter: filter(inventory, (Apple a) -> a.getWeight() > 150 );
But, for reasons centered on better exploiting parallelism, the designers didn’t do this. Java 8 instead contains a new Collection-like API called Stream, containing a comprehensive set of operations similar to the filter operation that functional programmers may be familiar with (for example, map and reduce), along with methods to convert between Collections and Streams, which we now investigate.
1.4
Streams Nearly every Java application makes and processes collections. But working with collections isn’t always ideal. For example, let’s say you need to filter expensive transactions from a list and then group them by currency. You’d need to write a lot of boilerplate code to implement this data-processing query, as shown here: Creates the Map where the grouped transaction will be accumulated
Map transactionsByCurrencies = new HashMap(); for (Transaction transaction : transactions) { Iterates the List if(transaction.getPrice() > 1000){ of transactions Currency currency = transaction.getCurrency(); List transactionsForCurrency = Extracts the transactionsByCurrencies.get(currency); transaction’s if (transactionsForCurrency == null) { If there isn’t currency transactionsForCurrency = new ArrayList(); an entry in transactionsByCurrencies.put(currency, the grouping transactionsForCurrency); Map for this } currency, Adds the currently transactionsForCurrency.add(transaction); create it. traversed transaction to } the List of transactions }
Filters expensive transactions
with the same currency
18
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
In addition, it’s difficult to understand at a glance what the code does because of the multiple nested control-flow statements. Using the Streams API, you can solve this problem as follows: import static java.util.stream.Collectors.groupingBy; Map transactionsByCurrencies = transactions.stream() .filter((Transaction t) -> t.getPrice() > 1000) .collect(groupingBy(Transaction::getCurrency));
Filters expensive transactions Groups them by currency
Don’t worry about this code for now because it may look like a bit of magic. Chapters 4–7 are dedicated to explaining how to make sense of the Streams API. For now, it’s worth noticing that the Streams API provides a different way to process data in comparison to the Collections API. Using a collection, you’re managing the iteration process yourself. You need to iterate through the elements one by one using a foreach loop processing them in turn. We call this way of iterating over data external iteration. In contrast, using the Streams API, you don’t need to think in terms of loops. The data processing happens internally inside the library. We call this idea internal iteration. We come back to these ideas in chapter 4. As a second pain point of working with collections, think for a second about how you would process the list of transactions if you had a vast number of them; how can you process this huge list? A single CPU wouldn’t be able to process this large amount of data, but you probably have a multicore computer on your desk. Ideally, you’d like to share the work among the different CPU cores available on your machine to reduce the processing time. In theory, if you have eight cores, they should be able to process your data eight times as fast as using one core, because they work in parallel.5
Multicore computers All new desktop and laptop computers are multicore computers. Instead of a single CPU, they have four or eight or more CPUs (usually called Cores5). The problem is that a classic Java program uses just a single one of these cores, and the power of the others is wasted. Similarly, many companies use computing clusters (computers connected together with fast networks) to be able to process vast amounts of data efficiently. Java 8 facilitates new programming styles to better exploit such computers. Google’s search engine is an example of a piece of code that’s too big to run on a single computer. It reads every page on the internet and creates an index, mapping every word appearing on any internet page back to every URL containing that word. Then, when you do a Google search involving several words, software can quickly use this index to give you a set of web pages containing those words. Try to imagine how you might code this algorithm in Java (even for a smaller index than Google’s, you’d need to exploit all the cores in your computer).
5
This naming is unfortunate in some ways. Each of the cores in a multicore chip is a full-fledged CPU. But the phrase multicore CPU has become common, so core is used to refer to the individual CPUs.
19
Streams
1.4.1
Multithreading is difficult The problem is that exploiting parallelism by writing multithreaded code (using the Threads API from previous versions of Java) is difficult. You have to think differently: threads can access and update shared variables at the same time. As a result, data could change unexpectedly if not coordinated6 properly. This model is harder to think about7 than a step-by-step sequential model. For example, figure 1.5 shows a possible problem with two threads trying to add a number to a shared variable sum if they’re not synchronized properly. Execution 1
Thread 1
2
100
3
4
103
5
6
103
Add (3) Read sum
100
Write 100
100
100
Read Add (5) Thread 2
100
105
103
105 Write 105
Thread 1: sum = sum + 3; Thread 2: sum = sum + 5; Figure 1.5 A possible problem with two threads trying to add to a shared sum variable. The result is 105 instead of an expected result of 108.
Java 8 also addresses both problems (boilerplate and obscurity involving processing collections and difficulty exploiting multicore) with the Streams API (java.util .stream). The first design motivator is that there are many data-processing patterns (similar to filterApples in the previous section or operations familiar from database query languages such as SQL) that occur over and over again and that would benefit from forming part of a library: filtering data based on a criterion (for example, heavy apples), extracting data (for example, extracting the weight field from each apple in a list), or grouping data (for example, grouping a list of numbers into separate lists of even and odd numbers), and so on. The second motivator is that such operations can 6
7
Traditionally via the keyword synchronized, but many subtle bugs arise from its misplacement. Java 8’s Stream-based parallelism encourages a functional programming style where synchronized is rarely used; it focuses on partitioning the data rather than coordinating access to it. Aha—a source of pressure for the language to evolve!
20
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
often be parallelized. For instance, as illustrated in figure 1.6, filtering a list on two CPUs could be done by asking one CPU to process the first half of a list and the second CPU to process the other half of the list. This is called the forking step (1). The CPUs then filter their respective half-lists (2). Finally (3), one CPU would join the two results. (This is closely related to how Google searches work so quickly, using many more than two processors.)
1 Fork
2 Filter
A
B
C
D
List of 5 apples
E
CPU 1 B
CPU 2 C
E
3 Joint results
B
C
E
Figure 1.6 Forking filter onto two CPUs and joining the result
For now, we’ll just say that the new Streams API behaves similarly to Java’s existing Collections API: both provide access to sequences of data items. But it’s useful for now to keep in mind that Collections is mostly about storing and accessing data, whereas Streams is mostly about describing computations on data. The key point here is that the Streams API allows and encourages the elements within a stream to be processed in parallel. Although it may seem odd at first, often the fastest way to filter a collection (for example, to use filterApples in the previous section on a list) is to convert it to a stream, process it in parallel, and then convert it back to a list. Again, we’ll just say “parallelism almost for free” and provide a taste of how you can filter heavy apples from a list sequentially or in parallel using streams and a lambda expression. Here’s an example of sequential processing: import static java.util.stream.Collectors.toList; List heavyApples = inventory.stream().filter((Apple a) -> a.getWeight() > 150) .collect(toList());
Default methods and Java modules
21
And here it is using parallel processing: import static java.util.stream.Collectors.toList; List heavyApples = inventory.parallelStream().filter((Apple a) -> a.getWeight() > 150) .collect(toList());
Parallelism in Java and no shared mutable state People have always said parallelism in Java is difficult, and all this stuff about synchronized is error-prone. Where’s the magic bullet in Java 8? There are two magic bullets. First, the library handles partitioning—breaking down a big stream into several smaller streams to be processed in parallel for you. Second, this parallelism almost for free from streams, works only if the methods passed to library methods like filter don’t interact (for example, by having mutable shared objects). But it turns out that this restriction feels natural to a coder (see, by way of example, our Apple::isGreenApple example). Although the primary meaning of functional in functional programming means “using functions as first-class values,” it often has a secondary nuance of “no interaction during execution between components.”
Chapter 7 explores parallel data processing in Java 8 and its performance in more detail. One of the practical issues the Java 8 developers found in evolving Java with all these new goodies was that of evolving existing interfaces. For example, the method Collections.sort belongs to the List interface but was never included. Ideally, you’d like to do list.sort(comparator) instead of Collections.sort(list, comparator). This may seem trivial but, prior to Java 8 you can update an interface only if you update all the classes that implement it—a logistical nightmare! This issue is resolved in Java 8 by default methods.
1.5
Default methods and Java modules As we mentioned earlier, modern systems tend to be built from components—perhaps bought-in from elsewhere. Historically, Java had little support for this, apart from a JAR file containing a set of Java packages with no particular structure. Moreover, evolving interfaces to such packages was hard—changing a Java interface meant changing every class that implements it. Java 8 and 9 have started to address this. First, Java 9 provides a module system that provide you with syntax to define modules containing collections of packages—and keep much better control over visibility and namespaces. Modules enrich a simple JAR-like component with structure, both as user documentation and for machine checking; we explain them in detail in chapter 14. Second, Java 8 added default methods to support evolvable interfaces. We cover these in detail in chapter 13. They’re important because you’ll increasingly encounter them in interfaces, but because relatively few programmers will need to write default methods themselves and because they facilitate program evolution
22
CHAPTER 1 Java 8, 9, 10, and 11: what’s happening?
rather than helping write any particular program, we keep the explanation here short and example-based. In section 1.4, we gave the following example Java 8 code: List heavyApples1 = inventory.stream().filter((Apple a) -> a.getWeight() > 150) .collect(toList()); List heavyApples2 = inventory.parallelStream().filter((Apple a) -> a.getWeight() > 150) .collect(toList());
But there’s a problem here: a List prior to Java 8 doesn’t have stream or parallelStream methods—and neither does the Collection interface that it implements— because these methods hadn’t been conceived of. And without these methods, this code won’t compile. The simplest solution, which you might employ for your own interfaces, would have been for the Java 8 designers to add the stream method to the Collection interface and add the implementation in the ArrayList class. But doing this would have been a nightmare for users. Many alternative collection frameworks implement interfaces from the Collections API. Adding a new method to an interface means all concrete classes must provide an implementation for it. Language designers have no control over existing implementations of Collection, so you have a dilemma: How can you evolve published interfaces without disrupting existing implementations? The Java 8 solution is to break the last link: an interface can now contain method signatures for which an implementing class doesn’t provide an implementation. Then who implements them? The missing method bodies are given as part of the interface (hence default implementations) rather than in the implementing class. This provides a way for an interface designer to enlarge an interface beyond those methods that were originally planned—without breaking existing code. Java 8 allows the existing default keyword to be used in interface specifications to achieve this. For example, in Java 8, you can call the sort method directly on a list. This is made possible with the following default method in the Java 8 List interface, which calls the static method Collections.sort: default void sort(Comparator
4.0.0 com.example expenses.readers 1.0 jar
Compiling and packaging
347
com.example expenses 1.0
The important thing to note is that this code explicitly mentions the parent module to help in the build process. The parent is the artifact with the ID expenses. You need to define the parent in pom.xml, as you see shortly. Next, you need to specify the pom.xml for the expenses.application module. This file is similar to the preceding one, but you have to add a dependency to the expenses.readers project, because ExpensesApplication requires the classes and interfaces that it contains to compile:
4.0.0 com.example expenses.application 1.0 jar
com.example expenses 1.0
com.example expenses.readers 1.0
Now that two modules, expenses.application and expenses.readers, have their own pom.xml, you can set up the global pom.xml to guide the build process. Maven supports projects that have multiple Maven modules with the special XML element , which refers to the children’s artifact IDs. Here’s the complete definition, which refers to the two child modules expenses.application and expenses.readers:
4.0.0 com.example expenses pom 1.0
expenses.application expenses.readers
org.apache.maven.plugins maven-compiler-plugin 3.7.0
9 9
Congratulations! Now you can run the command mvn clean package to generate the JARs for the modules in your project. This command generates ./expenses.application/target/expenses.application-1.0.jar ./expenses.readers/target/expenses.readers-1.0.jar
You can run your module application by including these two JARs on the module path as follows: java --module-path \ ./expenses.application/target/expenses.application-1.0.jar:\ ./expenses.readers/target/expenses.readers-1.0.jar \ --module \ expenses.application/com.example.expenses.application.ExpensesApplication
So far, you’ve learned about modules you created, and you’ve seen how to use requires to reference java.base. Real-world software, however, depends on external modules and libraries. How does that process work, and what if legacy libraries haven’t been updated with an explicit module-info.java? In the next section, we answer these questions by introducing automatic modules.
Automatic modules
349
14.7 Automatic modules You may decide that the implementation of your HttpReader is low-level; instead, you’d like to use a specialized library such as the httpclient from the Apache project. How do you incorporate that library into your project? You’ve learned about the requires clause, so try to add it in the module-info.java for the expenses.readers project. Run mvn clean package again to see what happens. Unfortunately, the result is bad news: [ERROR] module not found: httpclient
You get this error because you also need to update your pom.xml to state the dependency. The maven compiler plugin puts all dependencies on the module path when you’re building a project that has a module-info.java so that the appropriate JARs are downloaded and recognized in your project, as follows:
org.apache.httpcomponents httpclient 4.5.3
Now running mvn clean package builds the project correctly. Notice something interesting, though: the library httpclient isn’t a Java module. It’s an external library that you want to use as a module, but it hasn’t yet been modularized. Java turns the appropriate JAR into a so-called automatic module. Any JAR on the module path without a module-info file becomes an automatic module. Automatic modules implicitly export all their packages. A name for this automatic module is invented automatically, derived from the JAR name. You have a few ways to derive the name, but the easiest way is to use the jar tool with the --describe-module argument: jar --file=./expenses.readers/target/dependency/httpclient-4.5.3.jar \ --describe-module [email protected] automatic
In this case, the name is httpclient. The final step is running the application and adding the httpclient JAR to the module path: java --module-path \ ./expenses.application/target/expenses.application-1.0.jar:\ ./expenses.readers/target/expenses.readers-1.0.jar \ ./expenses.readers/target/dependency/httpclient-4.5.3.jar \ --module \ expenses.application/com.example.expenses.application.ExpensesApplication
350
CHAPTER 14
The Java Module System
There’s a project (https://github.com/moditect/moditect) to provide better support for the Java 9 Module System within Maven, such as to generate module-info files automatically.
NOTE
14.8 Module declaration and clauses The Java Module System is a large beast. As we mentioned earlier, we recommend that you read a dedicated book on the topic if you’d like to go further. Nonetheless, this section gives you a brief overview of other keywords available in the module declaration language to give you an idea of what’s possible. As you learned in the earlier sections, you declare a module by using the module directive. Here, it has the name com.iteratrlearning.application: module com.iteratrlearning.application { }
What can go inside the module declaration? You’ve learned about the requires and exports clauses, but there are other clauses, including requires-transitive, exports-to, open, opens, uses, and provides. We look at these clauses in turn in the following sections.
14.8.1 requires The requires clause lets you specify that your module depends on another module at both compile time and runtime. The module com.iteratrlearning.application, for example, depends on the module com.iteratrlearning.ui: module com.iteratrlearning.application { requires com.iteratrlearning.ui; }
The result is that only public types that were exported by com.iteratrlearning.ui are available for com.iteratrlearning.application to use.
14.8.2 exports The exports clause makes the public types in specific packages available for use by other modules. By default, no package is exported. You gain strong encapsulation by making explicit what packages should be exported. In the following example, the packages com.iteratrlearning.ui.panels and com.iteratrlearning.ui.widgets are exported. (Note that exports takes a package name as an argument and that requires takes a module name, despite the similar naming schemes.) module com.iteratrlearning.ui { requires com.iteratrlearning.core; exports com.iteratrlearning.ui.panels; exports com.iteratrlearning.ui.widgets; }
Module declaration and clauses
351
14.8.3 requires transitive You can specify that a module can use the public types required by another module. You can modify the requires clause, for example, to requires-transitive inside the declaration of the module com.iteratrlearning.ui: module com.iteratrlearning.ui { requires transitive com.iteratrlearning.core; exports com.iteratrlearning.ui.panels; exports com.iteratrlearning.ui.widgets; } module com.iteratrlearning.application { requires com.iteratrlearning.ui; }
The result is that the module com.iteratrlearning.application has access to the public types exported by com.iteratrlearning.core. Transitivity is useful when the module required (here, com.iteratrlearning.ui) returns types from another module required by this module (com.iteratrlearning.core). It would be annoying to re-declare requires com.iteratrlearning.core inside the module com .iteratrlearning.application. This problem is solved by transitive. Now any module that depends on com.iteratrlearning.ui automatically reads the com .iteratrlearning.core module.
14.8.4 exports to You have a further level of visibility control, in that you can restrict the allowed users of a particular export by using the exports to construct. As you saw in section 14.8.2, you can restrict the allowed users of com.iteratrlearning.ui.widgets to com .iteratrlearning.ui.widgetuser by adjusting the module declaration like so: module com.iteratrlearning.ui { requires com.iteratrlearning.core; exports com.iteratrlearning.ui.panels; exports com.iteratrlearning.ui.widgets to com.iteratrlearning.ui.widgetuser; }
14.8.5 open and opens Using the open qualifier on module declaration gives other modules reflective access to all its packages. The open qualifier has no effect on module visibility except for allowing reflective access, as in this example: open module com.iteratrlearning.ui { }
352
CHAPTER 14
The Java Module System
Before Java 9, you could inspect the private state of objects by using reflection. In other words, nothing was truly encapsulated. Object-relational mapping (ORM) tools such as Hibernate often use this capability to access and modify state directly. In Java 9, reflection is no longer allowed by default. The open clause in the preceding code serves to allow that behavior when it’s needed. Instead of opening a whole module to reflection, you can use an opens clause within a module declaration to open its packages individually, as required. You can also use the to qualifier in the opens-to variant to limit the modules allowed to perform reflective access, analogous to how exports-to limits the modules allowed to require an exported package.
14.8.6 uses and provides If you’re familiar with services and ServiceLoader, the Java Module System allows you to specify a module as a service provider using the provides clause and a service consumer using the uses clause. This topic is advanced, however, and beyond the scope of this chapter. If you’re interested in combining modules and service loaders, we recommend that you read a comprehensive resource such as The Java Module System, by Nicolai Parlog (Manning Publications), mentioned earlier in this chapter.
14.9 A bigger example and where to learn more You can get a flavor of the module system from the following example, taken from Oracle’s Java documentation. This example shows a module declaration using most of the features discussed in this chapter. The example isn’t meant to frighten you (the vast majority of module statements are simple exports and requires), but it gives you a look at some richer features: module com.example.foo { requires com.example.foo.http; requires java.logging; requires transitive com.example.foo.network; exports com.example.foo.bar; exports com.example.foo.internal to com.example.foo.probe; opens com.example.foo.quux; opens com.example.foo.internal to com.example.foo.network, com.example.foo.probe; uses com.example.foo.spi.Intf; provides com.example.foo.spi.Intf with com.example.foo.Impl; }
This chapter discussed the need for the new Java Module System and provided a gentle introduction to its main features. We didn’t cover many features, including service loaders, additional module descriptor clauses, and tools for working with modules such as jdeps and jlink. If you’re a Java EE developer, it’s important to keep in mind when
Summary
353
migrating your applications to Java 9 that several packages relevant to EE aren’t loaded by default in the modularized Java 9 Virtual Machine. The JAXB API classes, for example, are now considered to be Java EE APIs and are no longer available in the default class path in Java SE 9. You need to explicitly add modules of interest by using the --addmodules command-line switch to keep compatibility. To add java.xml.bind, for example, you need to specify --add-modules java.xml.bind. As we noted earlier, doing the Java Module System justice would require a whole book, not a single chapter. To explore the details in greater depth, we suggest a book such as The Java Module System, by Nicolai Parlog (Manning Publications), mentioned earlier in this chapter.
Summary Separation of concerns and information hiding are two important principles to
help construct software that you can reason about. Before Java 9, you made code modular by introducing packages, classes, and
interfaces that have a specific concern, but these elements weren’t rich enough for effective encapsulation. The Class Path Hell problem makes it hard to reason about the dependencies of an application. Before Java 9, the JDK was monolithic, resulting in high maintenance costs and restricted evolution. Java 9 introduced a new module system in which a module-info.java file names a module and specifies its dependencies (requires) and public API (exports). The requires clause lets you specify dependencies on other modules. The exports clause makes the public types of specific packages in a module available for use by other modules. The prefered naming convention for a module follows the reverse internet domain-name convention. Any JAR on the module path without a module-info file becomes an automatic module. Automatic modules implicitly export all their packages. Maven supports applications structured with the Java 9 Module System.
Part 5 Enhanced Java concurrency
T
he fifth part of this book explores the more advanced ways of structuring concurrent programs in Java—beyond the ideas of easy-to-use parallel processing for streams introduced in chapters 6 and 7. Again, nothing in the rest of the book depends on this part, so do feel free to skip this part if you don’t (yet) need to explore these ideas. Chapter 15 is new to this second edition and covers the “big-picture” idea of asynchronous APIs, including the ideas of Futures and the publish-subscribe protocol behind reactive programming and encapsulated in the Java 9 Flow API. Chapter 16 explores CompletableFuture, which lets you express complex asynchronous computations in a declarative way—paralleling the design of the Streams API. Chapter 17 is also new to this second edition and explores the Java 9 Flow API in detail, focusing on practical reactive programming code.
Concepts behind CompletableFuture and reactive programming This chapter covers Threads, Futures, and the evolutionary forces
causing Java to support richer concurrency APIs Asynchronous APIs The boxes-and-channels view of concurrent
computing CompletableFuture combinators to connect boxes
dynamically The publish-subscribe protocol that forms the
basis of the Java 9 Flow API for reactive programming Reactive programming and reactive systems
In recent years, two trends are obliging developers to rethink the way software is written. The first trend is related to the hardware on which applications run, and the second trend concerns how applications are structured (particularly how they interact). We discussed the effect of the hardware trend in chapter 7. We noted that since the advent of multicore processors, the most effective way to speed your applications is to write software that can fully exploit multicore processors. You saw that
357
358
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
you can split large tasks and make each subtask run in parallel with the others. You also learned how the fork/join framework (available since Java 7) and parallel streams (new in Java 8) allow you to accomplish this task in a simpler, more effective way than working directly with threads. The second trend reflects the increasing availability and use by applications of Internet services. The adoption of microservices architecture, for example, has grown over the past few years. Instead of being one monolithic application, your application is subdivided into smaller services. The coordination of these smaller services requires increased network communication. Similarly, many more internet services are accessible through public APIs, made available by known providers such as Google (localization information), Facebook (social information), and Twitter (news). Nowadays, it’s relatively rare to develop a website or a network application that works in total isolation. It’s far more likely that your next web application will be a mashup, using content from multiple sources and aggregating it to ease your users’ lives. You may want to build a website that collects and summarizes social-media sentiment on a given topic to your French users. To do so, you could use the Facebook or Twitter API to find trending comments about that topic in many languages and rank the most relevant ones with your internal algorithms. Then you might use Google Translate to translate the comments into French or use Google Maps to geolocate their authors, aggregate all this information, and display it on your website. If any of these external network services are slow to respond, of course, you’ll want to provide partial results to your users, perhaps showing your text results alongside a generic map with a question mark in it instead of showing a blank screen until the map server responds or times out. Figure 15.1 illustrates how this style of mashup application interacts with remote services. Your program Comments tweets
Topic
Ranking
Sentiment
French sentiment
Facebook
Twitter
Remote services
GoogleTranslate
Figure 15.1 A typical mashup application
To implement applications like this, you have to contact multiple web services across the Internet. But you don’t want to block your computations and waste billions of precious clock cycles of your CPU waiting for an answer from these services. You
359
shouldn’t have to wait for data from Facebook before processing the data coming from Twitter, for example. This situation represents the other side of the multitask-programming coin. The fork/join framework and parallel streams, discussed in chapter 7, are valuable tools for parallelism; they divide a task into multiple subtasks and perform those subtasks in parallel on different cores, CPUs, or even machines. Conversely, when you’re dealing with concurrency instead of parallelism, or when your main goal is to perform several loosely related tasks on the same CPUs, keeping their cores as busy as possible to maximize the throughput of your application, you want to avoid blocking a thread and wasting its computational resources while waiting (potentially for quite a while) for a result from a remote service or from interrogating a database. Java offers two main tool sets for such circumstances. First, as you’ll see in chapters 16 and 17, the Future interface, and particularly its Java 8 CompletableFuture implementation, often provide simple and effective solutions (chapter 16). More recently, Java 9 added the idea of reactive programming, built around the idea of the so-called publish-subscribe protocol via the Flow API, which offers more sophisticated programming approaches (chapter 17). Figure 15.2 illustrates the difference between concurrency and parallelism. Concurrency is a programming property (overlapped execution) that can occur even for a single-core machine, whereas parallelism is a property of execution hardware (simultaneous execution). Concurrency Core 1
Core 2
Parallelism Core 1
Core 2
task1
task2
task1 task2 task1
Figure 15.2 Concurrency versus parallelism
The rest of this chapter explains the fundamental ideas underpinning Java’s new CompletableFuture and Flow APIs. We start by explaining the Java evolution of concurrency, including Threads, and higher-level abstractions, including Thread Pools and Futures (section 15.1). We note that chapter 7 dealt with mainly using parallelism in looplike programs. Section 15.2 explores how you can better exploit concurrency for method calls. Section 15.3 gives you a diagrammatic way to see parts of programs as boxes that communicate over
360
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
Guidance for the reader This chapter contains little real-life Java code. We suggest that readers who want to see only code skip to chapters 16 and 17. On the other hand, as we’ve all discovered, code that implements unfamiliar ideas can be hard to understand. Therefore, we use simple functions and include diagrams to explain the big-picture ideas, such as the publish-subscribe protocol behind the Flow API capturing reactive programming.
channels. Section 15.4 and section 15.5 look at the CompletableFuture and reactiveprogramming principles in Java 8 and 9. Finally, section 15.6 explains the difference between a reactive system and reactive programming. We exemplify most of the concepts with a running example showing how to calculate expressions such as f(x)+g(x) and then return, or print, the result by using various Java concurrency features—assuming that f(x) and g(x) are long-running computations.
15.1 Evolving Java support for expressing concurrency Java has evolved considerably in its support for concurrent programming, largely reflecting the changes in hardware, software systems, and programming concepts over the past 20 years. Summarizing this evolution can help you understand the reason for the new additions and their roles in programming and system design. Initially, Java had locks (via synchronized classes and methods), Runnables and Threads. In 2004, Java 5 introduced the java.util.concurrent package, which supported more expressive concurrency, particularly the ExecutorService1 interface (which decoupled task submission from thread execution), as well as Callable and Future, which produced higher-level and result-returning variants of Runnable and Thread and used generics (also introduced in Java 5). ExecutorServices can execute both Runnables and Callables. These features facilitated parallel programming on the multicore CPUs that started to appear the following year. To be honest, nobody enjoyed working with threads directly! Later versions of Java continued to enhance concurrency support, as it became increasingly demanded by programmers who needed to program multicore CPUs effectively. As you saw in chapter 7, Java 7 added java.util.concurrent.RecursiveTask to support fork/join implementation of divide-and-conquer algorithms, and Java 8 added support for Streams and their parallel processing (building on the newly added support for lambdas). Java further enriched its concurrency features by providing support for composing Futures (via the Java 8 CompletableFuture implementation of Future, section 15.4 and chapter 16), and Java 9, provided explicit support for distributed asynchronous programming. These APIs give you a mental model and toolkit for building the sort of mashup application mentioned in the introduction to this chapter. There the 1
The ExecutorService interface extends the Executor interface with the submit method to run a Callable; the Executor interface merely has an execute method for Runnables.
Evolving Java support for expressing concurrency
361
application worked by contacting various web services and combining their information in real time for a user or to expose it as a further web service. This process is called reactive programming, and Java 9 provides support for it via the publish-subscribe protocol (specified by the java.util.concurrent.Flow interface; see section 15.5 and chapter 17). A key concept of CompletableFuture and java.util.concurrent.Flow is to provide programming structures that enable independent tasks to execute concurrently wherever possible and in a way that easily exploits as much as possible of the parallelism provided by multicore or multiple machines.
15.1.1 Threads and higher-level abstractions Many of us learned about threads and processes from a course on operating systems. A single-CPU computer can support multiple users because its operating system allocates a process to each user. The operating system gives these processes separate virtual address spaces so that two users feel like they’re the only users of the computer. The operating system furthers this illusion by waking periodically to share the CPU among the processes. A process can request that the operating system allocate it one or more threads—processes that share an address space as their owning process and therefore can run tasks concurrently and cooperatively. In a multicore setting, perhaps a single-user laptop running only one user process, a program can never fully exploit the computing power of the laptop unless it uses threads. Each core can be used for one or more processes or threads, but if your program doesn’t use threads, it’s effectively using only one of the processor cores. Indeed, if you have a four-core CPU and can arrange for each core to continually do useful work, your program theoretically runs up to four times faster. (Overheads reduce this result somewhere, of course.) Given an array of numbers of size 1,000,000 storing the number of correct questions answered by students in an example, compare the program long sum = 0; for (int i = 0; i < 1_000_000; i++) { sum += stats[i]; }
running on a single thread, which worked fine in single-core days, with a version that creates four threads, with the first thread executing long sum0 = 0; for (int i = 0; i < 250_000; i++) { sum0 += stats[i]; }
and to the fourth thread executing long sum3 = 0; for (int i = 750_000; i < 1_000_000; i++) { sum3 += stats[i]; }
362
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
These four threads are complemented by the main program starting them in turn (.start() in Java), waiting for them to complete (.join()), and then computing sum = sum0 + … + sum3;
The trouble is that doing this for each loop is tedious and error-prone. Also, what can you do for code that isn’t a loop? Chapter 7 showed how Java Streams can achieve this parallelism with little programmer effort by using internal iteration instead of external iteration (explicit loops): sum = Arrays.stream(stats).parallel().sum();
The takeaway idea is that parallel Stream iteration is a higher-level concept than explicit use of threads. In other words, this use of Streams abstracts a given use pattern of threads. This abstraction into Streams is analogous to a design pattern, but with the benefit that much of the complexity is implemented inside the library rather than being boilerplate code. Chapter 7 also explained how to use java.util.concurrent .RecursiveTask support in Java 7 for the fork/join abstraction of threads to parallelize divide-and-conquer algorithms, providing a higher-level way to sum the array efficiently on a multicore machine. Before looking at additional abstractions for threads, we visit the (Java 5) idea of ExecutorServices and the thread pools on which these further abstractions are built.
15.1.2 Executors and thread pools Java 5 provided the Executor framework and the idea of thread pools as a higher-level idea capturing the power of threads, which allow Java programmers to decouple task submission from task execution. PROBLEMS WITH THREADS
Java threads access operating-system threads directly. The problem is that operating-system threads are expensive to create and to destroy (involving interaction with page tables), and moreover, only a limited number exist. Exceeding the number of operatingsystem threads is likely to cause a Java application to crash mysteriously, so be careful not to leave threads running while continuing to create new ones. The number of operating system (and Java) threads will significantly exceed the number of hardware threads2, so all the hardware threads can be usefully occupied executing code even when some operating-system threads are blocked or sleeping. As an example, the 2016 Intel Core i7-6900K server processor has eight cores, each with two symmetric multiprocessing (SMP) hardware threads, leading to 16 hardware threads, and a server may contain several of these processors, consisting of perhaps 64 hardware threads. By contrast, a laptop may have only one or two hardware threads, so portable programs must avoid making assumptions about how many hardware threads 2
We’d use the word core here, but CPUs like the Intel i7-6900K have multiple hardware threads per core, so the CPU can execute useful instructions even for short delays such as a cache miss.
Evolving Java support for expressing concurrency
363
are available. Contrarily, the optimum number of Java threads for a given program depends on the number of hardware cores available! THREAD POOLS AND WHY THEY’RE BETTER The Java ExecutorService provides an interface where you can submit tasks and obtain
their results later. The expected implementation uses a pool of threads, which can be created by one of the factory methods, such as the newFixedThreadPool method: ExecutorService newFixedThreadPool(int nThreads)
This method creates an ExecutorService containing nThreads (often called worker threads) and stores them in a thread pool, from which unused threads are taken to run submitted tasks on a first-come, first-served basis. These threads are returned to the pool when their tasks terminate. One great outcome is that it’s cheap to submit thousands of tasks to a thread pool while keeping the number of tasks to a hardware-appropriate number. Several configurations are possible, including the size of the queue, rejection policy, and priority for different tasks. Note the wording: The programmer provides a task (a Runnable or a Callable), which is executed by a thread. THREAD POOLS AND WHY THEY’RE WORSE
Thread pools are better than explicit thread manipulation in almost all ways, but you need to be aware of two “gotchas:” A thread pool with k threads can execute only k tasks concurrently. Any further
task submissions are held in a queue and not allocated a thread until one of the existing tasks completes. This situation is generally good, in that it allows you to submit many tasks without accidentally creating an excessive number of threads, but you have to be wary of tasks that sleep or wait for I/O or network connections. In the context of blocking I/O, these tasks occupy worker threads but do no useful work while they’re waiting. Try taking four hardware threads and a thread pool of size 5 and submitting 20 tasks to it (figure 15.3). You might expect that the tasks would run in parallel until all 20 have completed. But suppose that three of the first-submitted tasks sleep or wait for I/O. Then only two threads are available for the remaining 15 tasks, so you’re getting only half the throughput you expected (and would have if you created the thread pool with eight threads instead). It’s even possible to cause deadlock in a thread pool if earlier task submissions or already running tasks, need to wait for later task submissions, which is a typical use-pattern for Futures. The takeaway is to try to avoid submitting tasks that can block (sleep or wait for events) to thread pools, but you can’t always do so in existing systems. Java typically waits for all threads to complete before allowing a return from
main to avoid killing a thread executing vital code. Therefore, it’s important in
practice and as part of good hygiene to shut down every thread pool before exiting the program (because worker threads for this pool will have been created but
364
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
Queued tasks
Five worker treads running five active tasks
If these three tasks sleep or otherwise block, then these 15 queued tasks all have to wait for the remaining two workers to run them—reducing the execution parallelism. Figure 15.3
Sleeping tasks reduce the throughput of thread pools.
not terminated, as they’re waiting for another task submission). In practice, it’s common to have a long-running ExecutorService that manages an alwaysrunning Internet service. Java does provide the Thread.setDaemon method to control this behavior, which we discuss in the next section.
15.1.3 Other abstractions of threads: non-nested with method calls To explain why the forms of concurrency used in this chapter differ from those used in chapter 7 (parallel Stream processing and the fork/join framework), we’ll note that the forms used in chapter 7 have one special property: whenever any task (or thread) is started within a method call, the same method call waits for it to complete before returning. In other words, thread creation and the matching join() happen in a way that nests properly within the call-return nesting of method calls. This idea, called strict fork/join, is depicted in figure 15.4. It’s relatively innocuous to have a more relaxed form of fork/join in which a spawned task escapes from an internal method call but is joined in an outer call, so that the interface provided to users still appears to be a normal call,3 as shown in figure 15.5. 3
Compare “Thinking Functionally” (chapter 18) in which we discuss having a side-effect-free interface to a method that internally uses side-effects!
365
Evolving Java support for expressing concurrency
join
call
return fork
Figure 15.4 Strict fork/join. Arrows denote threads, circles represent forks and joins, and rectangles represent method calls and returns.
Escaping spawned thread call
return join
fork
Figure 15.5 Relaxed fork/join
In this chapter, we focus on richer forms of concurrency in which threads created (or tasks spawned) by a user’s method call may outlive the call, as shown in figure 15.6. Ongoing thread
call
return fork
Figure 15.6 An asynchronous method
method
This type of method is often called an asynchronous method, particularly when the ongoing spawned task continues to do work that’s helpful to the method caller. We explore Java 8 and 9 techniques for benefiting from such methods later in this chapter, starting in section 15.2, but first, check the dangers: The ongoing thread runs concurrently with the code following the method call
and therefore requires careful programming to avoid data races. What happens if the Java main() method returns before the ongoing thread has
terminated? There are two answers, both rather unsatisfactory: – Wait for all such outstanding threads before exiting the application. – Kill all outstanding threads and then exit.
366
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
The former solution risks a seeming application crash by never terminating due to a forgotten thread; the latter risks interrupting a sequence of I/O operations writing to disk, thereby leaving an external data in an inconsistent state. To avoid both of these problems, ensure that your program keeps track of all threads it creates and joins them all before exiting (including shutting down any thread pools). Java threads can be labeled as daemon4 or nondaemon, using the setDaemon() method call. Daemon threads are killed on exit (and therefore are useful for services that don’t leave the disk in an inconsistent state), whereas returning from main continues to wait for all threads that aren’t daemons to terminate before exiting the program.
15.1.4 What do you want from threads? What you want is to be able to structure your program so that whenever it can benefit from parallelism, enough tasks are available to occupy all the hardware threads, which means structuring your program to have many smaller tasks (but not too small because of the cost of task switching). You saw how to do this for loops and divideconquer algorithms in chapter 7, using parallel stream processing and fork/join, but in the rest of this chapter (and in chapters 16 and 17), you see how to do it for method calls without writing swaths of boilerplate thread-manipulation code.
15.2 Synchronous and asynchronous APIs Chapter 7 showed you that Java 8 Streams give you a way to exploit parallel hardware. This exploitation happens in two stages. First, you replace external iteration (explicit for loops) with internal iteration (using Stream methods). Then you can use the parallel() method on Streams to allow the elements to be processed in parallel by the Java runtime library instead of rewriting every loop to use complex thread-creation operations. An additional advantage is that the runtime system is much better informed about the number of available threads when the loop is executed than is the programmer, who can only guess. Situations other than loop-based computations can also benefit from parallelism. An important Java development that forms the background of this chapter and chapters 16 and 17 is asynchronous APIs. Let’s take for a running example the problem of summing the results of calls to methods f and g with signatures: int f(int x); int g(int x);
For emphasis, we’ll refer to these signatures as a synchronous API, as they return their results when they physically return, in a sense that will soon become clear. You might 4
Etymologically, daemon and demon arise from the same Greek word, but daemon captures the idea of a helpful spirit, whereas demon captures the idea of an evil spirit. UNIX coined the word daemon for computing purposes, using it for system services such as sshd, a process or thread that listens for incoming ssh connections.
Synchronous and asynchronous APIs
367
invoke this API with a code fragment that calls them both and prints the sum of their results: int y = f(x); int z = g(x); System.out.println(y + z);
Now suppose that methods f and g execute for a long time. (These methods could implement a mathematical optimization task, such as gradient descent, but in chapters 16 and 17, we consider more-practical cases in which they make Internet queries.) In general, the Java compiler can do nothing to optimize this code because f and g may interact in ways that aren’t clear to the compiler. But if you know that f and g don’t interact, or you don’t care, you want to execute f and g in separate CPU cores, which makes the total execution time only the maximum of that of the calls to f and g instead of the sum. All you need to do is run the calls to f and g in separate threads. This idea is a great one, but it complicates5 the simple code from before: class ThreadExample { public static void main(String[] args) throws InterruptedException { int x = 1337; Result result = new Result(); Thread t1 = new Thread(() -> { result.left = f(x); } ); Thread t2 = new Thread(() -> { result.right = g(x); }); t1.start(); t2.start(); t1.join(); t2.join(); System.out.println(result.left + result.right); } private static class Result { private int left; private int right; } }
You can simplify this code somewhat by using the Future API interface instead of Runnable. Assuming that you previously set up a thread pool as an ExecutorService (such as executorService), you can write public class ExecutorServiceExample { public static void main(String[] args) throws ExecutionException, InterruptedException { int x = 1337;
5
Some of the complexity here has to do with transferring results back from the thread. Only final outer-object variables can be used in lambdas or inner classes, but the real problem is all the explicit thread manipulation.
368
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
ExecutorService executorService = Executors.newFixedThreadPool(2); Future y = executorService.submit(() -> f(x)); Future z = executorService.submit(() -> g(x)); System.out.println(y.get() + z.get()); executorService.shutdown(); } }
but this code is still polluted by the boilerplate code involving explicit calls to submit. You need a better way of expressing this idea, analogous to how internal iteration on Streams avoided the need to use thread-creation syntax to parallelize external iteration. The answer involves changing the API to an asynchronous API.6 Instead of allowing a method to return its result at the same time that it physically returns to the caller (synchronously), you allow it to return physically before producing its result, as shown in figure 15.6. Thus, the call to f and the code following this call (here, the call to g) can execute in parallel. You can achieve this parallelism by using two techniques, both of which change the signatures of f and g. The first technique uses Java Futures in a better way. Futures appeared in Java 5 and were enriched into CompletableFuture in Java 8 to make them composable; we explain this concept in section 15.4 and explore the Java API in detail with a worked Java code example in chapter 16. The second technique is a reactive-programming style that uses the Java 9 java.util.concurrent.Flow interfaces, based on the publish-subscribe protocol explained in section 15.5 and exemplified with practical code in chapter 17. How do these alternatives affect the signatures of f and g?
15.2.1 Future-style API In this alternative, change the signature of f and g to Future f(int x); Future g(int x);
and change the calls to Future y = f(x); Future z = g(x); System.out.println(y.get() + z.get());
The idea is that method f returns a Future, which contains a task that continues to evaluate its original body, but the return from f happens as quickly as possible after the call. Method g similarly returns a future, and the third code line uses get() to wait for both Futures to complete and sums their results.
6
Synchronous APIs are also known as blocking APIs, as the physical return is delayed until the result is ready (clearest when considering a call to an I/O operation), whereas asynchronous APIs can naturally implement nonblocking I/O (where the API call merely initiates the I/O operation without waiting for the result, provided that the library at hand, such as Netty, supports nonblocking I/O operations).
Synchronous and asynchronous APIs
369
In this case, you could have left the API and call of g unchanged without reducing parallelism—only introducing Futures for f. You have two reasons not to do so in bigger programs: Other uses of g may require a Future-style version, so you prefer a uniform
API style. To enable parallel hardware to execute your programs as fast as possible, it’s
useful to have more and smaller tasks (within reason).
15.2.2 Reactive-style API In the second alternative, the core idea is to use callback-style programming by changing the signature of f and g to void f(int x, IntConsumer dealWithResult);
This alternative may seem to be surprising at first. How can f work if it doesn’t return a value? The answer is that you instead pass a callback 7 (a lambda) to f as an additional argument, and the body of f spawns a task that calls this lambda with the result when it’s ready instead of returning a value with return. Again, f returns immediately after spawning the task to evaluate the body, which results in the following style of code: public class CallbackStyleExample { public static void main(String[] args) { int x = 1337; Result result = new Result(); f(x, (int y) -> { result.left = y; System.out.println((result.left + result.right)); } ); g(x, (int z) -> { result.right = z; System.out.println((result.left + result.right)); }); } }
Ah, but this isn’t the same! Before this code prints the correct result (the sum of the calls to f and g), it prints the fastest value to complete (and occasionally instead prints
7
Some authors use the term callback to mean any lambda or method reference passed as an argument to a method, such as the argument to Stream.filter or Stream.map. We use it only for those lambda and method references that can be called after the method has returned.
370
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
the sum twice, as there’s no locking here, and both operands to + could be updated before either of the println calls is executed). There are two answers: You could recover the original behavior by invoking println after testing with
if-then-else that both callbacks have been called, perhaps by counting them with appropriate locking. This reactive-style API is intended to react to a sequence of events, not to single results, for which Futures are more appropriate. Note that this reactive style of programming allows methods f and g to invoke their callback dealWithResult multiple times. The original versions of f and g were obliged to use a return that can be performed only once. Similarly, a Future can be completed only once, and its result is available to get(). In a sense, the reactive-style asynchronous API naturally enables a sequence (which we will later liken to a stream) of values, whereas the Future-style API corresponds to a one-shot conceptual framework. In section 15.5, we refine this core-idea example to model a spreadsheet call containing a formula such as =C1+C2. You may argue that both alternatives make the code more complex. To some extent, this argument is correct; you shouldn’t thoughtlessly use either API for every method. But APIs keep code simpler (and use higher-level constructs) than explicit thread manipulation does. Also, careful use of these APIs for method calls that (a) cause long-running computations (perhaps longer than several milliseconds) or (b) wait for a network or for input from a human can significantly improve the efficiency of your application. In case (a), these techniques make your program faster without the explicit ubiquitous use of threads polluting your program. In case (b), there’s the additional benefit that the underlying system can use threads effectively without clogging up. We turn to the latter point in the next section.
15.2.3 Sleeping (and other blocking operations) considered harmful When you’re interacting with a human or an application that needs to restrict the rate at which things happen, one natural way to program is to use the sleep() method. A sleeping thread still occupies system resources, however. This situation doesn’t matter if you have only a few threads, but it matters if you have many threads, most of which are sleeping. (See the discussion in section 15.2.1 and figure 15.3.) The lesson to remember is that tasks sleeping in a thread pool consume resources by blocking other tasks from starting to run. (They can’t stop tasks already allocated to a thread, as the operating system schedules these tasks.) It’s not only sleeping that can clog the available threads in a thread pool, of course. Any blocking operation can do the same thing. Blocking operations fall into two classes: waiting for another task to do something, such as invoking get() on a Future; and waiting for external interactions such as reads from networks, database servers, or human interface devices such as keyboards.
371
Synchronous and asynchronous APIs
What can you do? One rather totalitarian answer is never to block within a task or at least to do so with a small number of exceptions in your code. (See section 15.2.4 for a reality check.) The better alternative is to break your task into two parts—before and after—and ask Java to schedule the after part only when it won’t block. Compare code A, shown as a single task work1(); Thread.sleep(10000); work2();
Sleep for 10 seconds.
with code B: public class ScheduledExecutorServiceExample { public static void main(String[] args) { ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1); work1(); scheduledExecutorService.schedule( ScheduledExecutorServiceExample::work2, 10, TimeUnit.SECONDS); scheduledExecutorService.shutdown(); } public static void work1(){ System.out.println("Hello from Work1!"); }
Schedule a separate task for work2() 10 seconds after work1() finishes.
public static void work2(){ System.out.println("Hello from Work2!"); } }
Think of both tasks being executed within a thread pool. Consider how code A executes. First, it’s queued to execute in the thread pool, and later, it starts executing. Halfway through, however, it blocks in the call to sleep, occupying a worker thread for 10 whole seconds doing nothing. Then it executes work2() before terminating and releasing the worker thread. Code B, by comparison, executes work1() and then terminates—but only after having queued a task to do work2() 10 seconds later. Code B is better, but why? Code A and code B do the same thing. The difference is that code A occupies a precious thread while it sleeps, whereas code B queues another task to execute (with a few bytes of memory and no requirement for a thread) instead of sleeping. This effect is something that you should always bear in mind when creating tasks. Tasks occupy valuable resources when they start executing, so you should aim to keep them running until they complete and release their resources. Instead of blocking, a task should terminate after submitting a follow-up task to complete the work it intended to do.
372
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
Whenever possible, this guideline applies to I/O, too. Instead of doing a classical blocking read, a task should issue a nonblocking “start a read” method call and terminate after asking the runtime library to schedule a follow-up task when the read is complete. This design pattern may seem to lead to lots of hard-to-read code. But the Java CompletableFuture interface (section 15.4 and chapter 16) abstracts this style of code within the runtime library, using combinators instead of explicit uses of blocking get() operations on Futures, as we discussed earlier. As a final remark, we’ll note that code A and code B would be equally effective if threads were unlimited and cheap. But they aren’t, so code B is the way to go whenever you have more than a few tasks that might sleep or otherwise block.
15.2.4 Reality check If you’re designing a new system, designing it with many small, concurrent tasks so that all possible blocking operations are implemented with asynchronous calls is probably the way to go if you want to exploit parallel hardware. But reality needs to intrude into this “everything asynchronous” design principle. (Remember, “the best is the enemy of the good.”) Java has had nonblocking IO primitives (java.nio) since Java 1.4 in 2002, and they’re relatively complicated and not well known. Pragmatically, we suggest that you try to identify situations that would benefit from Java’s enhanced concurrency APIs, and use them without worrying about making every API asynchronous. You may also find it useful to look at newer libraries such as Netty (https://netty.io/), which provides a uniform blocking/nonblocking API for network servers.
15.2.5 How do exceptions work with asynchronous APIs? In both Future-based and reactive-style asynchronous APIs, the conceptual body of the called method executes in a separate thread, and the caller’s execution is likely to have exited the scope of any exception handler placed around the call. It’s clear that unusual behavior that would have triggered an exception needs to perform an alternative action. But what might this action be? In the CompletableFuture implementation of Futures, the API includes provision for exposing exceptions at the time of the get() method and also provides methods such as exceptionally() to recover from exceptions, which we discuss in chapter 16. For reactive-style asynchronous APIs, you have to modify the interface by introducing an additional callback, which is called instead of an exception being raised, as the existing callback is called instead of a return being executed. To do this, include multiple callbacks in the reactive API, as in this example: void f(int x, Consumer dealWithResult, Consumer dealWithException);
Then the body of f might perform dealWithException(e);
The box-and-channel model
373
If there are multiple callbacks, instead of supplying them separately, you can equivalently wrap them as methods in a single object. The Java 9 Flow API, for example, wraps these multiple callbacks within a single object (of class Subscriber containing four methods interpreted as callbacks). Here are three of them: void void void
onComplete() onError(Throwable throwable) onNext(T item)
Separate callbacks indicate when a value is available (onNext), when an exception arose while trying to make a value available (onError), and when an onComplete callback enables the program to indicate that no further values (or exceptions) will be produced. For the preceding example, the API for f would now be void f(int x, Subscriber s);
and the body of f would now indicate an exception, represented as Throwable t, by performing s.onError(t);
Compare this API containing multiple callbacks with reading numbers from a file or keyboard device. If you think of such a device as being a producer rather than a passive data structure, it produces a sequence of “Here’s a number” or “Here’s a malformed item instead of a number” items, and finally a “There are no more characters left (end-of-file)” notification. It’s common to refer to these calls as messages, or events. You might say, for example, that the file reader produced the number events 3, 7, and 42, followed by a malformednumber event, followed by the number event 2 and then by the end-of-file event. When seeing these events as part of an API, it’s important to note that the API signifies nothing about the relative ordering of these events (often called the channel protocol). In practice, the accompanying documentation specifies the protocol by using phases such as “After an onComplete event, no more events will be produced.”
15.3 The box-and-channel model Often, the best way to design and think about concurrent systems is pictorially. We call this technique the box-and-channel model. Consider a simple situation involving integers, generalizing the earlier example of calculating f(x) + g(x). Now you want to call method or function p with argument x, pass its result to functions q1 and q2, call method or function r with the results of these two calls, and then print the result. (To avoid clutter in this explanation, we’re not going to distinguish between a method m of class C and its associated function C::m.) Pictorially, this task is simple, as shown in figure 15.7.
374
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
q1
x
r
p
q2
Figure 15.7 A simple box-andchannel diagram
Look at two ways of coding figure 15.7 in Java to see the problems they cause. The first way is int t = p(x); System.out.println( r(q1(t), q2(t)) );
This code appears to be clear, but Java runs the calls to q1 and q2 in turn, which is what you want to avoid when trying to exploit hardware parallelism. Another way is to use Futures to evaluate f and g in parallel: int t = p(x); Future a1 = executorService.submit(() -> q1(t)); Future a2 = executorService.submit(() -> q2(t)); System.out.println( r(a1.get(),a2.get()));
Note: We didn’t wrap p and r in Futures in this example because of the shape of the box-and-channel diagram. p has to be done before everything else and r after everything else. This would no longer be the case if we changed the example to mimic System.out.println( r(q1(t), q2(t)) + s(x) );
in which we’d need to wrap all five functions (p, q1, q2, r, and s) in Futures to maximize concurrency. This solution works well if the total amount of concurrency in the system is small. But what if the system becomes large, with many separate box-and-channel diagrams, and with some of the boxes themselves internally using their own boxes and channels? In this situation, many tasks might be waiting (with a call to get()) for a Future to complete, and as discussed in section 15.1.2, the result may be underexploitation of hardware parallelism or even deadlock. Moreover, it tends to be hard to understand such large-scale system structure well enough to work out how many tasks are liable to be waiting for a get(). The solution that Java 8 adopts (CompletableFuture; see section 15.4 for details) is to use combinators. You’ve already seen that you can use methods such as compose() and andThen() on two Functions to get another Function (see chapter 3). Assuming that add1 adds 1 to an Integer and that dble doubles an Integer, for example, you can write Function myfun = add1.andThen(dble);
CompletableFuture and combinators for concurrency
375
to create a Function that doubles its argument and adds 2 to the result. But box-andchannel diagrams can be also coded directly and nicely with combinators. Figure 15.7 could be captured succinctly with Java Functions p, q1, q2 and BiFunction r as p.thenBoth(q1,q2).thenCombine(r)
Unfortunately, neither thenBoth nor thenCombine is part of the Java Function and BiFunction classes in exactly this form. In the next section, you see how similar ideas of combinators work for CompletableFuture and prevent tasks from ever to have to wait using get(). Before leaving this section, we want to emphasize the fact that the box-and-channel model can be used to structure thoughts and code. In an important sense, it raises the level of abstraction for constructing a larger system. You draw boxes (or use combinators in programs) to express the computation you want, which is later executed, perhaps more efficiently than you might have obtained by hand-coding the computation. This use of combinators works not only for mathematical functions, but also for Futures and reactive streams of data. In section 15.5, we generalize these box-and-channel diagrams into marble diagrams in which multiple marbles (representing messages) are shown on every channel. The box-and-channel model also helps you change perspective from directly programming concurrency to allowing combinators to do the work internally. Similarly, Java 8 Streams change perspective from the coder having to iterate over a data structure to combinators on Streams doing the work internally.
15.4 CompletableFuture and combinators for concurrency One problem with the Future interface is that it’s an interface, encouraging you to think of and structure your concurrent coding tasks as Futures. Historically, however, Futures have provided few actions beyond FutureTask implementations: creating a future with a given computation, running it, waiting for it to terminate, and so on. Later versions of Java provided more structured support (such as RecursiveTask, discussed in chapter 7). What Java 8 brings to the party is the ability to compose Futures, using the CompletableFuture implementation of the Future interface. So why call it CompletableFuture rather than, say, ComposableFuture? Well, an ordinary Future is typically created with a Callable, which is run, and the result is obtained with a get(). But a CompletableFuture allows you to create a Future without giving it any code to run, and a complete() method allows some other thread to complete it later with a value (hence the name) so that get() can access that value. To sum f(x) and g(x) concurrently, you can write public class CFComplete { public static void main(String[] args) throws ExecutionException, InterruptedException {
376
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
ExecutorService executorService = Executors.newFixedThreadPool(10); int x = 1337; CompletableFuture a = new CompletableFuture(); executorService.submit(() -> a.complete(f(x))); int b = g(x); System.out.println(a.get() + b); executorService.shutdown(); } }
or you can write public class CFComplete { public static void main(String[] args) throws ExecutionException, InterruptedException { ExecutorService executorService = Executors.newFixedThreadPool(10); int x = 1337; CompletableFuture a = new CompletableFuture(); executorService.submit(() -> b.complete(g(x))); int a = f(x); System.out.println(a + b.get()); executorService.shutdown(); } }
Note that both these code versions can waste processing resources (recall section 15.2.3) by having a thread blocked waiting for a get—the former if f(x) takes longer, and the latter if g(x) takes longer. Using Java 8’s CompletableFuture enables you to avoid this situation; but first a quiz.
Quiz 15.1: Before reading further, think how you might write tasks to exploit threads perfectly in this case: two active threads while both f(x) and g(x) are executing, and one thread starting from when the first one completes up to the return statement. The answer is that you’d use one task to execute f(x), a second task to execute g(x), and a third task (a new one or one of the existing ones) to calculate the sum, and somehow, the third task can’t start before the first two finish. How do you solve this problem in Java? The solution is to use the idea of composition on Futures.
First, refresh your memory about composing operations, which you’ve seen twice before in this book. Composing operations is a powerful program-structuring idea used in many other languages, but it took off in Java only with the addition of lambdas
CompletableFuture and combinators for concurrency
377
in Java 8. One instance of this idea of composition is composing operations on streams, as in this example: myStream.map(…).filter(…).sum()
Another instance of this idea is using methods such as compose() and andThen() on two Functions to get another Function (see section 15.5). This gives you a new and better way to add the results of your two computations by using the thenCombine method from CompletableFuture. Don’t worry too much about the details at the moment; we discuss this topic more comprehensively in chapter 16. The method thenCombine has the following signature (slightly simplified to prevent the clutter associated with generics and wildcards): CompletableFuture thenCombine(CompletableFuture other, BiFunction fn)
The method takes two CompletableFuture values (with result types T and U) and creates a new one (with result type V). When the first two complete, it takes both their results, applies fn to both results, and completes the resulting future without blocking. The preceding code could now be rewritten in the following form: public class CFCombine { public static void main(String[] args) throws ExecutionException, InterruptedException { ExecutorService executorService = Executors.newFixedThreadPool(10); int x = 1337; CompletableFuture a CompletableFuture b CompletableFuture c executorService.submit(() -> executorService.submit(() ->
= new CompletableFuture(); = new CompletableFuture(); = a.thenCombine(b, (y, z)-> y + z); a.complete(f(x))); b.complete(g(x)));
System.out.println(c.get()); executorService.shutdown(); } }
The thenCombine line is critical: without knowing anything about computations in the Futures a and b, it creates a computation that’s scheduled to run in the thread pool only when both of the first two computations have completed. The third computation, c, adds their results and (most important) isn’t considered to be eligible to execute on a thread until the other two computations have completed (rather than starting to execute early and then blocking). Therefore, no actual wait operation is performed,
378
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
which was troublesome in the earlier two versions of this code. In those versions, if the computation in the Future happens to finish second, two threads in the thread pool are still active, even though you need only one! Figure 15.8 shows this situation diagrammatically. In both earlier versions, calculating y+z happens on the same fixed thread that calculates f(x) or g(x)—with a potential wait in between. By contrast, using thenCombine schedules the summing computation only after both f(x) and g(x) have completed. y
f(x)
g(x)
z
y+z
time
Figure 15.8 Timing diagram showing three computations: f(x), g(x) and adding their results
To be clear, for many pieces of code, you don’t need to worry about a few threads being blocked waiting for a get(), so pre-Java 8 Futures remain sensible programming options. In some situations, however, you want to have a large number of Futures (such as for dealing with multiple queries to services). In these cases, using CompletableFuture and its combinators to avoid blocking calls to get() and possible loss of parallelism or deadlock is often the best solution.
15.5 Publish-subscribe and reactive programming The mental model for a Future and CompletableFuture is that of a computation that executes independently and concurrently. The result of the Future is available with get() after the computation completes. Thus, Futures are one-shot, executing code that runs to completion only once. By contrast, the mental model for reactive programming is a Future-like object that, over time, yields multiple results. Consider two examples, starting with a thermometer object. You expect this object to yield a result repeatedly, giving you a temperature value every few seconds. Another example is an object representing the listener component of a web server; this object waits until an HTTP request appears over the network and similarly yields with the data from the request. Then other code can process the result: a temperature or data from an HTTP request. Then the thermometer and listener objects go back to sensing temperatures or listening before potentially yielding further results. Note two points here. The core point is that these examples are like Futures but differ in that they can complete (or yield) multiple times instead of being one-shot. Another point is that in the second example, earlier results may be as important as ones seen later, whereas for a thermometer, most users are interested only in the
Publish-subscribe and reactive programming
379
most-recent temperature. But why is this type of a programming called reactive? The answer is that another part of the program may want to react to a low temperature report (such as by turning on a heater). You may think that the preceding idea is only a Stream. If your program fits naturally into the Stream model, a Stream may be the best implementation. In general, though, the reactive-programming paradigm is more expressive. A given Java Stream can be consumed by only one terminal operation. As we mention in section 15.3, the Stream paradigm makes it hard to express Stream-like operations that can split a sequence of values between two processing pipelines (think fork) or process and combine items from two separate streams (think join). Streams have linear processing pipelines. Java 9 models reactive programming with interfaces available inside java.util .concurrent.Flow and encodes what’s known as the publish-subscribe model (or protocol, often shortened to pub-sub). You learn about the Java 9 Flow API in more detail in chapter 17, but we provide a short overview here. There are three main concepts: A publisher to which a subscriber can subscribe. The connection is known as a subscription. Messages (also known an events) are transmitted via the connection.
Figure 15.9 shows the idea pictorially, with subscriptions as channels and publishers and subscribers as ports on boxes. Multiple components can subscribe to a single publisher, a component can publish multiple separate streams, and a component can subscribe to multiple publishers. In this next section, we show you how this idea works step by step, using the nomenclature of the Java 9 Flow interface. Subscriptions
Component (an object)
Component
Component Subscribers
Publishers
Figure 15.9 The publish-subscribe model
380
CHAPTER 15
Concepts behind CompletableFuture and reactive programming
15.5.1 Example use for summing two flows A simple but characteristic example of publish-subscribe combines events from two sources of information and publishes them for others to see. This process may sound obscure at first, but it’s what a cell containing a formula in a spreadsheet does conceptually. Model a spreadsheet cell C3, which contains the formula "=C1+C2". Whenever cell C1 or C2 is updated (by a human or because the cell contains a further formula), C3 is updated to reflect the change. The following code assumes that the only operation available is adding the values of cells. First, model the concept of a cell that holds a value: private class SimpleCell { private int value = 0; private String name; public SimpleCell(String name) { this.name = name; } }
At the moment, the code is simple, and you can initialize a few cells, as follows: SimpleCell c2 = new SimpleCell("C2"); SimpleCell c1 = new SimpleCell("C1");
How do you specify that when the value of c1 or c2 changes, c3 sums the two values? You need a way for c1 and c2 to subscribe c3 to their events. To do so, introduce the interface Publisher, which at its core looks like this: interface Publisher { void subscribe(Subscriber