English Pages [179] Year 1993
CHESS AND MACHINE INTUITION
GEORGE ATKINSON
Copyright © 1993 by Ablex Corporation
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without permission of the publisher.
Printed in the United States of America
Library of Congress Cataloging-in-Publication Data
Atkinson, George W.
Chess and machine intuition / George W. Atkinson.
p. cm. -- (Ablex series in artificial intelligence)
Includes bibliographical references and index.
ISBN 0-89391-901-2
1. Chess--Data processing. 2. Artificial intelligence. I. Title. II. Series.
GV1449.3.A87 1993
794.1'72--dc20
93-28529 CIP
Ablex Publishing Corporation 355 Chestnut Street Norwood, NJ 07648
Contents
Preface

1 Did Someone Say Ten Years?
An overview of the most conspicuous branch of artificial intelligence, machine chess, in which a breakthrough is always expected within a decade; a synopsis of the book.

2 The Rise of Mechanical Automata
Von Kempelen’s 1769 “Turk” takes on all comers at chess; Charles Babbage designs an Analytical Engine; Torres y Quevedo builds an electro-mechanical endgame machine.

3 B. P.
A group of chess players and mathematicians at a secret British wartime facility build machines to crack ciphers and, for recreation, design chess machines.

4 Minimax
Assigning values to chess positions allows comparison of alternative branches of a move tree; Claude Shannon describes Alan Turing tests, and first programs run.

5 Brute Force
Amateur’s chess knowledge proves codifiable, the chessmaster’s does not; knowledge-based machines yield to brute-force computation; computer tournaments become a spectator sport.

6 Human Intuition
Humans are poor calculators, but exploit brain's pattern recognition to play terrific chess; psychologists show expertise is result of trained intuition.

7 Human Versus Machine
John Henry beats the steam drill, David Levy conquers CHESS 4.7, and Garry Kasparov outplays Deep Thought; calculating amateurs lose while intuitive masters win.

8 Custom-Built Hardware
Chess machines on a chip and custom-built circuitry amplify brute-force capability.

9 Computable Subgames
Human theoretical knowledge of chess increases through machine-assisted computation; exact endgame computations provide unexpected results.

10 Machine Learning
It ain’t smart if it always makes the same mistakes; some machines learn from experience; others induce rules from examples, and can acquire intuitive knowledge.

11 Machine Intuition
Computing devices coupled to an environment mimic neural systems for intuitive information processing and offer hope for knowledge-based chess machinery.

Appendix A: Chess Notation
Appendix B: Torres y Quevedo’s Mating Algorithm
Appendix C: Recursive Programming and the Minimax Algorithm
References
Author Index
Subject Index
Preface
In the summer of 1956 a newspaper article about a chess game between a man and an “electronic brain” caught my attention. A group of scientists at Los Alamos had, according to the article, taught a machine called MANIAC to play a form of chess. Now this was a game I enjoyed immensely but, owing to lack of opponents, rarely found opportunity to play. My young imagination conjured up a vision of a thinking machine that could serve as an intellectual companion, and I was eager to learn more. But immediate follow-up was impossible: Los Alamos Scientific Laboratories, wrapped in national security, was quite inaccessible to a thirteen-year-old, however enthusiastic.[1]

That fall I went off to prep school near Philadelphia. On one of many Saturday excursions to the Franklin Institute, I arrived too early and, while wandering about waiting for the doors to open, happened on the Free Library across the street. There I found a book so enthralling that it colored my entire subsequent career. Just as the eleventh edition of the Encyclopedia Britannica (1910) can be considered a final attempt to encapsulate all human knowledge, Lord Bowden’s 1953 opus Faster Than Thought covered the entire field of automatic digital computation. The most exciting part of this treasure was Chapter 25, “Digital Computers Applied To Games,” contributed by Alan Turing. This chapter went well beyond mere speculation on the possibility of machine chess, for it included an algorithm and a sample game. In those days libraries were not equipped with photocopy machines, so it was necessary to return another Saturday with my notebook to copy Chapter 25.[2] Further weekends were happily spent playing chess against “paper machines” to work out improvements to the algorithm. Even today I recall my fascination with this approach to mechanizing chess, and my desire to meet the author.

I am a Class B chessplayer (strong amateur), and thus fit Cleveland’s characterization: “Blunders are recognized at once, when pointed out, but in spite of resolution to avoid them, the same ones are committed over and over.” Chess machines I have known and used have provided me with much entertainment and will give me a good game, which is all one can hope for in a chess partner. I expect that in years to come people will create a still broader range of genial intellectual companions, and that machine intuition will figure largely in their makeup. I have attempted to tell some of the fascinating stories of these innovations, of the emergence and development of some current concepts, and of the remarkable people who contributed to their evolution. My intent in this work is to share my enjoyment of these creations with a general audience.

With the ability to conceive of, and construct, ever more complicated machinery, expectations have increased that we may yet comprehend, and perhaps even duplicate, some of our own mental abilities. In examining how we think, and act, and enjoy the complexities of problem solving in such activities as chess play, we obtain an increased appreciation of our own minds, and of potentialities for further understanding the mysteries of human thought, the most fascinating topic of all.

George Atkinson
University of the South Pacific
Suva, Fiji Islands

[1] But only a few years later I did write programs for MANIAC during its sojourn at the University of New Mexico on its way to the computer graveyard, and a decade after that I found my first postdoc employment at LASL.

[2] At that time, no library user would ever dream of mutilating a book to steal pages; three decades later I revisited the Philadelphia Free Library to seek this book and was saddened by the report “missing, presumed stolen.”
chapter 1
Did Someone Say Ten Years?
Within ten years a digital computer will be the world’s chess champion, unless the rules bar it from competition.
Herbert Simon, 1957

Within ten years a computer program will beat IM David Levy in a match under tournament conditions.
Four Professors of Artificial Intelligence, 1968

In ten years headlines will be made when a mere human is able to win a chess game from a machine.
David Levy, 1978
Two minds compete in an engagement so complex that, save for certain endgame situations, exact knowledge is unattainable. In this limited arena of conflict, circumscribed by chessboard, pieces, and rules, it is never certain which direction a game might take. Both players study the position intently in their effort to understand its properties, to decide on appropriate strategies, and, perhaps, to discover ways to outmaneuver each other. The prospect of outsmarting an opponent exerts such a powerful, universal appeal that chess has become much more than a perennially popular pastime. It can inspire passion. Enthralled by the game, amateurs often go on to become professional players, opening theoreticians, endgame specialists, or chess problemists. More is written about chess than about all other games combined. Its variety is so immense that centuries of intensive study and ever deepening analytical knowledge have not exhausted the game’s possibilities or diminished its attraction.

Because of its reputation as an intellectual activity, chess has served as a vehicle for studying intelligence and knowledge acquisition. Researchers attempting to construct an artificial intelligence once felt that a demonstration of machine chess would convince doubters that machine intelligence is possible.

But is machine intelligence, even in the restricted arena of chess, possible? Deduction and logic alone do not suffice for good, much less brilliant, chessplay. It is informed guesswork (intuition) that determines the strategic direction and, ultimately, the outcome of a chess game. Whether rank amateur, club player, master, or even grandmaster, a player faced with a chess position chooses moves by “feel.” When the player is not hurried, a certain amount of calculation follows the selection of a plausible move before it is actually played. Yet even this process of examining a chain of potential moves and countermoves relies on feel, on the player’s intuitive assessment of the balance of forces that will result at the end of the calculated sequence.

Perhaps because players spend so much time thinking “if I move here, then my opponent will answer so, whereupon I can counter with . . . oops, no, that won’t do, but still there’s . . . ,” one imagines that chess consists mostly of calculation. Indeed, many players believe that masters are good at chess because they calculate rapidly. The opposite is true: Strong players consider fewer potential positions than do amateurs. Somehow, the master sees more while calculating less. The club player can only watch in awe as a master overwhelms dozens of opponents during a simultaneous exhibition, moving quickly from board to board to select and play the right moves to maintain the positional balance.

The master’s superior play is due to “sense of position,” an intuitive form of knowledge gained from experiencing a great variety of chess situations. Intuitive knowledge is perhaps the most important component of expertise in any human discipline, yet intuition has been one of our least understood phenomena. Intuition is an automatic, unconscious part of every act of perception.
It is often associated with emotion: an intuitive realization can “feel right” and evoke a sense of satisfaction or, equally, can elicit that sinking feeling of impending frustration. Intuition is nonverbal, yet trainable.

Intuition is not part of present-day machine chess. Chess-playing programs are algorithmic: they follow a fixed set of rules. Most simply carry out a “brute-force” search, exploiting the computational speed of the computer to generate and evaluate multitudes of potential positions by applying a set of rules that assign a score to each. Other automata use chess knowledge to decide which alternatives are relevant; moves that can be safely ignored to reduce the amount of search are again specified by another set of rules. Rule-based systems can be quite effective when a process can be formally described, but among humans only novices and bureaucrats follow formal rules. As proficiency and expertise increase, people’s actions are guided not so much by conscious application of rules as by intuition based on individual experience.

The past few decades have brought a substantial shift in our comprehension of intuitive processes. Intuition is no longer regarded as mysterious or “psychic,” but is recognized as a fundamental part of day-to-day existence. Some of the mechanisms by which information is transformed in organic nervous systems are sufficiently well understood to be modeled. Neuromimetic computers to emulate these processes are now appearing, once again raising hopes of artificial intelligence researchers that machine intelligence may be on the horizon.

Development of computing machinery helped accelerate a mathematical revolution. Until well into this century, mathematics was dominated by the analog notions of continuity reflected in the calculus, and engineers and physicists described the world in terms of infinitesimals and differential equations. Notions of granularity began to find application in more and more disciplines. The physicist’s view of energy changed from the continuous to the quantum; evolutionary thought shifted from gradualism to a saltatory view; and with the rise of the digital computer and its automatic control of sequential processes, discrete mathematics gained enormous importance.

The mechanical automata that came into vogue during the last two centuries hinted at the coming transition from analog to digital thinking. From the earliest stages of development, their designers sought ways to carry out ever more complex sequences of operations automatically. As complexity increased, attention shifted to building devices that could select among alternative action sequences; continuous motions became subordinate to step-by-step actions.
Once mechanisms for discrete processes had been developed, it was easy to imagine the possibility of an artificial intelligence that could play chess. Near the end of the 18th century, Baron von Kempelen constructed the Turk, a showy mechanism that moved pieces about on a chessboard under (unseen) human control. Charles Babbage gave impetus to the automation of counting processes with his design of mechanical devices to carry out arithmetic operations under program control. He designed an “Analytical Engine,” the first automatic computer, and described a theory of digital computation broad enough to include intellectual games, “such as chess.” By the end of the 19th century, Leonardo Torres y Quevedo had developed electromechanical control devices, which he used to construct a machine that could win the chess endgame of King and Rook against King.

With the appearance of machines able to carry out discrete operations, it became inevitable that enciphering devices would be developed to substitute text characters automatically. By the Second World War, the Enigma cipher machine had become standard equipment for all German military branches. Properly used, such a device provides secure communication, but any lapse in cryptographic discipline such as use of a stereotyped message form can give a cryptanalyst an “entry” to decipher a message. A branch of the British Secret Service was established at Bletchley Park to extract intelligence from enciphered signals. Recruits for this cryptanalytic effort were selected for their trainable intuition, their presumed ability to discover patterns in ciphertext that might provide a clue to the message lurking beneath. Those deemed most capable of noticing regularities in a cipher were mathematicians and (surprise!) master chess players.

The chief eccentric in this rather oddball collection of cryptographers was Alan Turing, a mathematician who in 1936 had published On Computable Numbers, an examination of the limits of digital computation and mathematical logic. Inspired by a Polish device that could “unbutton” Enigma messages by enumerating key combinations mechanically, Turing conceived of a machine to attack the cipher directly. He imagined an automaton that could run through all possible Enigma settings, testing for logical consistency when matching a guessed phrase with an enciphered message. In an epic collaborative effort, the Bletchley cryptographers designed, built, and tested an electromechanical device based on this idea. Machines began breaking machine-created ciphers. Intelligence gleaned from this work was of inestimable value in the conduct of the war, since it consisted of the enemy’s own thoughts, intentions, and plans. Extreme secrecy surrounded Bletchley activity, for it can never, never be admitted that a cipher is being read lest it be replaced.
The master chess players and mathematicians working in Bletchley’s tense atmosphere found occasional relief from the ever urgent activity in chess games. Chess was a topic that could be discussed openly, and the chess players working on the automation of digital processes naturally considered the possibilities of an algorithm to select a chess move. During excited discussions they developed the ideas of forward search, evaluating at quiescent positions, and using the minimax algorithm to assess the relative merit of available moves. In the United States, Bell Laboratories brought together similar teams of dedicated and talented people to work on urgent wartime projects. Despite the secrecy, the establishments were not entirely disjoint: Alan Turing visited Bell Labs as part of the effort to share technology. During this visit he found opportunity to discuss artificial intelligence and machine chess with his intellectual counterpart, Claude Shannon.
Occasionally one finds an entire intellectual discipline described in a single, definitive work, which provides a valuable viewpoint, a sort of surveyor’s monument that marks a particular set of concepts at a specific time. Shannon’s enthusiastic 1950 paper Programming a Computer for Playing Chess was more than a mere summary of machine chess. It emphasized the fascinating unsolved theoretical problems, alerted a wide audience to the possibilities of machine chess, and inspired a generation of chess programmers.

Shannon’s paper contrasted the search and knowledge approaches to machine chess. A search-based device imitates the behavior of a beginning chessplayer who, knowing only the moves, gropes about systematically until time is called, and then selects the best continuation found thus far. A full-width search program examines all possible sequences of moves and countermoves of a specified length and assigns a value to each terminal position. At every branch point in the move tree, the minimax algorithm selects the highest-valued alternative from among the (low-valued) leavings that a rational opponent will allow. The only chess knowledge is contained in the evaluation function that determines the value associated with a position.

A knowledge-based device imitates the behavior of an amateur player who has learned a few rules-of-thumb. The program detects positional features such as mobility and development, and selects goals appropriate for that type of position. These might include maintaining material balance, controlling the center of the board, protecting the King, or developing pieces. Specific reasons impel each proposed move: A piece is defended because it is attacked; a pawn is advanced in order to control the center. Since most potential moves are not relevant to any goal, they need not even be considered and, unlike full-width forward search, only a few continuations are explored.
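The full-width search with minimax described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the move tree `TREE`, the terminal scores in `LEAF_VALUES`, and the helper names `successors`, `evaluate`, `minimax`, and `best_move` are all invented here and stand in for a real move generator and evaluation function.

```python
# Toy move tree (hypothetical): each position maps to the positions reachable in one move.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
# Hypothetical evaluation scores for the terminal positions, from the mover's point of view.
LEAF_VALUES = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}


def successors(position):
    """Return every position reachable in one move (empty for terminal positions)."""
    return TREE.get(position, [])


def evaluate(position):
    """Stand-in evaluation function: the only 'chess knowledge' in the search."""
    return LEAF_VALUES.get(position, 0)


def minimax(position, depth, maximizing):
    """Full-width search to a fixed depth, backing up values by minimax."""
    children = successors(position)
    if depth == 0 or not children:
        return evaluate(position)
    values = [minimax(child, depth - 1, not maximizing) for child in children]
    # The mover takes the best value; the opponent is assumed to leave the worst one.
    return max(values) if maximizing else min(values)


def best_move(position, depth):
    """Pick the move whose subtree has the highest minimax value for the mover."""
    return max(successors(position), key=lambda child: minimax(child, depth - 1, False))


print(minimax("root", 2, True), best_move("root", 2))
```

Move "b" contains the tempting score 9, but a rational opponent would answer with the reply worth 2, so the search prefers "a", whose worst case is 3.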
The knowledge-based approach is intellectually much more appealing than a search-based scheme. If a program is to exhibit intelligent behavior, choosing good moves for the right reasons is preferable to blind stumbling. But an apparently great quantity of specialized chess knowledge for recognizing exploitable imbalances in a position had to be supplied somehow, and programmers soon realized just how difficult it is to codify chess knowledge. The obvious lack of success in finding any practical algorithm to categorize chess positions dampened the initial optimism of the knowledge school, and brute-force forward search came to dominate machine chess.

Machine play improved steadily, chiefly because advances in computer hardware took place at an exponential rate; every few years both memory capacity and processing speed doubled. With the appearance of ever more capable programs came, inevitably, computer chess tournaments, a form of competition which enjoyed instant popularity. Unlike human tournaments, at which onlookers are urged to silence, discussion is animated, for computers never complain about kibitzers or noise level in the playing rooms. Programming teams raise partisan cheers when their programs choose strong moves and groan loudly at blunders and missed opportunities. Second-guessing the reasons behind this action or that is every bit as lively as with other spectator sports.

But chess is much more than just a sport. Thomas Henry Huxley saw in the game a microcosm for experiment in which the chessboard is the world and the rules are laws of Nature. Psychologists recognized an appealing research tool for studying memory, learning, and perception, for chess contains a seemingly limitless variety of pattern within the limited number of variables desirable for experimental work. Moreover, chessplay is a knowledge-rich activity that requires thousands of hours of concentration to develop competence. Early studies by Binet and Cleveland focused on the meaning found by chess players in their games; investigations by de Groot and the team of Chase and Simon attempted to uncover the nature of chess expertise by contrasting expert and novice performance.

The vast performance difference between master and amateur is a consequence of trained perception. Without conscious effort, a master notices features in a newly shown position that mark it as belonging to some familiar category, which calls to mind general techniques of dealing with that type of position. Chess skill is a result of the intuitive experience brought to the act of perception. Only those players familiar with the relevant features (whatever they might be) are likely to discover the appropriate board action.
A chess player misses an “obvious” move because no pattern is recognized that suggests it; to the club player who lacks this intuitive knowledge, features that serve to categorize a position are apt to remain stubbornly invisible.

While watching people learn to play chess, psychologists noticed how intuition, and in particular intuitive pattern recognition, is trained. In every area of expertise, the path from novice to expert starts with the conscious application of a few rules-of-thumb sufficient to enable the trainee to begin accumulating experiences. Instructional material is based on the assumption that the trainee is able to generalize from examples to formulate more rules. The trainee learns to recognize and deal with exceptions to the rules; these metamorphose into more refined rules, with new exceptions. With gain of experience a “feeling” develops for the cases in which the now highly specialized rules can be applied. By exploring a variety of situations and discovering relationships, the unusual gradually becomes familiar and the trainee gains confidence. Then suddenly, during ordinary practice, something extraordinary occurs: activity that required conscious application of rules can be carried out automatically, without conscious attention. The perceptual process has changed: “know that” has been transformed into intuitive “know how.”

Skilled chess perception, like other forms of expertise, is obtained only through practice, through examination of thousands of patterns. The process of recognizing types of positions, which calls to mind plans and playing methods associated with them, is mostly nonverbal. Chess intuition is taught in the same way other expert skills are imparted: by tutorial example. Advice is given in the form of rules (when you notice this pattern, consider trying that plan) which, when sufficiently exercised, become assimilated into the trainee’s perceptual process.

From the start, chess competition between people and machines has been a contest between intuitive pattern recognition and brute-force calculation. The first tournament game played by a computer (a loss to an expert) took place in 1967. Within three months the program, called Mac Hack Six, was awarded a trophy in an amateur tournament for obtaining two wins and two draws. Other programs began to compete in human tournaments, at first as curiosities, but soon it became apparent that the better programs could serve as worthy opponents for the lower-rated players in the reserve section.

Master play was another matter. A wager between International Master David Levy and four professors of Artificial Intelligence sustained interest in machine chess over a ten-year period. The professors backed with £1000 the proposition that within a decade Levy would lose a match to a computer.
This long-term wager was based on optimism rather than a realistic appraisal of odds, and over the intervening years estimates of the outcome fluctuated greatly as machine performance improved along with Levy’s understanding of brute-force style, strengths, and weaknesses. The wager was settled in August 1978 when Levy scored three wins, a draw, and a loss in a six-game match against reigning World Computer Chess Champion CHESS 4.7.

Levy continued to offer himself as a target for chess programmers, and machine chess continued to improve. Programs running on general-purpose computers were beaten by those using supercomputers. Supercomputers were superseded by machines with custom hardware, which first achieved master-level chess play. With them came the beginning of a renaissance in the application of specific chess knowledge, which in a search-based machine is entirely contained in the evaluation function. With the increased parallelism that comes with add-on circuitry, more can be done at each step of the search. A much more complex evaluation function is possible, one more able to deal with discrete cases by applying chess knowledge appropriate to the particular board situation. With their increased speed, these formidable machines could routinely, as part of the process of determining a single move in a tournament game, examine more potential board positions than a human master does in a lifetime. Levy was beaten in a 4-0 rout, and brute-force machinery could now compete with grandmasters.

In 1989, after more than two decades of human-machine chess competition, a match between the reigning human and machine world chess champions was played. Garry Kasparov lacked Deep Thought’s computational power, but Deep Thought had the greater lack: that of expert intuition. Kasparov’s play was so overwhelming that he felt it unnecessary to employ the recommended anticomputer strategy of avoiding tactical scrambles while exchanging pieces to simplify rapidly to the endgame.

It is the handling of the endgame that separates masters from amateurs; even from an equal position, a master can almost always outplay a club player, and it is here that brute-force forward search is most obviously inferior to the master’s specialized knowledge. Curiously, the greatest contribution of computers to chess knowledge has been in the endgame.

The increase in endgame knowledge came through retrograde enumeration, or “maximin.” In its simplest form, maximin is a process of constructing a table that includes all legal positions that can occur in a particular endgame such as KRKN (King and Rook vs. King and Knight).
Starting with terminal positions, that is, those in which a reduction in material results in a known outcome, a brute-force backward search is undertaken in which best-move predecessors to already categorized positions are computed and entered in the table until the status of every position has been ascertained.

Although the KRKN exhaustive table contains fewer than three million legal positions, this is still far too many for a human to consider worth examining. Prior to exhaustive enumeration by computer, it had been widely supposed, even by masters, that KRKN is a dead draw. The tables show, however, that the Rook side can force a win from most positions, and they even include the moves that force the win most rapidly as well as the replies that best prolong the game. Perfect play is possible, if one could only remember that exhaustive table that contains an optimal move for every possible position. Exhaustive tables have now been computed for a number of chess endgames once considered beyond human analysis, such as KBBKN.
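Retrograde enumeration is easier to follow on a game small enough to tabulate by hand. The sketch below is an illustration only and does not use chess: it assumes a toy game in which players alternately remove one, two, or three stones from a pile, and the player left with no move loses. The function names `retrograde_table` and `best_move` are invented for this example. The table is grown outward from the terminal position, one ply at a time, so each entry records both the outcome and the number of plies to the end under best play.

```python
def retrograde_table(max_stones, takes=(1, 2, 3)):
    """Label every pile size as ('win' | 'loss', plies to the end) for the side to move."""
    successors = {n: [n - t for t in takes if n - t >= 0] for n in range(max_stones + 1)}
    # Terminal positions: the side to move has no legal move and has lost.
    table = {n: ("loss", 0) for n, nxt in successors.items() if not nxt}
    depth = 0
    while len(table) < len(successors):
        depth += 1
        newly = {}
        for n, nxt in successors.items():
            if n in table:
                continue
            known = [table.get(s) for s in nxt]
            if any(k is not None and k[0] == "loss" for k in known):
                newly[n] = ("win", depth)    # some move leaves the opponent lost
            elif all(k is not None and k[0] == "win" for k in known):
                newly[n] = ("loss", depth)   # every move leaves the opponent winning
        if not newly:
            break  # anything still unlabeled would be a draw (none occur in this toy game)
        table.update(newly)
    return table


def best_move(n, table, takes=(1, 2, 3)):
    """Table lookup for n >= 1: win as fast as possible, else prolong the loss."""
    options = [(t, table[n - t]) for t in takes if n - t >= 0]
    winning = [(plies, t) for t, (outcome, plies) in options if outcome == "loss"]
    if winning:
        return min(winning)[1]
    return max((plies, t) for t, (outcome, plies) in options)[1]


table = retrograde_table(12)
print(table[5], best_move(5, table))
```

As with the endgame tables, the finished table plays perfectly by lookup alone, but it offers no explanation of why a move is best.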
In all but the simplest endgames perfection is not to be expected in over-the-board play. The reason is simple: No explanation of the optimal play seems possible save for the unilluminating “look in the move table.” The information contained in these endgame databases is not in the form of concepts understandable by human chessplayers, but ongoing studies by endgame specialists are gradually producing new strategies which will surely find their way into tournament play.

Although computational chess enjoyed ever-greater success, all machines were burdened with the same defect: inability to learn from mistakes. Indeed, in one important sense, no computer ever learns. When a finite-state machine driven by a fixed algorithm finds itself in the same state as on some previous occasion, it will take the same action. Until recently, a human duffer could beat even a very capable chess program by simply repeating move for move any game the program had lost before. Random selection from among the lines in a program’s opening book makes replay of a game less likely, but variability is not the same as learning. Nor can rote memorization be considered learning. Since very few board positions are worth remembering, and these cannot be reliably identified, merely recording positions and their estimated values is as useless to the computer as memorization of material without comprehension is to the human.

A chess player recognizes continuations that past experience has shown unsatisfactory and avoids them in favor of more promising alternatives. With the simpler game of checkers as his experimental testbed, Arthur Samuel spent decades investigating how a computer also might improve its play with experience. The machine’s inability to profit from its mistakes might be remedied by adjusting weights in the evaluation function according to the results of play.
A change of behavior (a sort of learning) takes place and, if the altered parameters are recorded, an improvement in play could persist beyond the next power outage.

Parameter-adjustment learning employs a set of board features such as “relative piece advantage” for potential inclusion in a linear evaluation polynomial. The idea is to generalize on experience after each move by adjusting evaluation coefficients and, occasionally, replacing an irrelevant feature with one of the reserves. But without trustworthy performance criteria, strong and weak moves are indistinguishable, the scoring polynomial is modified erratically, and bad habits quickly replace good ones.

Samuel eventually treated position evaluation as a pattern-recognition task. To distinguish position types, he developed “signature tables” indexed via combinations of feature values for easy subdivision by case. Samuel constructed his tables with the help of a large sample of positions together with master moves. He found that an evaluation function that used a signature table could handle nonlinear interactions among features, and was superior to the best linear polynomials.

But independent learning remained a dream. Modification of signature tables according to the results of play also requires reliable performance criteria, and forming parameters from board features depends on human estimations of relevance. Samuel considered the inability to generate new parameters, that is, to induce rules from examples, a major defect of his program. The behavior of computer programs could change, but the only real learning was by the human beings studying them. Although no convincing demonstration was produced that computers could not learn, they clearly did not, and acquisition of intuition by an algorithm-driven machine appeared quite remote until the late 1970s.

Inductive reasoning (formulating general laws from particular cases) and its part in scientific discovery have long intrigued philosophers, who were more interested in justifying this (illogical) activity than in discovering how the process might take place. The psychologist Earl Hunt commented that an intelligent device must adapt by classifying slightly different environmental states as equivalent or not, and decided to mimic this process by algorithm. He wanted to find some way to generate decision rules, rules for recognizing membership in classes which are known only through their samples. This ability would be useful to a chess machine, for if a position is recognized as belonging to a particular type, an appropriate evaluation function can be selected.

A concept, to Hunt, is a category of situations that have some combination of features in common. He developed a concept-learning system that would produce a decision tree when supplied with a (precategorized) training set.
A decision tree is a compound classification rule that is logically equivalent to a conditional expression in a programming language, and can serve as a program to classify new material. Hunt’s student Ross Quinlan improved the algorithm and used it to induce a rule for determining if, in the KRKN endgame with side-to-move specified, a given position is drawn. His algorithm could generalize from a few examples drawn from the space of three million KRKN positions to produce a rule that classified all positions correctly. Quinlan’s automatically synthesized rule, though complete and correct as verified by exhaustive test, was formulated in Martian terms, for it contained combinations of features quite unlike those used by chess players to describe a position. However useful as an embodiment of information, it was incomprehensible to chess masters and thus useless as knowledge.
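The equivalence between a decision tree and a conditional expression can be made concrete with a small sketch. The three features and the class labels below are invented purely for illustration; they are not the attributes found in Quinlan’s induced rule and make no claim about real KRKN theory. The same toy rule is written twice, once as a nested conditional and once as a tree held in data and walked by a short loop.

```python
# Toy illustration only: the feature names and the labels are hypothetical,
# not the attributes used in any published KRKN rule.

def classify(rook_attacked, rook_defended, knight_near_king):
    """The decision tree written directly as a nested conditional expression."""
    return (
        ("drawn" if rook_defended else "lost")
        if rook_attacked
        else ("drawn" if knight_near_king else "lost")
    )

# The same tree as data: a leaf is a label (str); an inner node is
# (feature_name, subtree_if_true, subtree_if_false).
TREE = (
    "rook_attacked",
    ("rook_defended", "drawn", "lost"),
    ("knight_near_king", "drawn", "lost"),
)

def walk(tree, example):
    """Classify an example (dict of feature -> bool) by descending the tree."""
    while not isinstance(tree, str):
        feature, if_true, if_false = tree
        tree = if_true if example[feature] else if_false
    return tree

if __name__ == "__main__":
    example = {"rook_attacked": True, "rook_defended": False,
               "knight_near_king": True}
    print(walk(TREE, example), classify(True, False, True))
```

Both forms give the same answer for every combination of feature values, which is the sense in which an induced tree can serve directly as a classifying program.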
Concepts are not based on some fixed set of attributes, but are established piecemeal during processes of perception and recognition. The philosopher Ludwig Wittgenstein suggested that a previously unobserved object is classified according to its resemblance to familiar objects, which are linked by a network of “overlapping and crisscrossing” similarities. To better represent the intuition employed by organisms, some sort of network structure must supplant the too-simple decision tree.

An animal’s brain tissue is organized as networks of often densely interconnected neurons. In the 1940s, neurologists Warren McCulloch and Walter Pitts studied how a network might process information, and showed that an ensemble of threshold gates could carry out computations. John von Neumann proved that multiple data paths enabled a net to carry out arithmetic or logical operations with arbitrary accuracy. Further studies of reliable processing with unreliable components showed how an ensemble of neurons in an organic brain might provide full function even when individual neurons are fatigued or damaged.

Donald Hebb proposed a mechanism by which a network might learn. He supposed that simultaneous activity at both ends of a neural pathway would strengthen the connection and argued that a network containing adjustable connections would automatically adapt to, and learn, patterns of signals. This notion guided Frank Rosenblatt in the 1950s as he developed the perceptron, a network in which each computational element adds weighted inputs and signals whenever the sum exceeds a threshold. A grid of photocells representing a retina supplied input signals; another layer of gates would react to patterns it remembered. In supervised learning, the trainer presents a series of input patterns and adjusts the weights of gates that contributed to wrong answers.
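A single threshold gate trained in this supervised fashion can be sketched in a few lines. The sketch below uses the standard error-correction rule for one perceptron unit; the function names, the learning rate, and the tiny two-input training set (the logical AND of two signals) are assumptions made for illustration and are not drawn from Rosenblatt’s hardware.

```python
# Minimal sketch of one perceptron unit; names and training data are illustrative.

def predict(weights, threshold, inputs):
    """Signal (1) whenever the weighted sum of inputs exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

def train(samples, n_inputs, rate=1, epochs=100):
    """Present each pattern in turn; adjust weights only after a wrong answer."""
    weights = [0] * n_inputs
    threshold = 0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, threshold, inputs)
            if error:
                mistakes += 1
                # Only weights on active inputs contributed to the wrong answer.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                threshold -= rate * error
        if mistakes == 0:  # every pattern classified correctly
            break
    return weights, threshold

# Hypothetical training set: respond only when both inputs are active.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, t = train(AND_SAMPLES, n_inputs=2)
    print([predict(w, t, x) for x, _ in AND_SAMPLES])
```

A single unit of this kind can learn only patterns that a weighted sum can separate; the multilayer schemes are what lift that restriction.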
As weights are tweaked, the number of misclassifications gradually decreases, and once trained, a perceptron responds instantly, an impossibility for a computer program that must be executed step by step.

The connectionist approach to understanding (and imitating) the biocomputer produced other schemes for learning by adjustment of connection strengths. In “relaxation,” weights of the best solution found thus far are altered slightly in the hope that an even better solution will appear. A trained relaxation network can, when presented with a partial pattern of inputs, guess the states of the remainder. With back-propagation, an input pattern is presented to a multilayer network, and the difference between actual and desired responses changes not only output layer weights, but also those of earlier layers.

Training a fixed network constitutes but one component of skill
acquisition; the other involves reorganization of the network itself. In the 1960s, John H. Holland explored methods to improve a network’s performance by systematically altering its structure. He articulated the “genetic algorithm” (a machine embodiment of Darwin’s “descent with modification”), in which a population of programs, each representing a solution to a specified problem, is augmented by new solutions composed of parts of the best solutions. Each solution is tested for fitness: The well-adapted participate in further competition, the less fit are discarded. Although recombination is random, the genetic algorithm is not a random search. Holland showed that parent solutions selected in proportion to their fitness produce a near-ideal population of fit offspring.

Connectionist and genetic machines are educated rather than programmed; training of the intuitive machine is chiefly by example. Still, artificial intelligence enthusiasts were unable to imitate the kind of intelligence necessary to get along in the real world. Though they could easily train devices to discriminate among combinations of signals, distinguishing relevant from irrelevant input proved enormously difficult to mechanize. It had been presumed that training could take place in isolation, detached from the environment in which the machine must function. Instead of being guided by an algorithmic set of rules, a learning entity could be immersed in an environment, such as a chessboard with an active opponent, in which it must discover appropriate ways to behave. The intuitive chess machine forms its own chess concepts through active exploration, and, like the human chess player, discovers new structures (concepts) in a configuration of pieces.

John Holland suggested that a message-passing mechanism might support concept formation within a network.
He envisioned a classifier system that accepts messages from the environment, matches them with rule templates, and generates additional messages, which are tested against further templates. From time to time, the system issues output messages that change the environment. Selection of matches for further processing is based on specificity of the template and a measure of usefulness called strength, which serves as specie that can be transferred among message-matching rules. Each rule acts as a player in a Monopoly game. Upon selection, credit is transferred from winning rules to those that led to their invocation. Additional credit is distributed among recently active rules when reinforcement comes from the environment. Finally, genetic operators modify existing rules to create new, often more specific rules, which join the competition for invocation.

The KRK endgame, as Torres y Quevedo found, is an appealing microworld for experiment. Instead of following a prescriptive algorithm, the intuitive machine apprehends for itself how the constraints imposed by the presence of the pieces change with each move. It discovers how King and Rook can coordinate their actions in a dance of zugzwang that confines the lone King to ever-smaller regions. Training is carried out with the connivance of an active opponent. Feints show the trainee which forces are at work on the board. The trainer shows how the uncrossable barrier that results from the Rook’s presence vanishes when its King blocks control of the file. And the tutor is always ready to deliver the lesson that the Rook, if left unprotected, will be snapped up.

Perhaps the most interesting problem is how to introduce passion. To explore, to try out ideas just for the adventure of discovering what happens, the intuitive chess machine must be self-motivated. It must exhibit a zest for play. A player’s enjoyment of a game comes in little jolts as ideas fall into place. Something clicks; understanding comes in a rush; there is a convulsion of pleasure. To acquire the “instinct for mastery” remarked on by Freud, the unpassioned machine must somehow become an impassioned machine.

Organic nervous systems in which thinking takes place are so much more complex and wonderful than previously imagined that past predictions of breakthroughs in artificial intelligence could not have been met. Enthusiasts now point to many gaps in knowledge that have been filled and, despite an unbridged chasm here and there, express confidence that neuromimetic devices with richly interconnected networks of processing elements will soon produce inorganic thought. An intuitive chess-playing machine is surely just around the corner. (Did someone say ten years?)

Now to the narrative itself. The path is convoluted, but the going is easy and the scenery enjoyable. Let us stroll. . . .
chapter 2
The Rise of Mechanical Automata
People have long found mechanisms fascinating and have taken great delight in exploring ideas for autonomous machinery. From Archimedes to Leonardo da Vinci, visionaries have conceived of marvelous devices that in an infant technology could only remain dreams. As metalworking techniques matured, designers perfected clockwork mechanisms of ever-increasing precision and complexity. One of the first applications of the new technology was the creation of clockwork toys. The complexity achieved in mechanical toymaking can be seen in the cam-driven automaton on display in the Franklin Institute that guides a doll’s hand through hundreds of predetermined motions to produce pen and ink sketches. In view of our fascination with autonomous devices, it seems almost inevitable that attempts would be made to embody game-playing in clockwork and, ultimately, to construct mechanisms that play chess.

Baron Wolfgang von Kempelen built the first and most famous mechanical chess-playing machine in 1769 to entertain the Vienna Imperial Court of King Joseph II. The device consisted of a life-size mannequin in robe and turban to match the Viennese stereotype of the inscrutable, exotic Oriental. The “Turk” sat before a cabinet seemingly crammed with clockwork and topped by an inlaid chessboard. An ingenious ensemble of mechanical linkages permitted showy motions and gestures. The mannequin could nod, shake its head, roll its eyes, lean forward to peer at the board as if in concentration, and reach out with its left hand to grasp and move the pieces. Most astonishing, the Turk’s play was superb: Even against stiff competition the machine won most games.

Von Kempelen had proven himself a superb engineer; now he learned to play the role of showman. The spectacle of a machine outperforming humans in an intellectual activity attracted no end of curious spectators eager to pay premium admission for the opportunity
to observe this marvel. Now just as few believe that a magician really “pulls a rabbit out of thin air,” it is doubtful that anyone thought von Kempelen’s “automaton” was truly autonomous: The mystery was how a human operator might be concealed within the mechanism, how the board might be observed, and above all how such a high caliber of play could be sustained under such uncomfortable conditions. Following the tradition of successful magicians through the ages, von Kempelen wisely refrained from revealing the details of his machine; instead, he provided his audiences plenty of opportunity to speculate.

At a typical performance, von Kempelen would have the Turk wheeled onto a stage. With illumination provided by a candle held behind each compartment, the various doors of the cabinet would be opened one after another to reveal the clockwork. After this inspection, the bottom drawer would be opened and the chess pieces taken out and set up. When started, the Turk emitted the sounds of machinery in motion. Grinding and scraping of gear-works and squeaks from imperfectly lubricated bearings masked any sound that might be produced by a concealed operator. As one player after another retired in defeat, von Kempelen would allow more and more cabinet doors to remain open, but he was always careful to keep one section or another closed so that his audience had the impression that a player inside was continually changing position during play.

Some speculated the diminutive master player would view the board through a peep-hole in the body of the Turk, perhaps with the help of mirrors. Others supposed the concealed player might follow the game by watching magnets beneath each square of the chessboard react to the presence of the iron chessmen above. Since midget chess masters must have been in very short supply, it is likely that the human player remained outside the machine.
As in a mind-reading act, a chess master in the demonstration area probably signalled moves to a small-statured operator hidden within the machine. The Turk was to remain an immensely popular attraction for more than fifty years. Von Kempelen took his machine on tour and gave performances in dozens of cities throughout Europe. After his death in 1804, the Turk was acquired by Johann Maelzel, an accomplished showman who found a way to increase the dramatic effect of the presentation. He introduced a rope barrier to separate spectators and machine and had the Turk’s opponents play on a separate board; the transfer of moves from board to board permitted natural opportunities for inserting dramatic commentary. In Vienna in 1809, Maelzel achieved perhaps the greatest public relations coup of his career when, on three occasions, Napoleon played against the Turk, losing handily each time.
During the following years, Maelzel toured central Europe with his machine. His players were always the best obtainable: Wherever the Turk appeared, the local master was somehow nowhere to be found. In 1826 Maelzel took the Turk to America, where the machine proved just as popular as in Europe. Edgar Allan Poe published his best guesses of how the Turk might be controlled. (His surmises were in many respects quite wrong, and in part hilarious. He argued, for example, that the Turk could not be mechanical because it was, on occasion, fallible.) In 1837, William Schlumberger, Maelzel’s regular player for most of the American tour, died, and with him died Maelzel’s desire to continue. Deprived of a good showman and supporting player, the Turk spent the remainder of its years behind glass in Philadelphia’s Chinese Museum. It was destroyed by fire in 1854.

Although the Turk and several imitators were under direct human control, these devices inspired further efforts to develop machinery that could autonomously exhibit intelligent behavior. The most ambitious of these enterprises was the design of computing machinery by the visionary (and notoriously eccentric) English mathematician Charles Babbage, who decided that tedious arithmetic would be the best candidate for automation.

In about 1820, disgusted with the poor quality of the numeric tables that formed the basis of much of the engineering and navigation of his day, Babbage turned his attention to the problems of their verification and correction. Since tabular data were hand-calculated and transcribed by bored clerks and typesetters, published tables were often riddled with errors. Babbage recalled an evening’s labor comparing two tables produced by different “computers” and, exasperated by finding many discordancies, found himself wishing that one could calculate by steam.

Now Babbage was aware that the calculus of finite differences could provide a powerful tool for discovering errors in a table.
This approach exploits the “smoothness” of an analytic function by calculating a sequence of differences between successive table entries to obtain a new sequence that is (usually) even smoother. A jump or discontinuity in the sequence of differences can pinpoint an error in the original sequence, such as a transposition of digits in one of the entries. This process can be applied again to compute differences of the differences; higher-order difference sequences tend to be more nearly constant. The technique also can be used to calculate additional table entries from a few known values. Since only simple arithmetic is necessary, Babbage felt he could automate table generation if only he could devise an appropriate mechanism to carry out addition and subtraction.
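Both uses of the method, checking a table and extending one, can be illustrated with a short sketch. The polynomial x² + x + 41 and the function names are chosen here only for illustration. The first function takes successive differences, so that a single transposed entry shows up as a disturbance in the second differences; the last function regenerates the table from a starting value and its leading differences using nothing but addition.

```python
# Illustrative sketch of the method of finite differences; names are hypothetical.

def differences(seq):
    """Differences between successive entries of a table."""
    return [b - a for a, b in zip(seq, seq[1:])]

def difference_table(seq, order):
    """Return [seq, first differences, ..., differences of the given order]."""
    rows = [list(seq)]
    for _ in range(order):
        rows.append(differences(rows[-1]))
    return rows

def extend(initial, count):
    """Generate `count` table entries from [f(0), Δf(0), Δ²f(0), ...] by addition only."""
    regs = list(initial)
    out = []
    for _ in range(count):
        out.append(regs[0])
        # Each register absorbs the one above it: f += Δf, then Δf += Δ²f, ...
        for i in range(len(regs) - 1):
            regs[i] += regs[i + 1]
    return out

if __name__ == "__main__":
    good = [x * x + x + 41 for x in range(7)]   # 41, 43, 47, 53, 61, 71, 83
    bad = list(good)
    bad[3] = 35                                  # 53 with its digits transposed
    print(difference_table(good, 2)[2])          # constant second differences
    print(difference_table(bad, 2)[2])           # disturbance around the bad entry
    print(extend([41, 2, 2], 7))                 # table rebuilt by addition alone
```

For a second-degree polynomial the second differences are constant, so three starting numbers suffice to rebuild the whole column, and any departure from constancy flags a suspect entry.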
Over the next two years Babbage occupied himself with the design of a device he called the “Difference Engine.” He decided to use toothed wheels to represent and store numbers because of the relative simplicity of the carry mechanism (Pascal and Leibniz had also used toothed wheels in their calculators). Development was excruciatingly slow, for he had to design and fabricate his own parts. His machine tools were inadequate, and Babbage had to arrange for the manufacture of some parts by others. He was careful to use different workmen so that his idea would not be stolen.

Since ordinary mechanical drawings of the parts of the Difference Engine could not show the states of the machine during a calculation, Babbage found himself pressed to devise a notation that could clarify logical sequences of actions. Along with his increasing understanding of calculating engines, his notation evolved into a powerful formal tool for describing complex sequential operations. Babbage’s notation, “geared” for mechanical equipment, proved to be less applicable to later electromechanical and electronic control systems and was superseded when Boolean algebra was applied to switching systems.

In 1822 he completed a prototype able to calculate two orders of differences with a precision of six decimal places. The working model helped him to attract the support of influential people and he petitioned the government for a grant to build a more ambitious difference engine. With support from the Royal Society, Babbage won government approval for his project the following year. Funds were allocated for the development of a difference engine with a capacity of twenty decimal places and six orders of differences. To eliminate transcription errors, Babbage decided to add an output device to the proposed engine to provide for automatic generation of stereotype plates for printing the computed tables.
Babbage engaged Joseph Clement, a capable engineer and highly skilled mechanic, to oversee the machining of the parts and the assembly of the Difference Engine. Construction proceeded slowly, for special machine tools had to be designed and fabricated. Not all development costs could be billed to the Treasury and Babbage expended a substantial amount of his private fortune. Disputes over money multiplied, and in 1832 Clement stopped work on the Engine, discharged his workmen, and refused to deliver any parts or drawings. The Treasury eventually met Clement’s demands and the drawings and parts were released, but not the precision tools developed for machining them. Babbage declared himself willing to continue work but refused to invest any more of his own funds in the project. Slow progress, cost overrun, and change of political climate had all contributed to cessation of government funding and the project was abandoned.
In the meantime Babbage’s grasp of control principles had progressed to such an extent that he really would have preferred to redesign the Difference Engine along improved lines. Although he lacked funds to construct a more advanced calculating engine, he continued to experiment and improve his designs. In 1834, for example, he worked out a mechanical implementation for fast addition now known as a look-ahead carry.

The solution of the fast-carry problem marked the beginning of an extraordinarily productive two-year period during which Babbage achieved significant advances in understanding control and information processing. His paper of 1837 described an “Analytical Engine” configured in the same four functional units (control, storage, arithmetic, and input/output) that we see in the modern stored-program digital computer. The Analytical Engine was to be controlled by instructions encoded on punched cards in the same way that the Jacquard loom controlled patterns of weave. His proposed peripheral devices also seem quite modern: Besides the card reader, Babbage designed a card punch, a printer, an engraver for output on copper plate, and even a plotter.

Seeing no source of funds in Britain for construction of an Analytical Engine, Babbage attempted to stimulate interest abroad. No longer concerned about theft of his ideas, he journeyed to Turin to discuss machine computation with an audience of competent scientists. He emphasized the ability of his Engine to carry out conditional operations, that is, to select alternative actions depending on the results of a previous calculation by moving the instruction cards forward or backward the requisite number of steps. A young mathematician called Menabrea took notes of these discussions, which, when published (in French, not in English), comprised the first significant article on machine computation.

Just before cessation of construction, a portion of the Difference Engine had been assembled and demonstrated.
A marvel of precision engineering, it is still in perfect working order and can be seen in the Science Museum in London. One of those who had the privilege of turning its handle was Augusta Ada Lovelace, daughter of the poet Byron. She attended one of the early demonstrations and was quick to recognize the potential of machine calculation. Ada came to play an important role in Babbage’s life and work. She translated Menabrea’s article and followed Babbage’s suggestion to add a few explanatory notes. Her notes grew with her enthusiasm as she added examples and clarifications and soon the addendum was three times longer than the original article.

The translation, together with Ada’s notes, constitutes a remarkable document that describes the art of computer programming a century before machinery that could execute a program existed. She included examples of mathematical calculations and explained in some detail the process of programming. Even more important to the development of computational theory, Ada’s annotation emphasized the generality of the Analytical Engine as a symbol-manipulating device. To emphasize that computational power encompasses a great deal more than just arithmetic, she suggested that, just as a Jacquard loom weaves flowers and leaves, the Analytical Engine might weave algebraic patterns (Menabrea, 1982).

Babbage had designed his Analytical Engine for arithmetic operations, but he realized that the control principles he had discovered would have much wider application. After investigating non-numeric problems that might be solved by the Engine, he decided that every game of skill is susceptible of being played by an automaton. His diary notes: “After much consideration I selected for my test the contrivance of a machine that should be able to play a game of purely intellectual skill successfully such as . . . chess” (Babbage, 1864). He analyzed strategies for playing various games with the purpose of specifying programs that could drive his automaton.

To answer the question of selecting the best move for any possible position, Babbage proposed a chaining scheme. His program would first test the legality of the current position (thereby establishing whether the game is already won or lost). Next it would direct the automaton to examine legal moves to check for an immediate win, then for an imminent loss to be averted, then for a move that threatens a win in two different ways, and so forth. His strategy included a randomizing method for choosing among equally good moves to ensure variety in his automaton’s play.
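A chain of tests of this kind is easy to sketch for noughts-and-crosses. The code below is a modern illustration, not a reconstruction of anything Babbage wrote: the board encoding (a nine-character string), the function names, and the final fallback to any legal move are assumptions. It checks whether the game is already decided, then looks in turn for an immediate win, a loss to avert, and a move creating two threats at once, choosing at random among equally ranked candidates. It does not verify whose turn it is.

```python
import random

# Board: 9-character string, cells 0..8 row by row, each "X", "O", or " ".
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return the mark holding a complete line, or None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def play(board, move, mark):
    return board[:move] + mark + board[move + 1:]

def winning_moves(board, mark):
    """Moves that complete a line for `mark` at once."""
    return [m for m in legal_moves(board) if winner(play(board, m, mark)) == mark]

def choose_move(board, me, rng=random):
    """Apply the tests in priority order; pick randomly among equals."""
    if winner(board) is not None or not legal_moves(board):
        return None                                   # game already decided
    foe = "O" if me == "X" else "X"
    tests = [
        lambda: winning_moves(board, me),             # immediate win
        lambda: winning_moves(board, foe),            # avert imminent loss
        lambda: [m for m in legal_moves(board)        # threaten a win two ways
                 if len(winning_moves(play(board, m, me), me)) >= 2],
        lambda: legal_moves(board),                   # otherwise anything legal
    ]
    for test in tests:
        candidates = test()
        if candidates:
            return rng.choice(candidates)

if __name__ == "__main__":
    print(choose_move("XX OO    ", "X"))   # completes the top row
    print(choose_move("OO  X    ", "X"))   # blocks the top row
```

Four tests are enough to play this small game respectably, which is precisely why the scheme gives so little sense of what chess demands.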
While devising this strategy, however, he was surely thinking of noughts-and-crosses (tic-tac-toe), and thus also pioneered in underestimating the computational complexity of the game of chess.

Although no Analytical Engine was ever built, Babbage’s ideas and achievements had a profound influence on the development of calculating machinery. Babbage himself explained the logical processes underlying the operation of his engine to George Boole, who was to develop the Boolean Algebra now used to describe the logic of digital circuitry. The automata of Leonardo Torres y Quevedo used ideas first developed in the Babbage calculating engines. Still other important contributions to computational theory remained dormant for nearly a century until revitalized and developed further in Turing’s classic 1936 paper on computability.

About 1890 the Spaniard Leonardo Torres y Quevedo built the first of a series of mechanisms that could (autonomously) play a subgame of
chess against a human opponent. He had already established himself as a successful designer of electromechanical devices that used feedback loops for automatic control. Perhaps his best known invention was a torpedo that, after launch, would automatically maintain depth and heading. Pleased with the self-regulating and thus seemingly intelligent behavior of his torpedo, Torres investigated other activities that would require intelligence if done by humans, yet might be imitated by mechanical devices. He chose the KRK (King and Rook against King) chess endgame as his candidate. This subgame, although complicated enough to be nontrivial, did not appear too complex to be achievable.

King and Rook against King is the least complex chess endgame. It is covered in a single page of Reuben Fine’s Basic Chess Endings (1941), which includes a diagram of a position requiring sixteen moves to mate. Fine’s treatment of the KRK endgame does not include a step-by-step procedure showing how to mate; instead, starting from the diagrammed position, he explains how the lone King can be herded to one side of the board and mated. He illustrates with several alternatives how positions can be created that constrain the movement of the King. Finally, he warns of the possible stalemate positions. A human following these examples has little trouble understanding, and winning, KRK endgames.

The automaton could not, however, learn from examples. The first difficulty was the formulation of an algorithm, or formal step-by-step procedure, that would guarantee a win from all legal starting positions. Torres swiftly discovered that programming is not an easy task. To avoid the complications of board reflections and rotations, he decided to force the lone King to the first rank for the mate rather than to the nearest side of the board; he simplified the problem further by assuming that the two Kings were already on opposite sides of the rank controlled by the Rook.
His final algorithm assumed a fixed starting position for the automaton’s King and Rook, but allowed the human opponent’s King to be placed on any unchecked square in the first six ranks. With these simplifications, Torres was able to specify a set of rules sufficient to effect mate, and precise enough to be applied by a mechanical device. To ensure an uncomplicated set of rules (and consequent simplicity of the mechanical portion of the automaton), he assumed no diagonal moves by the automaton’s King. The resulting algorithm (see Appendix) is not very efficient; with cunning choice of starting square and best delaying tactics, the human opponent can postpone mate for 61 moves (and thus could claim a draw by the fifty-move rule). Apart from this defect, the algorithm is complete and correct, and guarantees a win.
Having found a demonstrably correct mating procedure, Torres was able to proceed to the more interesting engineering problems of designing and building a mechanism to carry out this procedure. He never bothered to improve the algorithm, but instead made sporadic improvements to his basic machine, adding “bells and whistles” to provide a more impressive demonstration.

Operation starts with the White King and Rook in their starting positions on the metal chessboard atop the table. The human adversary places the Black King on the selected initial square and the automaton makes its first move. An electromagnet moves under the board to draw the piece over the smooth metal surface. Each square of the board consists of three metal plates separated by rubber insulators; when centered on a square, the Black King’s metal base makes contact with all three plates to complete circuits that identify both horizontal and vertical position. If the human attempts an illegal move, a “first mistake” light goes on and play stops until a legal move is chosen. Another false step triggers a “second mistake” light. A third illegal move must be an attempt to cheat, and the machine refuses to continue. Each time the Rook checks to force the Black King one step closer to the edge, a phonograph produces a spoken “jaque al rey” (check to the King) and, if the Black King is already on the bottom rank, adds a triumphant “mate.”

Although long since surpassed by microprocessor-driven chess machines, the automaton is still on display at the Polytechnic University in Madrid. It remains in perfect working order and stands ready to demonstrate its mating prowess: a monument to the engineering skill of Leonardo Torres y Quevedo.
chapter 3
B. P.
By 1938 Commander Alastair Denniston had become convinced of the inevitability of war with Germany. As director of the Government Code and Cypher School, the only British agency dealing with cryptographic matters, he considered it his duty to prepare for the coming explosion of wartime signals intelligence analysis. As continuing grim news from the continent reduced public hope that Britain could succeed in staying out of war, he felt himself driven by an ever-increasing urgency.

No ordinary bureaucrat, Denniston had risen through the cryptographic ranks. During the Great War, he had worked in “Room 40” of the Admiralty. This organization, staffed mostly by civilians recruited from universities and schools, had succeeded in reading a great variety of encoded wireless and cable intercepts. The cryptographic activities of Room 40 had continued at a reduced level after the Armistice. In a 1920 reorganization, Denniston’s group was removed from direct control of the Navy and made part of the Foreign Office. Their charter was to study methods of cipher communication used by foreign powers, and to advise on the security of British codes and ciphers. The new organization was named the Government Code and Cypher School.

Although the staff was by later Cold War standards quite small, the Government Code and Cypher School (GCCS) played a significant role in the politics of the 1920s and 1930s. With only thirty civilian cryptographers (called Assistants) and fifty service personnel consisting of clerks and typists, GCCS cryptanalysts routinely read communications of Russia, Italy, and Japan. Despite these codebreaking successes, very little effort had been expended on German cryptosystems.

A major reason for this lapse was economic. The Treasury kept a tight hold on its purse strings, insisting on limiting GCCS staff to thirty assistants and fifty clerks. From time to time a few additional service personnel were authorized as six-month temporaries, and Denniston found himself spending more and more time pleading with the Treasury for an increase in staff to meet the ever-growing volume of work. Only in 1937 did the Treasury begrudge him an increase in cryptographers, but the eight new recruits were sorely needed to deal with the growing workload of Japanese and Italian intercepts.

The real reason for the GCCS neglect of German communications was technological. In 1937 it had become apparent that the Reichswehr, the Kriegsmarine, very likely the Luftwaffe, and almost certainly the SS were using slightly different versions of the same cipher system. This system was based on the Enigma cipher machine, which had been marketed in the 1920s and was in use at a number of banks. GCCS analysts had broken a primitive version of this machine cipher used by the Germans, Italians, and Spanish Nationalists, but the advanced versions of Enigma employed by the various German organizations appeared unbreakable.

Since his expectation of success determined Denniston’s allocation of scarce personnel resources, he focused GCCS efforts on cryptosystems that offered some promise of successful attack. Almost all his senior cryptographers had come from the original Room 40 personnel. Although familiar with traditional cryptosystems, they were ill-prepared for the challenge of ciphers generated by machine and had little idea of how to proceed against the complexities of Enigma.

By 1938 the Enigma machine had become the chief problem of the British intelligence community. It was clear that enciphered wireless communication would be of utmost importance in modern mechanized warfare and that an ability to read enemy signals could have a considerable, if not decisive, effect on the outcome of the coming war. Although he doubted that Enigma could be defeated, Denniston could foresee the necessity of a rapid expansion of the GCCS to handle analysis of the German message traffic and to take advantage of lapses in cryptographic discipline.
He devoted himself to preparing for this expansion.

Denniston’s major challenge was the recruitment of a pool of potential cryptanalytic talent. He was well aware that the formal method of advertisement and credential evaluation commonly used for filling government positions would, besides attracting undesirable attention, be unlikely to produce the caliber of people needed. He decided that, as in the recruitment of the original Room 40 personnel, informal referral would be essential and that the universities of Oxford and Cambridge should serve as the primary sources of cryptanalytic trainees. He selected Bletchley Park, a Victorian country mansion about halfway between the universities, for the wartime home of the GCCS. It fulfilled several additional conditions Denniston thought desirable.
B. P.
Several acres of surrounding grounds provided isolation, and it lay a comfortable distance from nearby towns, yet close to a main-line railway junction providing direct connection to London’s Euston Station fifty miles away. As part of the site preparation, Denniston arranged for construction of the single-story “huts” that would provide work space for a rapidly expanding signals-intelligence organization that would ultimately employ thousands of men and women.

Recruitment began with polite notes to selected university dons, asking whether they would be willing to serve should war break out. Respondents were approached quietly and asked to recommend further talented people; these in turn would be discreetly interviewed, and additional potential candidates would be named. Recruiters simply followed the threads of the existing old-boy network of the English educated middle class. Of course they could not reveal the intended use of the talents they sought, but guarded hints during private discussions made the object of the search perfectly clear: bright chaps for most-secret work.

Denniston’s recruiters must have used a curious list of qualifications to select people with potential cryptanalytic talent. Recruits came predominately from the ranks of mathematicians, chess players, and winners of national crossword puzzle competitions. Denniston understood that chessplayers tend to make good cryptographers; the combinatorial element of chess is similar to that of cryptanalysis and both activities depend on trained intuition, the ability to recognize patterns within specific contexts. Of course anyone who expressed interest in cryptography was likely to attract the attention of Denniston’s talent scouts, and it was probably a chance enthusiastic remark that early on brought the mathematician Alan Turing into the net.

Turing was a fellow of King’s College in Cambridge whose interests included a long-time fascination with automata.
In 1936 he had written “On Computable Numbers” (1937), a refutation of Hilbert’s view that any mathematical problem can be solved by a fixed and definite process. As part of his proof, Turing defined an elegantly simple automatic machine and proved that any computation that can be done by an automaton also could be carried out by his “Universal Turing Machine.” He had thus proved the universality of the digital computer and deepened the understanding of automatic computation achieved by Babbage a century earlier. In the summer of 1938, a crash course in cryptography at GCCS headquarters directed Turing’s talents toward machine ciphers.

In July of 1939 a GCCS contingent including chief cryptographer Dillwyn Knox attended a meeting of French and Polish cryptographic
experts at a hide-out near Warsaw. At this meeting the Polish cryptographers revealed an astonishing breakthrough: They had deduced not only the wiring of the Enigma rotor system, but also the procedures used to generate individual message keys. The Poles also demonstrated the mechanical aids they had developed for recovering the daily key from a sufficiently large sample of message traffic. These included a set of perforated charts and a mechanical contraption called a “Bomba” (because of its loud ticking during operation) consisting of six electromechanical equivalents of an Enigma, one for each possible combination of rotors, ganged to step in parallel through all key combinations. Most important, they supplied Polish-made working copies of the secret German version of the Enigma machine.

By the summer of 1939 the Bletchley Park grounds had been fenced and the first work huts for the cryptographers were complete. In August, just a few weeks before war was declared, the Government Code and Cypher School moved to Bletchley under tight security. The perimeter guards taken from a nearby RAF regiment were told their purpose was to keep the “inmates” of this purported lunatic asylum from wandering away. Denniston’s security-minded staff cautioned the inmates to avoid direct reference to Bletchley and cryptography. GCCS became the “Golf Club and Chess Society” with the official address Room 47, Foreign Office, Whitehall, London; to the many who worked there, it was simply “B.P.”

Recruitment of potential cryptanalysts continued at a feverish pace, with emphasis on attracting first-rate mathematicians and outstanding chess players. The outbreak of war in September 1939 overtook the British chess contingent at the International Team Tournament in Buenos Aires. The team had done well in the preliminary rounds and had qualified for the final, but opted for an immediate return to England.
Stuart Milner-Barry recounted a thirteen-day unconvoyed voyage from Argentina on a blacked-out ship. The safe return of the chess team added Milner-Barry and the reigning British champion Hugh Alexander to Denniston’s talent pool. In view of this concentration of chess proficiency, newcomers to B.P. were well advised not to play chess for money.

Denniston’s careful preparation permitted an immediate start on signals traffic analysis. Aided by the cryptographic windfall provided by the Poles, the Bletchley cryptographers began regular recovery of Enigma keys. The continuing urgency of reading the signals enciphered by Enigma resulted in a tense, but cooperative, atmosphere at B.P. In his memoir on Hugh Alexander, Milner-Barry expressed this intensity in terms of chess: “For both Hugh and myself it was rather
like playing a tournament game (sometimes several games) every day for five and a half years.”

The first source of worry was the indicator system used by the Germans to specify the rotor setting used for encipherment. The Polish Bomba exploited a weakness of the indicator system to gain an “entry” to an Enigma cipher. The ingenious idea of using a machine to unbutton a machine-generated cipher could be countered at any time by substituting another form of indicator, which would render the Bomba useless. The cryptographers felt a pressing need for a more general way to reconstruct the initial rotor setting.

The “probable-word” approach seemed promising. If a cryptanalyst could correctly guess a word or phrase and its exact position in a message (often possible with stereotyped military communications), a machine could, in principle, simply work through the possible initial rotor settings until it found one that transformed ciphertext into the guessed plaintext. A new complication arose: the addition of a jumbling plugboard to Enigma resulted in a proliferation of possible encipherments that made such a naive brute-force enumeration impossible.

Alan Turing found a way to circumvent the combinatorial explosion by using the notion of logical consistency. For assumed plugboard and rotor settings, the ciphertext characters could be fed into the rotor circuits, and the output tested for consistency with the assumed probable word. He saw that a contradiction in the logical implications of the assumed settings could eliminate billions of possible plugboard configurations, and realized he could beat the combinatorial explosion by exploiting the logical properties of the cipher system.

With a test for logical consistency, it was not necessary to enumerate every combination of plugboard hypothesis and rotor position.
Turing could show that, as in mathematical logic, a single contradiction implies the truth of any proposition and hence all other hypotheses also must be self-contradictory. If the position of the rotors were correct, then either the plugboard hypothesis also would be correct and would lead to no contradiction or it would be incorrect and would lead to every possible plugboard assertion except the correct one. The circuitry need only test if exactly one or exactly twenty-five of twenty-six lines were active.

Gordon Welchman saw a possibility of improving the Turing design and found an elegant way to generate even more logical implications with the help of a simple piece of circuitry. With this addition, any assertion about a potential plugboard pairing would produce such a proliferation of implications that a very strong consistency condition would be imposed by even a short probable phrase.
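
The naive probable-word search that the plugboard defeated can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of a real Enigma: the three rotor wirings are random permutations invented for the example, there is no reflector and no plugboard, and the guessed word is assumed to sit at the very start of the message. The names `make_rotors`, `encipher`, and `find_settings` are hypothetical.

```python
import random
import string
from itertools import product

ALPHABET = string.ascii_uppercase


def make_rotors(seed=1):
    """Three hypothetical rotor wirings (random permutations of 0..25)."""
    rng = random.Random(seed)
    return [rng.sample(range(26), 26) for _ in range(3)]


def encipher(text, setting, rotors):
    """Toy rotor cipher: odometer stepping, no reflector, no plugboard."""
    pos = list(setting)
    out = []
    for ch in text:
        # Step the rotors like an odometer before each letter.
        for i in range(3):
            pos[i] = (pos[i] + 1) % 26
            if pos[i] != 0:
                break
        x = ALPHABET.index(ch)
        # Pass the letter through each rotor at its current offset.
        for wiring, p in zip(rotors, pos):
            x = (wiring[(x + p) % 26] - p) % 26
        out.append(ALPHABET[x])
    return "".join(out)


def find_settings(ciphertext, crib, rotors):
    """Try every initial rotor setting; keep those consistent with the crib."""
    target = ciphertext[:len(crib)]
    return [s for s in product(range(26), repeat=3)
            if encipher(crib, s, rotors) == target]


if __name__ == "__main__":
    rotors = make_rotors()
    secret = (3, 17, 8)  # assumed initial setting, unknown to the analyst
    message = encipher("WETTERBERICHTFUERDIENACHT", secret, rotors)
    print(find_settings(message, "WETTERBERICHT", rotors))
```

With three rotors there are only 26 × 26 × 26 = 17,576 starting positions to try, which is why the exhaustive search is feasible in this toy. A plugboard multiplies the number of cases enormously, which is what made a consistency test rather than plain enumeration necessary.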
Construction of the British “Bombe” was not simple. The machine had to work through an average of a half million rotor positions in hours, which meant that the consistency test had to be applied some twenty times per second. Electromagnetic relays were used to recognize consistency and to signal when the logical constraints were satisfied for halting the process.

A prototype Bombe became operational in May of 1940, and others with the Welchman improvement soon followed. They were impressive and beautiful machines, clicking away like a battery of knitting needles as the relays worked through proliferating implications.

A Bombe could not fully automate the decipherment process, for a “stop” could arise by chance, especially with a short probable word. When the consistency conditions were met and a halt occurred, the rotor position had to be tested on an Enigma to see if it turned the rest of the ciphertext into German. Yet once the daily setting had been solved, the entire message traffic could be converted directly into plaintext. This was not just the breaking of individual messages, but of an entire communication system, with thousands of messages every day in each network.

The sheer mass of decrypted messages could not in itself provide comprehensive, or even useful, intelligence. To make sense of the wealth of information contained in individual Enigma decrypts, related information from prior messages and corroborating intelligence from noncryptographic sources had to be considered. Clues found in message texts needed cross checking for possible fit within a larger mosaic, and repeated determinations of the potential significance of each item were required.

To exploit this intelligence, relevant information had to be quickly relayed to the appropriate commanders in the field. The significance of any message had to be clearly and simply expressed and explanatory comments had to be provided.
Above all, great care had to be taken in the use of this intelligence to conceal its cryptanalytic source, for any suspicion that a cipher system might be compromised could lead to its replacement and a possibly disastrous information blackout. To protect this “most precious secret,” the head of British intelligence, Colonel Stewart Menzies, created a specialized organization called Ultra that was solely responsible for communicating information obtained through Enigma decrypts. Special Liaison Units would brief field commanders personally upon receiving reports sent in unbreakable one-time-pad ciphers.

Transformation of Enigma intercepts into useful intelligence required not only individual talent and imagination, but a flexible organization. GCCS quadrupled in size during the first sixteen months of
the war, its rapid growth outrunning the abilities and experience of the administrators. The assorted collection of mathematicians and chessplayers at B.P. lived up to their reputation as an oddball, Bohemian lot; to a great extent they simply ignored the remnants of the Room 40 administrative structure. They spontaneously formed their own dynamic organization, a loose collection of independent, yet cooperating, groups. This informal organizational structure evolved to match the ever-changing situation, and often reflected the structure of the German institution under scrutiny. The Naval Enigma group that evaluated U-Boat communications and plotted the paths of U-Boats and convoys adopted the same structure as the tracking and monitoring facilities of the German U-Boat command. Bletchley’s flexibility permitted rapid reallocation of interception, decryption, and evaluation resources to respond to changing German strategies and organization.

Denniston’s organization took on a life of its own, becoming what Churchill called a “creative anarchy.” It could no longer be administered along Room 40 lines. Officious emphasis on procedure was a constant annoyance to people who recognized ability rather than rank and to whom “approval chains” meant only unnecessary and undesirable delay. In the autumn of 1941 a crisis was reached and the cryptanalysts rebelled: Turing, Welchman, Alexander, and Milner-Barry broke all rules by writing directly to Churchill to set forth their chief concerns. Response was immediate, and B.P. requirements were soon being met on a priority basis.

The head of the British Secret Service, Stewart Menzies, was particularly miffed at this insubordination. For some time he had been concerned about lukewarm efforts to organize Bletchley into the integrated signals intelligence factory he thought necessary for the mass production of Enigma decodes. Menzies felt that as long as Denniston remained in charge, B.P.
would remain a loose collection of cliques, and soon compelled Denniston to relinquish the directorship to second-in-command Edward Travis, who ran Bletchley Park for the remainder of the war. Denniston was posted to London to work on diplomatic ciphers. With this demotion, all hopes for a knighthood were dashed, and Denniston remained embittered until his death. His friend and admirer, the American cryptologist William Friedman (who won acclaim for deducing the internal wiring of the Japanese PURPLE cipher machine), in a letter to Denniston’s daughter deplored “that so few should know exactly what he did.” Friedman was surely thinking of Denniston’s cryptanalytic successes rather than his arguably greater legacy: the achievement of bringing together this particular collection of creative anarchists whose contributions to technology would transcend cryptology.

As a natural leisure activity, chess remained an important pastime at B.P. It was a subject that could be freely discussed off duty. In the midst of developing machines to automate logic, it is not surprising that conversation would turn to speculations on machine chess, nor that I. J. Good would be a participant.

“Jack” Good had strength in both chess and mathematics: He had won the Cambridgeshire chess championship and held the post of research mathematician. He was also interested in automata and had written an article on mechanized chess playing for the house magazine of the Cambridge mathematics students. His first eighteen months at Bletchley were spent in the section headed by Turing, for whom he acted as statistical assistant. Chess was a frequent topic during their off-hours discussions.

Turing’s interest in chess extended well beyond recreation. In “Computable Numbers” (1937), he had proved there could be no “definite process” for deciding whether a sequence of steps could be found to establish a mathematical truth. He considered the analogous question of whether there might be a “definite method” for playing chess, that is, a set of rules that could be followed by an automaton. His real concern was the nature of intelligence in human and machine. Mathematicians and chess players clearly carry out intelligent steps in pursuit of goals, even though there may be no mechanical method to specify a sequence of steps leading to any particular goal.

The notion of steps was playing an increasingly important role in mathematics and physics. Mathematics had been dominated by continuous, smooth representations that permit concise descriptions of physical phenomena with differential equations such as those of Maxwell, which portray electromagnetic fields.
But discontinuous, jumpy descriptions of the world are also useful: Quantum mechanics assumes that electromagnetic radiation is released as discrete packets, or quanta.

The rise of discrete mathematics began with a mid-nineteenth century work of George Boole that bore the pretentious title “An Investigation of the Laws of Thought” (1854/1960), which defined an algebra for manipulating logical variables restricted to two discrete values. Discrete phenomena were soon noticed everywhere. One of the contributions of “Computable Numbers” dealt with the representation of numbers in terms of computational steps. Cryptography involves the substitution of one string of discrete symbols for another. Even chess play is discontinuous, for one must select either one move or another with quite different consequences.
The minimax¹ principle was formulated in terms of chessplay. A game tree can be produced from any given position by forming branches to those positions immediately reachable by legal moves. If these successors are at all comparable, the obvious procedure for move selection is to take the branch that leads to the best immediate position. A human player also considers an opponent’s possible responses to a move and counters to these counters. A rational choice of move, and the best that one can expect, is the most favorable alternative among the least favorable leavings that a capable opponent acting in self-interest will allow.

A recursive procedure (one that refers to itself as part of its definition) for calculating the value of a position is easy to formulate (see Appendix C). After swapping board sides (and doing some bookkeeping to ensure that the process won’t go on forever), a chess program simply calls on itself to play the role of the opponent. Turing’s theoretical work on computability dealt largely with recursive functions, and the recursive character of the minimax process would seem quite natural to him. Indeed, the idea of recursively extending a move tree to reach easily evaluated quiescent positions seemed so obvious that Jack Good thought it was not worth publishing and dissuaded Turing from writing it up.

At the end of 1941, a new rotor in a fixed position was added to the Naval Enigma, providing a 26-fold increase in the number of combinations. The classic cipher-clerk blunder of repeating a message in old and new systems allowed the British analysts to deduce its wiring, and they were able to break three days of message traffic, but at the cost of six Bombes working for seventeen days.

Extending the Bombe by adding a fourth rotor that would spin through its 26 positions at high speed was not practical: Electromechanical relays were just too slow. Electronic components would be needed for the high-speed logical operations. T. H.
Flowers, in charge of the switching group at the Post Office Research Station, joined the Bletchley engineers to design electronic relays. The Flowers team demonstrated the use of hot cathode gas discharge tubes as switching elements, and soon established themselves as leading exponents of electronics at B.P.
1 The term “minimax” was coined later in connection with the game theory of von Neumann and Morgenstern in which one player seeks to minimize over a payoff matrix while an opponent attempts to maximize. In the game trees of chess, minimax refers to selecting the move that minimizes the opponent’s maximum potential gain, or, equivalently, finding the maximum of the minima left by the opponent. To compound the confusion, the process of finding predecessor nodes of a game tree was called maximin.
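
The recursive value calculation described in this chapter can be sketched briefly in Python. This is a minimal illustration under stated assumptions and makes no claim to reproduce the procedure of Appendix C. It uses the common “negamax” formulation, in which a position is always valued from the point of view of the side to move, so that the program can call on itself to play the opponent and simply negate the result. The depth counter stands in for the bookkeeping that keeps the process from going on forever, and a simple take-away game (remove one to three counters; whoever takes the last counter wins) replaces chess so that the example stays self-contained. All function names are hypothetical.

```python
def negamax(position, depth, moves, play, evaluate):
    """Value of `position` for the side to move, searched `depth` plies deep."""
    successors = [play(position, m) for m in moves(position)]
    if depth == 0 or not successors:
        # Terminal node of the look-ahead tree: fall back on static evaluation.
        return evaluate(position)
    # The opponent moves next; the opponent's best value, negated, is ours.
    return max(-negamax(s, depth - 1, moves, play, evaluate)
               for s in successors)


def best_move(position, depth, moves, play, evaluate):
    """Pick the move whose resulting position is worst for the opponent."""
    return max(moves(position),
               key=lambda m: -negamax(play(position, m), depth - 1,
                                      moves, play, evaluate))


# Toy game (an assumption for illustration): a position is a counter total.
def nim_moves(n):
    return [k for k in (1, 2, 3) if k <= n]


def nim_play(n, k):
    return n - k


def nim_evaluate(n):
    # No counters left: the side to move has already lost. Otherwise unknown.
    return -1 if n == 0 else 0


if __name__ == "__main__":
    print(negamax(4, 10, nim_moves, nim_play, nim_evaluate))    # lost position
    print(best_move(7, 10, nim_moves, nim_play, nim_evaluate))  # leaves 4
```

Passing the move generator and evaluation function as arguments keeps the recursion itself independent of the game; a chess program would supply its own versions and would typically extend the search until a quiescent position is reached rather than stop at a fixed depth.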
In 1941, Bletchley analysts began to investigate another form of cipher traffic called Fish. This cipher was markedly different from the Enigma signals, for it was transmitted in teleprinter code rather than in Morse code. A teleprinter provides a completely automated system, with no human intervention between keypress at the transmitter and printing of a character at the receiving end. A happy guess that a certain Fish message had been transmitted twice, the second time with a one-character offset of the key, provided an entry that permitted recovery of both plaintext and key. In a breakthrough comparable to the Polish achievement with Enigma, W. T. Tutte discovered that the automatically generated Fish key was not random, but contained discernible patterns.²

M. H. A. Newman, who had been a Lecturer in Mathematics at Cambridge since 1924, directed the attack on Fish. It had been one of Newman’s lectures on Hilbert that had inspired Turing (who took Newman’s phrase “a purely mechanical process” to mean “something that could be done by machine”) to develop the ideas of “Computable Numbers.” Now the teacher found himself using statistical tools developed by his former student in an extension of Tutte’s work. Newman conceived of an approach that could be automated, but to test his ideas, new machines for rapid counting would have to be developed.

Turing traveled to the U.S. at the end of 1942 as part of a cooperative Anglo-American cryptanalytic effort. In Washington, he explained the British procedures to the American cryptanalysts and Enigma rotor settings soon passed back and forth across the Atlantic as rapidly as they were discovered. After this liaison, Turing spent two months at Bell Laboratories in New York City, where for recreation he worked on speech encipherment systems. Some of his most fruitful discussions were with his intellectual counterpart Claude Shannon, who had been working at Bell Labs since 1941.
The work of Shannon curiously parallels, and complements, that of Alan Turing. Turing had shown that a universal computing machine could function with just two symbols; within a year of publication of “Computable Numbers,” Shannon completed his Master’s thesis at MIT on the formal description of switching circuits with the two-state logic developed by George Boole. Before the war, Shannon and Turing had each spent a year at Princeton, and each had worked with John von Neumann. Both found cryptology fascinating and had investigated how cryptanalysis might be automated.

2 Tutte later stated “It was at Bletchley Park that I first acquired some standing as a mathematician. In 1942 I found myself elected as a Fellow of Trinity, though only one or two of the electors can have known what it was for.”
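
The connection between switching circuits and Boole's two-state logic can be made concrete with a small Python sketch. It is an illustration only, resting on the standard reading that switches in series behave like logical AND and switches in parallel like logical OR; the function names and the particular network are invented for the example. The sketch checks by exhaustive enumeration that a three-contact network is equivalent to a single switch.

```python
from itertools import product


def series(a, b):
    """Two switches in series conduct only if both are closed (AND)."""
    return a and b


def parallel(a, b):
    """Two switches in parallel conduct if either is closed (OR)."""
    return a or b


def equivalent(f, g, n):
    """True if networks f and g agree for every setting of n switches."""
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=n))


def original(a, b):
    # Hypothetical network: switch a in parallel with (a in series with b).
    return parallel(a, series(a, b))


def simplified(a, b):
    # Boolean absorption says the network above reduces to switch a alone.
    return a


if __name__ == "__main__":
    print(equivalent(original, simplified, 2))
```

Treating a circuit as a Boolean expression in this way lets algebra, rather than trial wiring, show when a simpler network will do the same job.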
Shannon had studied methods of encoding messages that might ensure reliable transmission over an unreliable (noisy) channel, and had come to the startling conclusion that, with proper encoding, perfect transmission of information can be achieved no matter how noisy the channel. He had written up preliminary results in 1940, but publication of his study was delayed until 1948, when it appeared in the Bell System Technical Journal under the title “A Mathematical Theory of Communication.” Shannon showed that information can be expressed in terms of uncertainty and can be quantized with the help of a fundamental unit, the bit. Even more surprising, he was able to formulate mathematical equations relating information to other physical variables. This single work made the abstract nature of information concrete and formed the basis of a new science: Information Theory.

Like Turing, Shannon was interested in machine intelligence and in the possibilities of automating the game of chess. Their shared enthusiasm quickly brought Shannon into the Bletchley machine-intelligence family as an associate member. Turing showed him “Computable Numbers,” and he found the idea of a universal machine fascinating. Shannon had long been interested in possibilities of a machine that could imitate the brain; he had studied neurology as well as mathematics and logic. Wartime security could not inhibit their enthusiastic speculations on artificial intelligence for, unlike their secret work, these subjects could be freely discussed in the Bell Labs cafeteria.

Hugh Alexander took over Naval Enigma in Turing’s absence, and soon proved a much more capable administrator. The first high-speed four-wheel Bombe was completed in June 1943, but by August the Americans were producing more and better Bombes; they took over the U-Boat work entirely by the end of the year. Now, with electronic engineers joining the cryptologists at B.P., the attack on Fish began in earnest.
Donald Michie and Jack Good became Newman’s first staff in his work on Fish. Michie was one of the few cryptanalysts who was neither chess player nor mathematician (he claimed to have been recruited by mistake), but he had the distinction of having won a major scholarship at Oxford’s Balliol College, in classics. Applying Turing’s statistical ideas of “weight of evidence” and “sequential analysis,” he soon learned enough mathematics to play a valuable role in developing the Bletchley machines. Turing declined the invitation to play a direct part, for he was busy with a speech encipherment project at Hanslope Park. He still took time to confer with Michie, who continued to refine the Turing approach. By the beginning of 1943, a certain amount of the Fish traffic was being read regularly.
The Post Office engineers installed the first electronic counting machine in Hut F, where Newman and his assistants worked, about April 1943. They called this mechanism “Heath Robinson” after the cartoonist who, like his American counterpart Rube Goldberg, drew absurd machines. Other Robinsons followed, but all proved too unreliable to be useful beyond research. They did, however, demonstrate the feasibility of Newman’s approach and prepared the way for the development of Colossus.

The Robinsons were plagued by problems with the mechanisms that had to read, and synchronize, the two paper tape inputs. Even before the first Robinson was finished, Flowers had made a revolutionary proposal that solved the synchronization problem. The idea was to store the Fish key-patterns internally in electronic form, thus eliminating one of the tapes. This would, however, require extensive use of electronic valves. He designed a new machine containing the colossal number of 1500 valves (whence the name Colossus), far more than any electronic application hitherto attempted. By conventional wisdom it was impossible to make such a large assemblage of components work reliably in concert for long periods; Newman was unable to obtain official support from the B.P. administration for such an impossible project.

Flowers obtained authorization through the Post Office Research Station at Dollis Hill and used the top priority granted by Churchill to requisition the equipment, even though it consumed half the resources of the Post Office laboratories. Construction began in February of 1943. During the next eleven months, the machine was assembled, wired and trouble-shot in separate sections. Secrecy demanded that no one but the designers be permitted to see all the parts.

Flowers and his engineers completed installation of the first Colossus at Bletchley in December of 1943. The all-electronic machine was not only faster, but it proved much more reliable than the Robinsons.
It included several features that were important for later developments of electronic computers, in particular a clock signal to synchronize operations throughout the machine, thereby eliminating cumulative timing errors. The clock controlled two-state circuits so that the machine could be exercised at arbitrarily slow speeds and single-stepped for test purposes.

Using the new electronic Colossus, Jack Good and Donald Michie soon discovered that by manually reconnecting the wired logic processes while the machine was in operation and observing the resulting output, they could often guess which test would be most effective for matching a given piece of cipher pattern with the text. Not surprisingly, Good characterized this interactive analysis and decision
making as similar to that of playing chess. Later versions of Colossus included hard-wired facilities to automate the process of varying the piece of pattern. Acts of decision well beyond the simple stops of a Bombe could now be carried out without human intervention; the result of one counting process would determine what Colossus did next.

What had been pure speculation (machines making intelligent decisions automatically) was now electronic reality. Decision trees resembling those of the mechanical chess-playing schemes were written to specify actions of the machine operators. This meant that some of the work of the intelligent analyst had been replaced by automatic program steps in the Colossus. The principles of digital computation formulated by Babbage in 1837 and developed further in Turing’s 1936 paper had finally been implemented in working electronic hardware.

Development of ever more complex electronic circuitry continued for the rest of the war. The Mark II Colossus, which became operational five days before D-Day, contained a logic switching panel on which Boolean functions could be selected to control the logical operations. The additional counters and registers in the Mark II brought the total number of valves to about 2400. Although externally programmed, the Colossi included many features of the modern stored-program computer; the only missing component was read-write memory. Since the type of processing used to unravel Fish could be carried out without additional storage, this final step to a universal machine was not taken.

Curiously, despite outstanding mathematical ability and intense interest in chess, Turing was an absolute duffer at the game. Harry Golombek, who had returned from Argentina with the British chess team but spent two years in the infantry before joining B.P., occasionally played chess with Turing, giving Queen odds in order to make the game more equal. Even then, he always won.
After one resignation, Golombek was able to turn the board around and win from the hopeless position. Donald Michie recalled:
Turing was very interested in chess and primarily from the point of view of mechanising it. In Bletchley, either people didn’t play chess at all or they were chess masters because of the means of recruiting and I was the only other person in the place who was bad enough at chess to give him a level game. We got into the habit of meeting in a pub once a week to play chess and he and I.J. Good and I used to discuss what we now call AI, almost exclusively in the game playing context. A number of us got quite inspired by Turing’s interest in that field and went on to work in that area, but it wasn’t a feasible thing to do until maybe the late 1950s because of lack of adequate hardware. (Michie, 1983)
Michie noted specifically that the ideas spawned at these meetings included the generation of a look-ahead tree, backing up by minimax, using an evaluation function to assign values to terminal nodes of the game tree, and the notion of cutting off the look-ahead process at quiescent positions. Although he attributed these developments to Turing, it is difficult to apportion credit for ideas that crystallize gradually during individual and group discussions. Turing and Shannon were central participants in a remarkable group of cryptographers who were responsible for the genesis of modern machine chess.

With the end of war came a wide dispersion of the Bletchley personnel. Max Newman accepted the Chair of Pure Mathematics at Manchester; he took Jack Good with him, and initiated the calculating machine laboratory at Manchester University. Turing joined the National Physical Laboratory to design a general-purpose electronic computer dubbed ACE (Automatic Computing Engine, in honor of Charles Babbage). Michie returned to Oxford; when computing power became more generally available, he made significant contributions to artificial intelligence, particularly in areas of machine induction and computer chess, and became Professor of Machine Intelligence at Edinburgh University. Welchman was to continue computer work in the United States, where he headed the applications research phase of the Whirlwind computer project at MIT for a few years. He gave the first course on digital computers in the Electrical Engineering Department at MIT. Flowers remained head of the Switching Division in the Post Office Research Department, designing electronic telephone exchanges.

The Bletchley chess players, as one might expect, resumed active play. During the wartime years their tournament chess play, although reduced, had never completely stopped. A match held between B.P. and Oxford University in December 1944 was won 8-4 by B.P.
Alexander and Golombek played in the 1946 radio match with the Soviet Union and, with Milner-Barry, took part in the championship tournament in Nottingham that year. The Bletchley experience as summarized by Flowers found wide echo among other participants: “It was a great time in my life— it spoilt me for when I came back to mundane things with ordinary people.”
Chapter 4
Minimax
The master squirmed on his chair. The unease was not due to the board position, where his carefully built up advantage had become overwhelming, but to another urgency. He had only a few easy moves to play before time control and enough time (a minute or so) remained on his clock. His opponent, having carefully hoarded his allotted time, now had almost an hour left to stare at a lost position. Now, with the opponent’s clock running, would be a good time for a brief absence. He began to rise, but his adversary reached out as if to move, and the master sat down again. If a move were made and the clock punched in his absence, his remaining precious seconds would quickly drain away and the game would be lost by time forfeit. His opponent hesitated, withdrew his hand, and appeared deep in thought. “If he just spends another five minutes. . . .” thought the master and pushed his chair back. Again the hand moved forward as if to play, again the departure was postponed, and the master resumed his writhing. Once more he started to rise and, sure enough, the opponent feigned a move. “Damn your cat-and-mouse game!” he muttered and jammed his chair forward, “But I’m still going to win.” (And he did.)

A human chess player’s choice of move depends not just on the position, but on the entire game situation. The chess player must weigh the relevance of a proposed move to the current campaign plan, consider the appropriateness of attack or defense, and judge the sharpness of the position. The player’s decisions are strongly influenced by his own form, the tempo of the game, urgency of time control, proximity to moment of crisis, frustration level, and familiarity with the type of position. Playing conditions are also important. Pressures resulting from seemingly slight distractions such as lighting, noise, and smoke contribute to revision of the player’s current plan.
The revision might be a decision to relinquish one type of strategic, tactical, or material advantage in favor of another; to seek or avoid complications;
to follow the comfort of one’s own playing style; or to deviate in ways likely to confound one’s opponent.

Current schemes for machine evaluation of a chess position are based on two assumptions: first, that for any two positions it is possible to decide which is more favorable, and second, that this relationship is transitive.1 In consequence of these assumptions, a number that serves as a measure of worth can be assigned to each possible chess position. For the sake of computational convenience, the assignment of numbers to positions is usually normalized so that the number zero corresponds to a dead draw and change of sign is the same as change of side.

From the viewpoint of the human chess player, any attempt to order chess positions according to a numeric scale would require comparison of incommensurables, for any set of fixed values associated with positions must necessarily neglect some aspects of the game situation. Yet the assumption that a number can be associated with each position makes the problem of machine position evaluation manageable and has led to the astonishing success of “brute-force” computation in which the arithmetic abilities of the computer are exploited to enumerate myriads of potential positions and quickly compare their values.

The practical difficulty is specifying a set of useful rules for assigning numbers to chess positions. A small number of positions could simply be ranked in a table from low to high. But only a tiny fraction of chess positions has ever been (or ever could be) considered, and some computational scheme must be specified that will produce a value for any reachable position. Most functions to evaluate chess positions emphasize material. All other things being equal, games are decided in favor of the side with material advantage, and material is easy to count. The “other things,” positional factors such as space, time, mobility, and initiative, are rarely equal and very much more difficult to quantify.
Still, the evaluation need not be perfect. Forward search over a tree of potential moves provides a possibility for improving an evaluation. The deeper the search, that is, the further the branches of the tree are followed before the evaluation function is applied, the more accurate the effective value produced by minimax backup is likely to be. Furthermore, if a potential position is not acceptable to the side on move, it is usually possible to select a different move that denies the opponent opportunity to create that position.

1 That is, if Position A is at least as good as Position B, and Position B is at least as good as Position C, then it always follows that Position A is at least as good as Position C. This may seem trivial, but many games have nontransitive relationships between positions as, for example, Scissors-Paper-Stone.
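The backing up of values through a tree of potential moves can be sketched in a few lines of Python. This is only an illustration and is not taken from any program discussed in this chapter: the nested-list tree, the function name `minimax`, and the leaf numbers are all invented for the example. Each number stands for a static evaluation from White's point of view, and each list holds the positions reachable in one move.

```python
def minimax(node, white_to_move=True):
    """Back up static values through a small, explicit move tree.

    A number is a static evaluation from White's viewpoint (hypothetical
    values); a list holds the positions reachable in one move.
    """
    if not isinstance(node, list):
        return node  # terminal position: apply the evaluation as given
    values = [minimax(child, not white_to_move) for child in node]
    # White picks the largest backed-up value, Black the smallest.
    return max(values) if white_to_move else min(values)


# Two plies: White has three candidate moves, each with three Black replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3
```

White's first candidate is preferred because Black's best reply still leaves a value of 3. The tempting 14 in the third branch is never reached, since Black would answer with the reply worth 2.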
The Bletchley pioneers in information machinery remained in contact after the war and continued their enthusiastic exchange of ideas about the possibilities of machine intelligence. In a 1947 lecture on ACE to the London Mathematical Society, Turing suggested that studies of machine learning might make real progress if one confined one’s investigations to some rather limited field such as the game of chess. He alluded to a hand-calculated simulation by Shannon, which won games by applying rules of thumb. In that same talk, he offered the observation that just as human experts (such as mathematicians) require extensive training, acquisition of expertise by a machine also might be a prolonged process. Turing put forth the suggestion that the machine must be allowed to have contact with human beings in order that it may adapt itself to their standards. The game of chess may perhaps be suitable for this purpose, as the moves of the machine’s opponent will automatically provide this contact. (Hodges, 1983, p. 361)
With his friend David Champernowne, he had devised a “paper machine” they called Turochamp that carried out a fixed-depth search augmented by all chains of captures. Their evaluation function was complex enough to include a bonus for the positional advantage of a rook on the 7th rank. In a 1948 letter to Turing, Jack Good mentioned a “chess machine” devised by Donald Michie and Shaun Wylie that “suffers from the very serious disadvantage that it does not analyse more than one move ahead.” No one bothered to publish any results of all this informal activity.

The term “minimax” entered the literature in 1944 in the work of von Neumann and Morgenstern on the new discipline called Game Theory. They had shown that there is an optimal strategy for any two-player game with fixed rules. Von Neumann was able to prove that in any two-person zero-sum game, in which any gain for one player is a loss for the other, each player would have to follow a minimax strategy to avoid ceding an advantage to the other. Claude Shannon was the first to publish a coherent description of its application to the game of chess.

In March 1949, Shannon presented a paper on “Programming a Computer for Playing Chess” at the National IRE Convention. It appeared in a shorter form in the February 1950 issue of Scientific American and the detailed version was published in the March 1950 Philosophical Magazine. This comprehensive article introduced current thoughts on machine chess to a much wider audience and provided direct inspiration for the game programs of the following decades.
According to game theory, chess is a game with perfect information,2 and thus for any position, either one side can force a win with correct play or each player can hold the other to a draw. Shannon neatly showed that this “existence theorem” is in practice unhelpful (and satirized the view that a chess master owes his superior performance to lightning calculation) by imagining a game between two players who possess the unlimited intellect assumed by game theory. After sitting down at the chessboard and surveying the pieces, one of these omniscient players must either resign on the spot, or offer a draw which the other would immediately accept.

Shannon showed that even though it is computationally impossible to categorize all possible positions in the manner of his imaginary players, it is feasible to imitate the type of chess calculation performed by humans if one supposes a suitable method for assigning values to positions. Most important for the development of game programs, he provided a published description of forward search and the use of minimax.

Shannon gave as an example a symmetric evaluation function consisting of the material difference between White and Black expressed as a number. Counting each pawn as a unit, the White piece values (a Knight being valued at three pawns, a Rook at five, and so forth) are added and the Black values subtracted. Fractional adjustments are made for pawn defects3 and for difference in mobility. Since few moves result in changes in material or number of pawn defects, this evaluation function relies predominantly on its assessment of mobility. Although mobility is indisputably relevant to chess play (its lack being highly correlated with losing positions), it can neither help discover tactical opportunities nor plan strategic breakthroughs.
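A symmetric evaluation of this kind might be sketched as follows. This is a hedged illustration rather than Shannon's published formula: the Pawn, Knight, and Rook values follow the text, while the Bishop and Queen values, the two fractional weights (0.5 per pawn defect, 0.1 per legal move), the `SideSummary` record, and the function names are assumptions introduced for the example. The King is left out because both sides always have one.

```python
from dataclasses import dataclass
from typing import Dict

# Pawn = 1, Knight = 3, Rook = 5 follow the text; Bishop = 3 and Queen = 9
# are customary values assumed here for illustration.
PIECE_VALUES: Dict[str, int] = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


@dataclass
class SideSummary:
    pieces: Dict[str, int]  # piece letter -> how many remain on the board
    pawn_defects: int       # doubled + backward + isolated pawns
    mobility: int           # number of legal moves available


def side_score(side: SideSummary,
               defect_weight: float = 0.5,    # assumed fractional penalty
               mobility_weight: float = 0.1   # assumed fractional bonus
               ) -> float:
    material = sum(PIECE_VALUES[p] * n for p, n in side.pieces.items())
    return (material
            - defect_weight * side.pawn_defects
            + mobility_weight * side.mobility)


def evaluate(white: SideSummary, black: SideSummary) -> float:
    """Positive favors White, negative favors Black, zero is balanced."""
    return side_score(white) - side_score(black)


full_army = {"P": 8, "N": 2, "B": 2, "R": 2, "Q": 1}
white = SideSummary(full_army, pawn_defects=2, mobility=30)
black = SideSummary({**full_army, "N": 1}, pawn_defects=0, mobility=25)
print(evaluate(white, black))  # White is a Knight up, less defects, plus mobility
```

Because the same scoring is applied to both sides and the results subtracted, exchanging the two arguments changes only the sign of the result, which is the normalization described earlier in the chapter.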
In an appendix, Shannon suggested several board features that might reasonably be included in an evaluation function such as passed pawns, doubled Rooks, and pawn control of the center.

2 Perfect information means complete knowledge of the state of the game, with no uncertainties such as the order of cards in a poker deck or the throw of dice in backgammon. Despite the name, perfect information does not include knowledge of an opponent’s plans or intentions.

3 Pawn defects refer to the presence of doubled, backward, or isolated Pawns. Rules for detecting and counting these positional features can be easily formulated: two Pawns of the same color on the same file are doubled; a Pawn is backward if all Pawns of the same color on adjacent files are further advanced, or isolated if there is no friendly Pawn on an adjacent file.

If a program is to play under practical time constraints, some means of limiting the number of evaluations must be specified. The simplest approach, which Shannon called a type A strategy, involves calculating all variations to a fixed, predetermined depth and then alternately minimizing and maximizing. He felt that with a three-move (six-ply) search depth, a machine following this type A strategy would be both slow and weak because no provision has been made for evaluating only at quiescent positions. He explained that in a dynamic situation, such as when a check is pending or exchanges are in progress, an evaluation function intended for use in a stable position will miscalculate the positional value, occasionally with disastrous consequences for the game. The type A approach is too rigid.

Shannon suggested imitating the human player who examines variations to a point of relative quiescence, concluding that to improve the speed and strength of play the machine must examine forceful variations out as far as possible and evaluate only at reasonable positions, where some quasi-stability has been reached. Alternatives should be deliberately selected by some process so that the machine does not waste its time in totally pointless variations. He called this improvement a type B strategy. He suggested that within depth constraints of, say, between two and ten moves, investigation of further alternatives be continued whenever any piece is attacked by a piece of lower value, or by more pieces than defences, or if any check exists on a square controlled by the opponent.

Shannon offered several additional comments and suggestions. Like Babbage, he advocated use of “a statistical element” to choose randomly from among the highest rated moves whenever they are of nearly equal value. He also recommended use of a precalculated “opening book” with random selection to provide variety. He added the observation:

The above strategy gives an impression of relying too much on “brute force” calculations rather than on logical analysis of a position. It plays something like a beginner at chess who has been told some of the principles and is possessed of tremendous energy and accuracy for calculation but has no experience with the game. A chess master, on the other hand, has available knowledge of thousands of standard situations . . . in a given position he recognizes some similarity to a familiar situation and this directs his mental calculations. (Shannon, 1950)
Shannon surmised that a program based on such “type positions” could eventually be constructed and pinpointed some of the difficulties in doing so. He mentioned a particular need to induce from examples, noting that chess books are written for human consumption, not for computing machines, and that a person can be given a few examples of a situation and will understand and apply the general principle involved.
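The difference between Shannon's fixed-depth type A search and the quiescence idea in his type B strategy can be illustrated on a toy tree. The sketch below is a simplification under stated assumptions: the `Node` record, its `quiet` flag, the `max_extension` cap, and the position values are all hypothetical, and the selection of plausible alternatives that also belongs to type B is omitted. It shows only that evaluation is postponed while a position is still unstable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    static_value: float   # evaluation from White's viewpoint (invented numbers)
    quiet: bool = True    # False while captures or checks are still pending
    children: List["Node"] = field(default_factory=list)


def type_a(node: Node, depth: int, white_to_move: bool = True) -> float:
    """Fixed-depth search: evaluate wherever the depth limit happens to fall."""
    if depth == 0 or not node.children:
        return node.static_value
    vals = [type_a(c, depth - 1, not white_to_move) for c in node.children]
    return max(vals) if white_to_move else min(vals)


def type_b(node: Node, depth: int, white_to_move: bool = True,
           max_extension: int = 4) -> float:
    """Same search, but past the depth limit keep going until the position is quiet."""
    at_horizon = depth <= 0
    if not node.children or (at_horizon and
                             (node.quiet or depth <= -max_extension)):
        return node.static_value
    vals = [type_b(c, depth - 1, not white_to_move, max_extension)
            for c in node.children]
    return max(vals) if white_to_move else min(vals)


# White can grab a pawn with the Queen (+1, but the Queen can be recaptured,
# leaving -8) or play a quiet move worth +0.2.
grab = Node(1.0, quiet=False, children=[Node(-8.0)])
root = Node(0.0, children=[grab, Node(0.2)])
print(type_a(root, 1))  # 1.0: the capture looks best at a one-ply horizon
print(type_b(root, 1))  # 0.2: the recapture is seen, so the quiet move wins
```

The one-ply type A search stops in the middle of the exchange and credits White with a pawn. The quiescence extension follows the exchange to its end and rejects the capture.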
Shortly after his chess paper appeared, Shannon was guest of honor at a symposium in London on Information Theory. He made a point of traveling to Manchester to visit Turing and view the new computer. After the war Turing had completed the design of his Automatic Computing Engine but, like Babbage’s Difference Engine, its realization had become mired in politics. In 1949 he abandoned the project to accept an appointment at Manchester University where he had opportunity to experiment with one of the two universal machines then in existence. The 1943 discussions of Shannon and Turing on minds and machines, the possibilities of machine chess, and artificial intelligence had now been opened to a wide audience.

In 1951 Dr. B. V. Bowden asked Turing for a contribution to Faster Than Thought, a popular account of the new “Electronic Brains.” With a chess machine in mind, Turing agreed to write a chapter and chose as his topic “Digital Computers Applied to Games.” Shannon’s paper had covered the theoretical ideas behind proposed chess machines so thoroughly that any novel discussion would have to include observations on actual play, preferably a human-computer engagement. Despite advances in computing machinery, no existing computer had sufficient speed or memory for the task, and effects of chess-playing algorithms still had to be simulated by hand.

Turing selected an evaluation scheme based primarily on material, but with a second “position-play” calculation for those positions arising from equal exchange of material. The position-play evaluation function was asymmetric, for it was based only on the potentialities of the White (machine) side. Points were awarded for positive features such as advanced pawns, mobility, multiple defense, and a bonus for vulnerability of the opponent’s King, but there were no corresponding deductions for favorable features of the opponent’s position.
He specified a type B strategy with a search depth of two plies, that is, ply and re-ply, which would be extended to follow chains of captures.

Turing found the mediocre player he needed as a test subject in Alick Glennie, a young scientist recently graduated from Edinburgh University who was visiting Manchester to learn the techniques of electronic computation.4 The experiment took place one afternoon in Turing’s office in the Royal Society Computing Laboratory. The game proceeded slowly. Turing’s rules often selected inferior moves and occasionally, after incorrectly anticipating the result of his calculation, he found it necessary to backtrack with much shuffling of papers to locate the appropriate rule section. Glennie reported Turing’s reaction as:

. . . exasperation at having to keep to his rules; difficulty in actually doing so; and interest in the experiment and the disasters into which White was falling. Of course, he could see them coming. I remember it as a rather jolly afternoon and I believe Turing must have enjoyed it too—in his way. (Bell, 1978)

4 Glennie was soon to excel in the still-infant field of computer science, in which he pioneered in the theory of syntax-directed translation of programming languages: That same year he defined the first working high-level computer language, AUTOCODE, and developed a compiler for it.
The score of this pioneer effort at rule-driven chess illuminates several difficulties which were to plague the designers of machine chess programs during the following decades (an explanation of the notation is given in Appendix A):

White: Turing’s Algorithm    Black: Glennie
 1. e4      e5
 2. Nc3     Nf6
 3. d4      Bb4
 4. Nf3     d6
 5. Bd2     Nc6
 6. d5      Nd4
 7. h4      Bg4
 8. a4      Nxf3

The unprovoked advances of the wing pawns were deemed “most inappropriate moves” in the footnotes, which Turing prepared with the help of Hugh Alexander. Exactly these coffeehouse moves were to appear again and again during the early years of machine chess as a consequence of simplistic evaluation functions that award blanket credit for advancing pawns and for increasing Rook mobility.

 9. gf      Bh5
10. Bb5+    c6
11. dc      0-0
12. cb      Rb8
13. Ba6     Qa5
14. Qe2     Nd7
15. Rg1     Nc5
It might seem curious that Turing’s procedure did nothing to counter the coming queenside loss, but instead initiated a pointless stab at the Bishop. It was obvious to the experimenters that the Bishop could
retreat safely, and White would still be faced with the original difficulty: the impending loss of the knight pawn. This “spite move” illustrates a phenomenon of forward search algorithms termed the “horizon effect.” As a result of limited search depth, an unpleasant eventuality is “countered” by selecting (often inferior) moves that postpone its occurrence, in a sense pushing it beyond the search horizon. The move Rg5 is not pointless: it ensures that no loss of material will take place within the two-ply search extent.
16. Rg5     Bg6
17. Bb5     Nxb7
18. 0-0-0   Nc5
19. Bc6     Rfc8
20. Bd5     Bxc3
21. Bxc3    Qxa4
22. Kd2     Ne6
23. Rg4     Nd4
24. Qd3     Nb5
25. Bb3     Qa6
26. Bc4     Bh5
27. Rg3     Qa4
28. Bxb5    Qxb5
29. Qxd6    Rd8
On move thirty, with Queen pinned against King, Turing became the first of many designers of chess algorithms to resign on behalf of his program. He described the algorithm’s behavior as a caricature of
his own play and observed that it made oversights very similar to his own, because the moves selected for consideration were inappropriately chosen.

By the mid-1950s, available computing hardware was sufficiently powerful to make attempts at machine chess feasible. A group at Los Alamos Scientific Laboratory (Kister, Stein, Ulam, Walden, and Wells, 1957) reported on their explorations of the Shannon-Turing approach to machine chess in a program they wrote for the newly installed MANIAC computer. They had decided to use a type A strategy with a four-ply search depth, but their initial estimate of two hours of computation per move would make a game unacceptably tedious.

Instead of reducing the already-minimal search depth, the Los Alamos group elected to reduce the game. They defined “anti-clerical” chess, a bishopless miniature on a 6 × 6 board, in which the six pawns on each side were limited to single-square moves, and castling was eliminated. This simplification, claimed to retain the flavor of chess, reduced computation time to about twelve minutes per move. The evaluation function was based first on material and second on mobility, defined as the number of available legal moves; order of evaluation served as a tiebreaker.

The computer played against itself in the first game, selecting White and Black moves alternately by applying its evaluation rules. This initial trial revealed the difficulty of predicting the effects of even rather simple criteria included in the evaluation function. The mobility measure selected by the team, for example, seemed quite plausible: a count of the number of legal moves available. This choice had an unexpected consequence: Since the number of legal moves is very limited when the King is in check, the machine appeared to have a mortal fear of checks and tended to sacrifice material to avoid check. The programmers corrected the most glaring weaknesses with ad hoc modifications.
After this initial fine-tuning, the anti-clerical program was paired with Martin Kruskal, a strong player who gave Queen odds. Several hours into the game he had made little headway and became the first to experience that curious change of attitude which was to become common in man-machine interactions: attributing personality to the machine. Kruskal began referring to his opponent as “he” rather than “it.” MANIAC maintained the upper hand until the nineteenth move, when a weak continuation permitted Kruskal to set a three-move-deep trap, just beyond the program’s two-move search horizon. The machine sacrificed its Queen to avert mate, but was left with a hopeless endgame.
A third game was played to measure the performance of the program against a beginner. A young member of the laboratory staff with no chess experience was taught the game and coached for a week, playing several practice games with players of moderate strength. Since a beginning player, lacking intuitive experience, must rely entirely on computation, it is not surprising that exhaustive mechanical computation prevailed and the experimental subject missed a winning line to achieve the dubious distinction of being the first human to lose an (anti-clerical) chess game to a computer.

In Turing-Shannon type programs, all chess knowledge beyond the definition of legal moves is contained in the evaluation function. Even though a large number of terms representing such positional features as material, mobility, and king safety might be made part of a static evaluation function, the very much greater number of features recognizable as special cases by expert players cannot even be enumerated, much less included. In principle, this is a minor defect, for the necessarily crude evaluations are in a sense made more precise by examination of a sufficiently large tree of potential positions and the use of minimax.5

A consequence of the limited amount of chess knowledge that can be embodied in an evaluation procedure is the necessity to deal with the “combinatorial explosion” of potential positions to be evaluated. Much has been written about the inexhaustible variety of chess. Numerical estimates attest to the astronomical number of possibilities and provide a feel for the meaning of “combinatorial explosion.” The first two moves (four plies) can lead to some 170,000 different positions. The number of legal positions is on the order of 10^40; the number of different games that can be played has been estimated at about 10^120. Of course these numbers reflect only the possibilities: The overwhelming majority of these games would be excruciatingly boring.
Surprisingly, it is possible to estimate the number of interesting games. The studies of de Groot revealed that the average number of moves that a master examining a position from a well-played game would regard as good is about one and three-quarters. Together with the observation that the average number of moves in a game between masters is 42, a little arithmetic provides an estimate of 10^20 for the number of possible master games.6

5 I say “in a sense” because in most cases the minimax procedure is applied to partial, rather than complete, game trees and results in minimaxing estimated values rather than estimating the minimax value, a distinction which is often crucial.

6 Even this number is astronomical; for those inclined to split hairs, this is approximately the distance, in hair-breadths, from our solar system to the next star.
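The “little arithmetic” behind the estimate of master games can be reproduced in a few lines. The sketch assumes, as an interpretation of the text, that a game of 42 moves means 42 moves by each side (84 plies) and that about 1.75 good choices exist at every ply; the variable names are invented for the illustration.

```python
import math

good_moves_per_position = 1.75   # de Groot's average, as quoted in the text
moves_per_game = 42              # average master game, as quoted in the text
plies = 2 * moves_per_game       # assumption: each side makes 42 moves

master_games = good_moves_per_position ** plies
print(f"about 10^{math.log10(master_games):.0f} master games")

# For comparison, the same calculation with the full branching factor of
# roughly 30 legal moves per position used elsewhere in this chapter.
all_games = 30 ** plies
print(f"about 10^{math.log10(all_games):.0f} games of the same length")
```

Under these assumptions the product is a few times 10^20, which agrees with the order of magnitude given above. The second figure is larger by about a hundred orders of magnitude and gives some feeling for how much of the tree a master never considers.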
With the proliferation of more capable computers came a rapid increase in the number and variety of chess programs. A team of programmers led by Alex Bernstein of IBM made one of the first attempts to implement full legal chess on a machine. A preliminary study had convinced them that acceptable play would require a minimum search depth of four plies; it also revealed that, with a branching factor of 30 at each ply, some 800,000 positions would have to be examined to decide on a move.

To avoid the catatonic behavior of a computer conducting a full-width search, the Bernstein team applied “forward pruning” to reduce the extent of the search tree. This was achieved by considering only the seven most plausible moves. For every position, a decision routine carried out a series of tests, starting with “Is the King in check?” to add relevant moves to a list up to a maximum of seven. A plausible-move generator narrows the search considerably: The number of terminal positions examined by the Bernstein program at a depth of four plies shrank to 2,400.

Deciding which moves might be relevant in a given position is far from trivial. A simple-minded plausible-move generator will overlook, or reject outright, good and sometimes critical moves. In one game, on three successive moves the program’s Bishop was pushed back a square by pawn thrusts, none of which had been proposed as plausible moves.

Using a plausible-move generator to circumvent the combinatorial explosion seemed promising. At least it made over-the-board competition possible instead of slowing play to that of a postal game. Allen Newell, John Shaw, and Herbert Simon (known collectively as NSS) decided to carry the idea a step further and embarked on a bold, new plan to employ specialized chess knowledge to select plausible moves.
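The saving obtained by forward pruning follows directly from the branching figures quoted for the Bernstein program, and the idea of a prioritized list of tests can be sketched as well. The code below is an illustration only: `terminal_positions`, `plausible_moves`, and the three toy tests with their move strings are hypothetical and do not reproduce Bernstein's actual decision routine.

```python
from typing import Callable, List


def terminal_positions(branching: int, depth_plies: int) -> int:
    """Leaves of a uniform tree: one factor of `branching` per ply."""
    return branching ** depth_plies


print(terminal_positions(30, 4))  # 810000, the "some 800,000" of a full-width search
print(terminal_positions(7, 4))   # 2401, the roughly 2,400 left after pruning


def plausible_moves(position: object,
                    tests: List[Callable[[object], List[str]]],
                    limit: int = 7) -> List[str]:
    """Run the tests in priority order, collecting moves until the list is full."""
    chosen: List[str] = []
    for test in tests:
        for move in test(position):
            if move not in chosen:
                chosen.append(move)
                if len(chosen) == limit:
                    return chosen
    return chosen


# Toy tests standing in for questions such as "Is the King in check?"
toy_tests = [
    lambda pos: ["Kg1", "Kh1"],                           # escape a check
    lambda pos: ["Nxe5", "Kg1"],                          # win material
    lambda pos: ["a3", "b3", "c3", "d3", "e3", "f3"],     # everything else
]
print(plausible_moves(None, toy_tests))
```

Any move that none of the early tests proposes before the list fills up is never examined at all, which is how a critical reply such as a pawn thrust can be missed.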
For their experiments in artificial intelligence during the 1950s, the NSS trio developed a series of “Information Processing Languages” designed to permit easy definition of recursive (self-referent) processes. Their interest in chess as an intellectual problem-solving activity led naturally to a design of a chess-playing program formulated in one of these languages. This effort marked the beginning of a machine-chess tradition at Pittsburgh’s Carnegie-Mellon University that in the following three decades would make CMU the world leader in design of specialized hardware and algorithms for brute-force chess computation.

The NSS group was seeking an alternative to brute-force tree searching. Their chief interest lay in understanding human problem-solving activities and how similar processes might be performed by machine. To make good moves, and to make them for the right reasons, the NSS team decided that their program would have to imitate human reasoning; that is, the program should specify goals and then seek ways to achieve them. Specification of goals is far from easy. The long-term goal of bringing about checkmate is too vague to admit of formulation, and even the short-term goal of obtaining an advantage is too subjective for rigorous definition.

In a two-person zero-sum game, any imbalance in a position represents an advantage to one player and a corresponding disadvantage to the other. A player must constantly weigh the various types of imbalance and strive to maintain an overall balance. Advantage expresses a player’s sense of this balance, which is guided by certain combinations of elementary features that form meaningful patterns. A player describes and compares positions with the help of chess concepts such as relative piece value, piece cooperation, mobility, and development.

The NSS team suggested that such notions are essential, for without them even a very fast computer could not explore deeply enough to make reliable evaluations and discover strong moves. Furthermore, it is not realistic to expect such concepts to arise spontaneously during the course of computing values for specific positions, particularly when the computation is based on unrelated positional features. Lacking a method of inducing chess concepts from examples, NSS simply supposed that their program already possessed sufficient chess knowledge to recognize a (potentially achievable) favorable imbalance that might be considered a goal.

Their idea was straightforward: For each move, the program would examine the position to determine its category. Each category was associated with a set of appropriate goals, which might include King safety, material balance, center control, development, or promotion. The selected goals were added to a list, which guided the processing that followed. Each goal was paired with a move generator that would propose moves relevant to achieving that goal.
The NSS approach emphasized reasoned action based on chess knowledge. Since move generators carry the burden of finding positive reasons for doing things, each generator must be individually designed to detect particular features of the position and to find legal moves relevant to those features. The center-control generator might propose advancing a center pawn to the fourth rank; the material balance generator would propose defending or moving a piece found to be en prise.

A sequence of evaluations, one for each goal, determined each move’s value. This value took the form of a vector, in which each component signalled acceptability of the move from the viewpoint of its associated goal. Such a flexible programming approach permitted easy addition of goals during testing: a material balance goal to assess gain or loss of material; a development goal to keep track of relative gain or
loss of tempi; and a pawn structure goal to avoid the creation of pawn defects.

To compare two values for minimaxing, the vector components were tested sequentially. Only if one pair of components were equal would the next pair be compared: High-priority components always dominate. The programmers placed no a priori limit on width or depth of search. Exploration of continuations was based on quiescence. If no move would result in significant change in some component of the scoring vector, the position would be considered dead.

The NSS team was the first to use the alpha-beta (α-β)7 algorithm in a chess program. Alpha-beta, sometimes called “backward pruning,” supplements minimax by avoiding generation of unprofitable or irrelevant move sequences during lookahead. It does this by using a technique employed almost instinctively by a human chess player during lookahead search: to stop investigating successors of a given move or move sequence when a refutation move has been found. In other words, if an opponent has one response that establishes that a potential move is inferior, there is no need to check other responses to that move.

Alpha-beta, like minimax, is an imitation of the behavior of a human chess player. The most promising move is evaluated first and its minimax value serves as a standard (the alpha value) for evaluating the remaining alternatives. During examination of less promising moves, whenever the opponent has a reply that would leave one worse off than the established standard, other possible replies need not be evaluated. The tree can be pruned, for once it is known that a continuation is bad, there is no need to calculate just how bad. If a move is found during the course of evaluation that is better than the standard, it becomes the new, higher standard against which replies to subsequently examined moves will be gauged.

Now any shortcut useable in the maximization process also can be applied, in reverse, during minimization.
The opponent’s best response, the most likely minimum, is examined first to establish a standard (the beta value). Any subsequent alternative with an evaluation exceeding beta provides opportunity to discontinue further investigation of that line, for a rational opponent is not going to allow its selection. If an opponent’s response should prove better than the standard (is less than beta, because the opponent is seeking to minimize), the old beta is replaced by the new minimum.

7 John McCarthy, perhaps best known for his design of the programming language LISP, coined the name from the two bounds, α and β, used in an early description of the procedure.
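The alpha and beta standards described above translate into a short recursive procedure. The following sketch is a generic textbook formulation, not the NSS code: the nested-list tree, the leaf values, and the one-element `counter` list used to count static evaluations are assumptions made for the illustration. Plain minimax is included so that the two can be compared on the same tree.

```python
import math


def minimax(node, maximizing, counter):
    if not isinstance(node, list):
        counter[0] += 1              # one static evaluation performed
        return node
    vals = [minimax(c, not maximizing, counter) for c in node]
    return max(vals) if maximizing else min(vals)


def alphabeta(node, maximizing, counter, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):
        counter[0] += 1              # one static evaluation performed
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, best)  # raise the standard when a better move appears
            if alpha >= beta:
                break                 # refutation found: skip the remaining moves
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, best)        # lower the standard for the minimizing side
        if alpha >= beta:
            break                     # this line is already worse than alpha
    return best


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical leaf values
full, pruned = [0], [0]
print(minimax(tree, True, full), full[0])      # 3 9
print(alphabeta(tree, True, pruned), pruned[0])  # 3 7
```

Both procedures back up the same value, but in the second branch alpha-beta stops after the first reply (worth 2), because that reply already shows the move to be worse than the standard of 3 set by the first branch.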
It was soon proven that, for a forward search to a fixed depth, use of the alpha-beta technique is without risk. In other words, no critical branch will be erroneously pruned; the final move chosen will always be the same as the one that would have been selected by exhaustive minimax.

Just as the chess player examines the most promising moves first, alpha-beta pruning is most effective if the branches are examined in order of descending merit. The number of cutoffs is greatest when the initial alpha standard is high and the initial beta standard is low. Even with no attempt to optimize search order, some improvement usually results from using alpha-beta, but the number of branches pruned depends on the accidental ordering of moves. In the worst case, the branches can be so badly ordered that alpha-beta does nothing.

By simply neglecting evaluation of moves he knew would be eliminated anyway, Turing had actually used the alpha-beta principle in his hand-simulation. In one case he failed to carry out the move demanded by his algorithm because he judged, erroneously, that the calculated value of that alternative would not exceed that of the line he was pursuing. Curiously, he failed to identify this activity as a separate, programmable algorithm.

With use of α-β pruning, both the number of branches that must be generated and the number of static evaluations can be dramatically reduced. Reducing the amount of branching at each node increases the search extent achievable in a fixed time. With best ordering, the maximum ply is nearly doubled. Because of this substantial saving, α-β pruning is almost essential for any game program that employs forward search.

The NSS team supplied their program with positions from master games to observe its behavior. They were gratified to find that the program considered only about three alternatives in each position.
As in human play, only a few continuations were explored; the average number of potential positions considered for each move was thirteen. Even more encouraging, more than half the time the program’s choices agreed with the human master’s selection. Their prototype lacked most of the basic “tools” of the human chess player and was unable to detect even the simplest forks and pins. Once beyond the opening, its play resembled that of a beginner who knows only the moves, but none of the basic patterns of play. Embodiment of chess knowledge in move generators proved arduous and the NSS group never developed their program beyond the initial experimental stage. Only three goals were ever implemented (center control, material balance, and development), and two of these are appropriate only in the opening, surely an insufficient number for a
fair test of their approach. Apart from drawing attention to the potential importance of the activity, they made no headway in inducing concepts from examples, nor did they make any other provision for acquiring or extending chess knowledge.

Still, their program’s behavior did resemble the forward search of the human chess player much more closely than a full-width brute-force search, for most of the possible moves, not being relevant to any of the selected goals, were not proposed by any of the move generators and were not even considered. Furthermore, the program was not subject to the human amateur’s common lapse of hanging a piece (leaving it subject to capture without compensation) because maintenance of material balance is a high-priority goal, and branches leading to immediate exchange of material are not likely to be pruned.

It might be expected that chess knowledge is in only a few cases sufficiently codifiable to permit specification of appropriate goals and move generators. The initial success of the NSS approach fired the optimism of the trio, encouraging Simon and Newell in 1957 to predict confidently: “Within ten years a computer will be World Chess Champion.”
chapter 5
Brute Force
The rapid advance of knowledge-based machine chess anticipated by Newell, Shaw, and Simon did not take place. Indeed, the trio’s work helped dampen enthusiasm for their scheme; they had, after all, hinted that the important problems had been solved and all that remained was the dog-work of coding goals. Instead, two major difficulties prevented further advance: Only the most simple chess concepts could be captured in rules precise enough to be programmed, and no one had been able to specify any practical algorithm for categorizing board positions.

As the knowledge approach grew stagnant, the continuing increase of computer capability made brute-force schemes more attractive. Writing a recursive forward-search program is not entirely trivial (see Appendix), but a working version, once assembled, can be readily fine-tuned by adjusting its evaluation function. Any improvement in program efficiency permits more extensive search, and thus better play. In view of the immediate rewards, it is not surprising that brute-force forward search programs came to dominate machine chess.

Perhaps because the best-known chess program of the 1960s embodied a remarkable amount of chess-specific knowledge, it was widely assumed that machine chess was based on skillful application of general principles, and not on sheer computing power. Developed at the MIT Artificial Intelligence Laboratory by the team of Greenblatt, Eastlake, and Crocker, the program, known as MacHack, was the first to compete against humans under tournament conditions, and to be accorded an official USCF rating.

Since the mid-1960s, all participants in chess tournaments sanctioned by the United States Chess Federation have been rated with the help of a yardstick devised by Arpad Elo. The measure of playing strength he developed seems to satisfy an important need of chessplayers: Upon introduction it was enthusiastically adopted and
has remained universally popular. The Elo rating indicates a precise position on a scale of chess proficiency ranging from archduffer (about 800) to supergrandmaster (about 2800). The USCF classifies players according to 200-point intervals along the Elo scale as follows:

Master:    2200 and above
Expert:    2000-2199
Class A:   1800-1999
Class B:   1600-1799
Class C:   1400-1599
Class D:   1200-1399
Class E:   below 1200
Almost all International Masters weigh in above 2300 on this scale and International Grandmasters above 2500. These official titles do not depend on Elo rating, but are bestowed by the world chess body Fédération Internationale des Échecs (FIDE) upon players who achieve certain “norms” in international competition.

In Elo’s system, the difference between two players’ ratings is an expression of the probable outcome of a match between them. A player rated 100 points higher is likely to win 64% of the games; for a 200-point difference, the expected win ratio is 76%. With increase of rating differential, the chances of the lower-rated player shrink dramatically. When a 600-point underdog wins a serious tournament game, that is, one in which the favorite was not “embalmed,” the game may well be published in Chess Life as a rarity.

The USCF rating pool can be viewed as a collection of economic entities whose currency is rating points. After every rated game, rating points are transferred from loser to winner or, if drawn, from higher- to lower-rated player.¹ Since this is a very simple economy (no banking system, no money lent at interest, and nearly constant total specie), it is easy to regulate, and the Elo rating system as used by the USCF remains remarkably stable. From time to time small adjustments are made to compensate for drift in the rating system as players enter at a beginning level, amass rating points with increasing experience, and then become inactive, in effect taking along some of the coin upon leaving the system.
1 The specie transfer is a function of rating difference. The key to system stability was Elo’s replacement of the sigmoid cumulative probability function by a bounded ramp approximation, which limits the number of points transferred when an upset occurs.
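The rating arithmetic above can be illustrated with a short Python sketch. It rests on assumptions not taken from the text: the expected score uses the widely quoted logistic curve with a 400-point scale (not the bounded ramp mentioned in the footnote), the multiplier `k=32` is a hypothetical choice, and the function names are invented for the example. The class boundaries are those of the USCF table.

```python
def expected_score(rating_diff):
    """Expected score of the higher-rated side (assumed logistic form, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

def transfer_points(rating_a, rating_b, score_a, k=32):
    """Move rating points between two players after one game.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    Whatever A gains, B loses, so the total "specie" is unchanged.
    k is an assumed multiplier, not a figure from the text.
    """
    delta = k * (score_a - expected_score(rating_a - rating_b))
    return rating_a + delta, rating_b - delta

def uscf_class(rating):
    """Map a rating to the USCF class intervals listed in the table."""
    floors = [(2200, "Master"), (2000, "Expert"), (1800, "Class A"),
              (1600, "Class B"), (1400, "Class C"), (1200, "Class D")]
    for floor, name in floors:
        if rating >= floor:
            return name
    return "Class E"

# A draw between a 1600 and a 1400 player: points flow to the lower-rated player.
new_high, new_low = transfer_points(1600, 1400, 0.5)
```

Under these assumptions the curve gives about 0.64 for a 100-point edge and about 0.76 for a 200-point edge, in line with the percentages quoted above, and a provisional rating of 1330 falls in Class D.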
Elo’s rating system also serves to assess the performance of chess machines. Indeed, it has been suggested that in view of the very different styles of play, performance may be the only area in which comparison is possible. To establish a useful measure of machine performance, it is desirable to calibrate the ratings through some standardized form of chess competition between people and machines.

In 1967, under the name MacHack Six, the Greenblatt program was the first nonhuman to participate in a rated chess tournament. (It lost its first game, to an expert.) Within three months the program, now an honorary member of USCF, had managed two wins and two draws in an amateur tournament to take the Class D trophy (based on its provisional rating of 1330). MacHack soon obtained an official rating at the 1400 level, and this success against human competition stimulated the famous wager with David Levy recounted in the chapter on human versus machine.

Although mid-1960s computers were substantially faster than their predecessors, the Greenblatt team decided it was still not computationally feasible to consider all moves from each position. They chose a Shannon Type B strategy with a variable branching factor (for tournament games, the nominal² branching factor started at fifteen and was reduced as search depth increased). Several forms of evaluation were used in MacHack: A “plausible move generator” selected branches worth further exploration, and a “position evaluator” examined leaves of the move tree with the help of a “static board evaluator,” which assigned scores to quiescent positions.

The chief function of a plausible move generator is to reduce the branching factor of the move tree by avoiding examination of irrelevant moves. In an attempt to cover a wide variety of board situations, Greenblatt supplied MacHack’s move generator with a bewildering collection of bits and pieces of chess knowledge contained in about fifty rules of thumb.
Most of these were useful only in specific cases, say, during a particular stage of the game, or in certain regions of the board. This collection of rules resembled the hodgepodge of advice that beginning chessplayers have heard ever since the first kibitzer: “In opening a game, endeavour to bring your superior officers into action speedily”; “Exchange pieces off when you are superior in power”; “It is mostly advisable to castle the King pretty early in the game, and to do so on the King’s side”; “When the superior pieces have been taken off
2 In some cases, such as calculating the consequences of a free check, a greater number of branches would be examined.
the field, the King should be made to compensate for his previous inactivity by being busily engaged.”³

Such advisory tips can be of great didactic value to the beginning player, for they draw attention to aspects of a position worth further examination. MacHack’s “heuristics” served a similar purpose: to help select moves for detailed evaluation. The plausible move generator assigned scores to moves based on such diverse criteria as motion toward a more central square, tropism to the opponent’s King, developmental value, and unblocking of files. As in human chess play, implausible moves were not deliberately ignored: They simply went unnoticed. Those identified as plausible were ordered by score for α-β and added to the tree. At the specified maximum search depth, the position evaluator examined each node of the move tree. Under certain conditions of turmoil, such as when several pieces are hanging, the plausible move generator would be invoked again, and the search extended. In more placid situations, the static board evaluator produced an initial estimate, which might then be refined after exploration of capture chains and pawn promotions.

One innovation of MacHack almost universally adopted in subsequent brute-force chess machines was the transposition table, a technique to recognize previously evaluated nodes. The ability to retrieve a position’s value from a table offers tremendous potential savings in computational effort by avoiding reevaluation of positions reached by move transpositions. The actual reduction in MacHack’s computation was modest because the program searched no deeper than five plies and identical positions can appear only after the second ply. Later programs running on faster machines and thus capable of deeper search encountered the full combinatorial explosion of repetitions of position and could realize substantial computational savings.⁴

Despite its built-in advice, MacHack remained outclassed by expert players.
The program seemed to have been modeled on the beginner: Its play was determined largely by its plausible move generator which, like the beginning chess player, emphasizes moves, not positions. MacHack proved to be well-matched when paired with a Class D player. A proficient player watching such a contest could easily supply a running commentary of the thoughts going through the mind of the human,

3 From a section on “Maxims and Advice for an Inexperienced Player” in (Staunton, 1847).

4 Robert Hyatt, whose team authored the 1983 World Computer Chess Champion Cray Blitz, lamented that the million-entry transposition table was not nearly large enough because the program regularly searches more than ten million positions during the three to five minutes it spends on a move. He noted that use of even this “limited” table reduced the computational effort by nearly two-thirds.
and seeing the same sequences of disconnected moves being played by the computer would surely be struck by a feeling that a similar thinking process must be taking place. Beginning human and computer alike employ the same style, blindly obey the same general precepts, and commit very similar oversights. It is not surprising that they play a level game.⁵ Both are stymied by the same problem: how to assess the particular combination of advantages and disadvantages in the board position.

Good chess play demands correct assessment of advantage and disadvantage. Recognizing and understanding why a particular combination of features is advantageous permits the player to judge which action is called for. In the exploration of the boundary between competing options, the judgement of advantage can involve fine distinctions. Every tournament player, knowing well just how quickly a position can collapse, is acutely aware of the need to steer a careful path, wary of the possible overlooked nuance that the opponent might exploit, and feeling always on the edge of disaster.

The player who can discern more relative differences has an edge, and is more often the better player. The reason is that any change in a chess position is likely to affect several board features and it is the change of features in combination that determines the direction of a game. A trifling difference in position becomes critical as it is magnified over a cascade of moves. This sensitivity to initial conditions suggests that the game of chess might well serve as an experimental microcosm for the study of chaos. Indeed, the brute-force chess machine can be thought of as one attempt to solve a particular problem of chaotic equilibrium (predicting the outcome of a contest from a given position) by enumerating possible futures and selecting as the most likely that course prescribed by minimax. Curiously, the success of the brute-force machines is based on the least chaotic of evaluation functions.
More than a century ago World Champion Steinitz emphasized the strategy of accumulating small advantages until the opponent’s position collapses. A slight advantage is obtained at the cost of a lesser one; the opponent is granted maneuvering space in one region of the board in exchange for control over another; small profits are exacted from each transaction until an overwhelming position is achieved. Just as the merchant measures profit and loss in terms of cash and inventory (and to a lesser extent in terms of present and future desirability and obtainability of goods), the brute-force chess machine must be able to

5 The similarity of style was particularly evident in the game with Dreyfus recounted in the chapter on Human vs. Machine.
compute changes in positional and material assets. The linear polynomial has been the mechanism of choice.

The chief advantage of a linear representation is that its components can be readily taken apart and reassembled: a problem can be partitioned, its pieces solved separately, and the partial solutions conjoined. MacHack’s designers selected the linear polynomial for the evaluation function in part because rules could be easily combined. They foresaw that the number of rules in their knowledge-based evaluation function was likely to increase rapidly and they could not allow the complexity of computation to grow exponentially with the number of features. Computing the value of a linear polynomial is easy: For each detected feature of a position, an associated coefficient, or weight, is added to a total. Because of its simplicity, the linear polynomial has become ubiquitous for such diverse applications as “measuring” IQ and assigning classroom grades.⁶

The linear evaluation function provides a convenient way to keep track of accumulated advantage. Material values, such as -1 for the pawn down, and positional values, such as +.3 for the far-advanced pawn or +.1 for the opponent’s isolated center pawn, can be separately computed and combined. With a linear polynomial, advantages add up. Positional and material values are weighed and a decision based on this comparison determines the course of the game. It is almost as if the alternatives were being weighed with the help of expert intuition.

Some rules apply only in particular situations. The chief defect of a linear polynomial is the difficulty of introducing particular rules to handle special cases. General rules are, by definition, those that can be determined statistically; exceptional situations, being rare, are “squeezed out” by the statistical processes. If a score is to be assigned to some feature of a board, such as the King being open to a check, its relevance to the board situation must be determined.
Relevance is not measurable on a continuous scale, but is a dichotomy: Either the feature is relevant and detailed analysis is needed, or it isn’t, and it should be disregarded. Although a linear polynomial can represent gradual transitions, it cannot describe such discontinuities. “Linear” implies that a small disturbance results in a small, proportional response. In a nonlinear world in which abrupt change is possible, system output may be smooth for a time and then suddenly jump.
6 Few people bother to ask if it is ever reasonable to weight one test question with 10 points, another with 15, to add up the weighted scores, and then to attempt to interpret the total as meaningful. Still, the process is simple, avoids expenditure of deliberative effort, and is therefore popular.
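A linear evaluation function of the kind described above amounts to a weighted sum over detected features. The following Python sketch uses the three example values quoted in the text; the feature names, the `WEIGHTS` table, and the `evaluate` function are hypothetical labels chosen for illustration and are not MacHack’s actual rules.

```python
# Weights taken from the examples in the text; the names are invented for this sketch.
WEIGHTS = {
    "pawn_down": -1.0,                       # material: one pawn behind
    "far_advanced_pawn": 0.3,                # positional: own far-advanced pawn
    "opponent_isolated_center_pawn": 0.1,    # positional: defect in the opponent's structure
}

def evaluate(features):
    """Sum weight * count over the features detected in a position.

    `features` maps a feature name to how many times it occurs.
    The cost grows linearly with the number of features, and each
    term can be computed without reference to any other.
    """
    return sum(WEIGHTS[name] * count for name, count in features.items())

material = {"pawn_down": 1}
positional = {"far_advanced_pawn": 1, "opponent_isolated_center_pawn": 1}
combined = {**material, **positional}

total = evaluate(combined)
```

Because the function is linear, evaluating the material and positional parts separately and adding the results gives the same total as evaluating them together, here roughly -0.6. The same property is what prevents such a function from expressing an abrupt, all-or-nothing dependence between features.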
Since a linear polynomial handles combinations of advantages adequately only when they are independent, some other mechanism is necessary to deal with conflicting values. For example, pawns in front of a castled position should move if a promising attack can be mounted, and should stay in place to safeguard the King. Conflicting values and abrupt changes can be handled by subdividing chess positions into discrete categories, which can be separately evaluated using only relevant criteria. This analysis by case takes place automatically during the digital process of building, and then minimaxing, a move tree. With a sufficiently extensive move tree, it is the combination of continuous linear polynomial with a discrete branching structure that makes brute-force computation so effective for evaluating chess positions.

MacHack did not remain the sole mechanical contender in chess tournaments for long. The MIT program was soon joined by the long-lived Northwestern University program CHESS 2.0.⁷ In early 1968, undergraduates Larry Atkin and Keith Gorlen decided to write a chess program to exercise the university’s new CDC 6400 computer. Upon hearing of their work, David Slate, a graduate student and chess expert, decided to write a competing program. The Atkin-Gorlen program generated a lookahead tree and adjusted its computational effort according to the difficulty of the position; Slate’s program did not carry out a forward search, but had a superior evaluation function.

Resistance to machine participation in tournaments was greater at Illinois than in Massachusetts. Intimidated, perhaps, by tales of the computational ability of the university’s new computer, some participants in the Northwestern Chess Championship circulated a petition to bar computers from the tournament. For its maiden effort, the Atkin-Gorlen program was paired with another first-time participant; the two took turns giving away the game, but the hapless human more decisively.
The level of machine play was convincing, the anticomputer outcry diminished, and players were soon begging to be paired with the computer.

The two programs never met over the board; instead, their complementary strengths and weaknesses suggested the joint venture that produced CHESS 2.0 in mid-1969. For a lark, the trio advertised their new program with a fake issue of the Control Data Software Availability Bulletin. Its appearance revealed a surprising number of chess
7 The curious name was derived from a traditional programming custom of appending two digits separated by a decimal point to the name of a program to indicate the version; this number was increased by a tenth for each nontrivial modification, and raised to the next unit upon major rewrite.
players among computer users, for it was informally copied and widely distributed. Requests began pouring in. They sent copies of the program to more than twenty Control Data installations.

In the spring of 1970 Slate received a letter containing suggestions for improving CHESS 2.0 from David Levy, who had tested its level of play at the University of London. The program had spread as quickly as gossip: An inquiry revealed London to be fourth (and by no means last) in a chain of university computing centers that had enthusiastically passed along copies. CHESS 2.0 even turned up in Australia, where it served as the focus of an exhibition to raise funds for sending a team to the chess Olympics (for which, not surprisingly, the program failed to qualify). Later that year, participation of a computer program being no longer resisted, CHESS 2.0 played in the Northwestern University Championship to finish with a score of 2-3.

With several programs now taking part in tournaments, it seemed inevitable that some form of direct encounter between chess playing programs should take place, perhaps as a new form of intercollegiate rivalry. The first United States Computer Chess Championship was organized by Kenneth M. King and Monroe Newborn of Columbia University as an adjunct to the 1970 Annual Conference of the Association for Computing Machinery held in New York City. The organizers hoped that a competition among all the chess programs in the United States would not only bring together their authors for fruitful exchange of ideas, but also would stimulate interest in artificial intelligence. There was even a possibility that computer chess might provide a social occasion as rich as other spectator sports.

Ground rules to cover computer play had to be agreed upon. The immobility of main-frame computers, for example, required timely communication of moves between remote participants.
Chess had been played by telegraph since 1844, when Washington beat Baltimore in an intercity match, and radio matches in the 1940s had established protocols for the use of clocks in conjunction with a communication link. In 1965, when the U.S. State Department denied permission for travel to an “off limits” country, Bobby Fischer had participated in the Capablanca Memorial Tournament in Havana via teletype link with the Marshall Chess Club in New York City. For a computer chess tournament, two-way communication had to be maintained between each remote participant and the playing hall where the official boards and clocks were located.

Rules for human tournaments could generally be made to fit computer play as well. Any human player unable to move the pieces and operate the clock may designate a proxy to carry out these motions when instructed. A machine also can employ some suitable means,
such as a telephone modem, to communicate its moves to a proxy who makes the moves and punches the clock. Programs are not handicapped according to the speed of their host computers, but as in human competition, equal time is allotted to all programs. Each participant had to complete forty moves within two hours, and thereafter ten every half hour.

Some rules specific to machine play had to be established. Provision was made for time-outs during play to allow for recovery from the common and unpredictable seizures resulting from hardware, software, or communication channel “glitches.” The prohibition on passing game-related information to a participant while play is in progress was relaxed to permit entry of the board state⁸ whenever “reboot” amnesia occurs.

The tournament director (TD) was given the right to adjudicate a game after five hours of elapsed time. This rule eliminated a host of potential squabbles associated with adjourned games, and in twenty years of computer tournaments, adjudication protests have been noticeably lacking, perhaps because TDs have always been stronger players than the participants. (Interminable endgames are common in computer chess tournaments, and adjudication has provided some relief, but its use presented a problem with no satisfactory answer: To what extent should playing ability be considered? Adjudication by best play occasionally awards a win to a program which, without assistance, would be unable to find the winning line.)

Six programming teams accepted the invitation. The program J. Biit (Just Because It Is There) was the favorite, in part because it was written by an outstanding chess player, Hans Berliner, but chiefly because it was running on the top-of-the-line IBM 360/91 computer at Columbia University. David Slate had continued to extend CHESS 2.0, and completed the improved CHESS 3.0 in time for the tournament.
The new version embodied a few new features but, most importantly, it was more efficient, running some 65% faster.

Richard Greenblatt had been asked to enter MacHack, but he declined, hinting that competition with other computer programs would be inappropriate for an artificial intelligence project intended to model human thought processes. Disappointed authors of competing programs who wanted a chance at MacHack considered this a flimsy excuse (it was known that on one occasion MacHack had played J. Biit to a draw) and Greenblatt’s steadfast refusal to allow MacHack’s
8 But only the board state, that is, current board position along with castling and en passant status, may be supplied. The time remaining may also be given, but only if the program originates the request.
participation in further computer chess competition was likened to Staunton’s ungentlemanly evasion of every encounter with Morphy.

The three-round Swiss-system⁹ tournament took place on successive evenings. Several hundred curious spectators appeared at each round. Since off-site participants never complain about noise level in the playing rooms, there was no need for respectful silence. Indeed, at times the sound level resembled that of a boxing match. Programming teams and their supporters raised partisan cheers when their programs made good moves and produced loud groans when opportunities were missed or blunders made. The holiday atmosphere encouraged kibitzing, and second-guessing the reasons behind this play or that was every bit as lively as with other spectator sports.

The favorite, J. Biit, after jumping to a quick lead with a first-round nine-move miniature, was paired with CHESS 3.0 in the second round for the pivotal game of the tournament. It proved to be an entertaining, Class C style of game, with plenty of missed opportunities and inaccuracies on both sides, perhaps the ideal spectator game for the four hundred or so watching the display board. CHESS 3.0, playing Black with a Nimzo-Indian defense, soon snaffled a pawn, returned it for the exchange, weathered an attack by J. Biit’s Bishop pair, and used its positional advantage to grab pawns while heading for the endgame. The drama reached a climax when CHESS 3.0 returned the exchange for an unstoppable passed pawn. This move elicited a round of applause, another first for a computer! Three more moves brought checkmate. The new favorite easily won its third round game to become the first computer chess champion.

Computer chess tournaments provided an arena for (and quickly attracted) computer wizards eager to show off their programming virtuosity.
9 This is the most popular form of chess competition in the United States. In the first round, entrants are ordered by anticipated playing strength and the top half of the field is paired with the bottom half. In subsequent rounds, pairings are made according to accumulated score (one point for each win and a half point for each draw) with the goal of pairing entrants with identical scores under the restriction that no player may meet the same opponent twice during the tournament. Specific rules govern assignment of White and Black. The advantage of Swiss pairings is the tendency for each win to bring stronger opponents and each loss to bring weaker opponents, thus matching players of comparable strength while providing variety.

Those who have never written a program for a computer (still an overwhelming majority on this planet) might imagine it a straightforward process: Understand the problem, express it in the form of simple steps, and watch the computer carry out these steps with electronic celerity. A few misspellings perhaps, quickly found and corrected, and lo! A working program. But people who write programs will tell you that, as in chess play, there are many more ways to go wrong than to go right and that programming oversights are the rule, not the exception. These result in “bugs”: puzzling, unintended behavior. In its very first game, for instance, Slate’s program found a novel answer to the e4 opening: It claimed a draw by stalemate!

A substantial portion of the effort of creating a working computer program consists of “debugging,” of discovering the reason behind this or that inexplicable action and devising appropriate corrections. It is usually only through the detective work of debugging that the programmer learns how the program actually does work. Now how did it come to print “stalemate?” Is the move generator at fault? Could the legal move list inadvertently have been emptied? The search for causes of unexpected actions is often instructive, and occasionally suggests an improvement. But however successful the effort to correct known lapses, there can never be certainty that a program is bug-free, and the folklore of the chess programmer is rich with tales of errors lurking in well-tested programs.

Some errors, like over-the-board blunders, astonish by their blatancy: In the Second Annual Computer Chess Championship, an entrant named “Mr. Turk” lost all three of its games by forfeit for specifying illegal moves. Other slips are memorable because of the programmer’s discomfiture. After a first round debacle at the 1982 North American Computer Championship, John Poduska achieved a computer chess version of a fingerfehler (the quick, obvious move that happens to be wrong) with a too-hasty repair, after which he could only watch with dismay as his program played the worst moves it could find.

The most instructive bugs are subtle. During the 1971 tournament, the program COKO (named for its authors Dennis Cooper and Ed Kozdrowicki) offered a pawn which, if accepted, would permit a mating sequence 17 plies deep. The pawn was taken, mate was announced, and the predicted line was followed right down to the position diagrammed on the next page. Suddenly COKO exhibited a curious behavior:
The pawn was taken, mate was announced, and the predicted line was followed right down to the position diagrammed on the next page. Suddenly COKO exhibited a curious behavior:
       White: COKO    Black
38.    Kc1            f5
39.    Kc2            f4
40.    Kc1            g4
41.    Kc2            f3
42.    Kc1            fg
43.    Kc2            gh=Q
44.    Kc1            Qxf1+
Soon, in a hopeless position,10 the authors resigned for their program. They quickly discovered that the aimless play was a result of COKO’s evaluation: When faced with two alternatives, a mate-in-two and a mate-in-one, it selected the longer path because the resulting position would have a slightly higher strategic score; after moving, a similar calculation again indicated that the mate-in-two would be marginally better. Merely selecting any move that allows a win does not necessarily bring the win any closer. While COKO seesawed in the doorway, GENIE marched back into the game.
Occasional programming blunders marked the rapid development that took place following inauguration of the computer chess championship. As the tournament organizers had hoped, competition encouraged friendly, cooperative exchange of ideas and experiences. While waiting for their computers to respond, program authors discussed hopes and expectations and, with an eye to incorporation into their recipes, reexamined the rules thought to be used by human players. But whatever strong players might be doing while deciding on a course of action, the activity was largely intuitive and apparently not expressible as a sequence of steps. With the emphasis on programming techniques, very few top-rank chess players were attracted to the generation of competitors attempting to create the most effective chess-playing program.
10 Hopeless, that is, if the opponent had been human. But GENIE, the opponent in this engagement, was equally incapable of playing an endgame and, had play continued, would probably have produced one irrelevant check after another until the game was drawn.
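COKO's wavering between the mate-in-one and the mate-in-two can be reproduced in a few lines. The sketch below is only an illustration and is not taken from COKO: the constant `MATE`, the function names, and the positional bonuses are hypothetical. It contrasts a scoring rule in which every forced mate receives the same base value, so that a small positional term decides between them, with a rule that ranks forced mates by distance alone.

```python
MATE = 100_000  # hypothetical score for a forced mate, far above any positional value


def flat_mate_score(plies_to_mate: int, positional_bonus: int) -> int:
    """Every forced mate gets the same base value, so the positional term decides."""
    return MATE + positional_bonus


def distance_aware_mate_score(plies_to_mate: int, positional_bonus: int) -> int:
    """Shorter mates score higher; the positional term is ignored once mate is forced."""
    return MATE - plies_to_mate


# Hypothetical candidates: (plies until mate, positional bonus of the resulting position).
CANDIDATES = {
    "mate-in-one": (1, 3),
    "mate-in-two": (3, 5),  # slightly better-looking position, but the mate is further off
}


def choose(score_fn) -> str:
    """Return the name of the candidate that the given scoring rule rates highest."""
    return max(CANDIDATES, key=lambda name: score_fn(*CANDIDATES[name]))


print("flat scoring picks:", choose(flat_mate_score))
print("distance-aware scoring picks:", choose(distance_aware_mate_score))
```

Under the flat rule the same comparison repeats after every move, which is one way to picture the seesawing in the game score; subtracting the distance to mate is a common remedy, though the text does not say how COKO's authors repaired their program.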
During the first decade of tournament competition, CHESS X.X remained the bellwether of algorithmic chess. The increase of the program’s version number hints at the metamorphosis needed to remain a perennial top competitor:
North American Computer Chess Championships
1970    CHESS 3.0
1971    CHESS 3.5
1972    CHESS 3.6
1973    CHESS 4.0
1974    RIBBIT
1975    CHESS 4.4
1976    CHESS 4.5
1977    CHESS 4.6
1978    BELLE
1979    CHESS 4.9
Gorlen left Northwestern in 1970, but Slate and Atkin had continued the original enterprise and, by incorporating the most useful algorithms as quickly as their utility could be shown, succeeded in keeping their program a top contender for a remarkably long time.11 The succession of programs chronicles the steady improvement of algorithms for brute-force forward search.
Version 3.0 and its upgrades used a depth-first selective search in which the evaluation function not only scored the leaves of the move tree, but also served as a plausible move generator to select the “best n” branches at each node. This design provided a good balance between search extent and existing computing power, and sufficed to maintain a marginal lead over machine competition. Performance of selective search programs remained erratic, especially in quiet positions when overselective search often rejects good moves. Faster computers permitted deeper search, which required ever more careful pruning of branches at shallow nodes and a more sophisticated plausible move generator. Development of better algorithms did not proceed as quickly as computational capacity.
A major turning point in brute-force chess occurred with the transition to full-width search in CHESS 4.0. Designers of early chess-playing programs had carefully avoided full-width searching because slow computers and combinatorial explosion resulted in poor play and overly tedious games.
11 USCF rating records reveal a few unusual contemporaries whose skill at chess has closely matched that of the best brute-force programs for almost two decades.
With the computational capacity of new computers doubling every few years, full-width searching became increasingly attractive. Slate and Atkin claimed that the chief motivation for this change was a desire for simplicity, but they admitted that a major attraction of full-width search was their own peace of mind. In game after game they had experienced long, anxious intervals waiting for their program’s next move, worrying whether the best continuation would happen to be included in the set of plausible moves. A full-width search guarantees that the best move will be examined.
In a full-width search it is especially important to order the moves for efficient alpha-beta pruning. Slate and Atkin remarked on the irony of the lazy programmer who, seeking to avoid the complications of determining which moves are plausible by letting the machine examine all moves, must devote substantial effort to move selection anyway. Full-width search was, in effect, the same selective search they had been using all along, except that less plausible moves were also examined just to make sure nothing had been overlooked. The increase in computational effort is often negligible, for α-β immediately rejects almost all of the additional moves.
Using even a minuscule amount of chess knowledge in deciding which moves to examine first can make a tremendous difference in search effort. Consider the problem position from a 1988 game (Horvath-Jacobsen):
White to play and win
If forcing moves, especially checking moves, are examined first (an obvious procedure for the human who suspects disaster if Black is allowed to capture the pawn with Rxf6+), even a small chess computer is likely to find the forced mate in less than a minute. If, however, moves are generated without regard to priority, the large number of
available Queen moves produces such prolific branching that finding the solution can take many hours.
An unexpected flood of programming teams asked to enter the 1973 ACM Championship. Tournament organizers decided to limit participation to the twelve strongest programs (selected on the basis of sample games) and to extend the tournament to four rounds. Even with the peace of mind provided by full-width search, Slate and Atkin had anxious moments, but CHESS 4.0 scored three wins and one draw to finish a half point ahead of the three-way tie for second, just enough to retain the championship. The win established the practicality and success of full-width search, and soon type B selective search had nearly vanished from computer chess. With this change of emphasis, a prolonged period of refining full-width search began; useful innovations quickly found their way into the active programs.
Refinements were often imitations of behavior observed in human chess play. One simple artifice, for example, is obvious move processing. When there is no real choice, that is, all alternatives save one are obviously bad, further expenditure of evaluation time is not worthwhile. For better or for worse, the move might as well be made immediately. “Obvious” must be defined precisely enough for use in a computer program: for example, a move is obvious if it is the only way to get out of check or the only way to carry out an obligatory Queen recapture.
Another useful technique to exploit pruning possibilities during full-width search is the ominous-sounding killer heuristic. A strong move in one position may well be strong in closely related positions. If a sharp refutation to a proposed move is found, say a Knight fork of King and Queen (ouch!), this “killer move” is worth trying early as a possible refutation of other moves; the tree will quickly be reduced to those cases in which the fork can be thwarted.
Selecting the best killer can be difficult, especially when there are several ways to lose. On the other hand, trying a potential killer costs little, and a lucky choice can speed up search dramatically. Furthermore, an entire mob can find employment, with a hit man at each level, for the killer heuristic is useful throughout the move tree.
During the first decade of the computer chess championship, CHESS X.X failed to win only twice. On the first occasion, in 1974, CHESS 4.2 was upset by RIBBIT, a Canadian program by Jim Parry, Ron Hansen, and Russell Crook of the University of Waterloo. RIBBIT had unexpectedly reached the final after opponents failed to press their advantages. In previous encounters, RIBBIT had lost twice to CHESS 4.X, but now avenged these losses in the deciding game of the championship by employing a standard tactic of human masters: the prepared opening line.
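The killer heuristic discussed earlier in this passage can be tried out on a toy move tree. The sketch below is an assumption-laden illustration, not code from CHESS 4.X: the game (four numbered moves in every position, with move 2 playing the part of the strong "fork"), the names `STRENGTH`, `evaluate`, `search` and `run`, and the single killer slot per ply are all invented for the example. It runs the same alpha-beta search with and without a killer table and reports how many leaf positions each version evaluates.

```python
from typing import List, Optional, Tuple

BRANCH = 4                 # toy assumption: four moves are available in every position
DEPTH = 4                  # fixed search depth in plies
STRENGTH = [1, 5, 30, 2]   # toy assumption: move 2 is strong wherever it is played
INF = 10**9


def evaluate(path: Tuple[int, ...]) -> int:
    """Score a finished line from the first player's point of view."""
    score = 0
    for ply, move in enumerate(path):
        gain = STRENGTH[move] + (ply * 7 + move * 3) % 4  # small deterministic variation
        score += gain if ply % 2 == 0 else -gain
    return score


def search(path, depth, alpha, beta, killers, stats) -> int:
    """Negamax alpha-beta; killers holds one remembered cutoff move per ply, or is None."""
    ply = len(path)
    if depth == 0:
        stats["leaves"] += 1
        value = evaluate(path)
        return value if ply % 2 == 0 else -value  # score for the side to move
    moves = list(range(BRANCH))
    if killers is not None and killers[ply] is not None:
        moves.remove(killers[ply])
        moves.insert(0, killers[ply])  # try the remembered killer first
    best = -INF
    for move in moves:
        value = -search(path + (move,), depth - 1, -beta, -alpha, killers, stats)
        best = max(best, value)
        alpha = max(alpha, best)
        if alpha >= beta:
            if killers is not None:
                killers[ply] = move  # this move refuted the line: remember it for this ply
            break
    return best


def run(use_killers: bool) -> Tuple[int, int]:
    """Search the toy tree and return (root value, number of leaves evaluated)."""
    stats = {"leaves": 0}
    killers: Optional[List[Optional[int]]] = [None] * DEPTH if use_killers else None
    value = search((), DEPTH, -INF, INF, killers, stats)
    return value, stats["leaves"]


plain_value, plain_leaves = run(use_killers=False)
killer_value, killer_leaves = run(use_killers=True)
print("value without / with killers:", plain_value, killer_value)
print("leaves without / with killers:", plain_leaves, killer_leaves)
```

Move ordering changes only the amount of work, never the root value, so the two values printed should agree; how much the leaf count drops depends entirely on how often the same refutation recurs, which is generous in this artificial tree.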
GM Lajos Portisch asserted that the only task in the opening is to reach a playable middlegame. To achieve this aim, pieces must be developed to squares where they are likely to help in the coming action. Since the rules of thumb for development are guidelines, not recipes, they provide scant help in a particular situation. Even without a clear understanding of their goals, an amateur can memorize a few standard openings and play moves by rote to reach, often enough, a playable game.
Rote opening play also can be helpful to the machine. An opening library guarantees acceptable performance early in the game while conserving clock time for the coming intense calculations needed for the middlegame. But blindly following the moves of an opening book leaves the player vulnerable to an opponent’s finding a better variation than the standard, or perhaps setting a trap not included in the library. Chess literature is rich in mistaken evaluations; competent analysts who occasionally disgrace themselves in print are a constant reminder of the fallibility of book lines. Indeed, a master’s preparation often includes analysis of an opponent’s preferred openings to discover weaknesses, and ways to exploit them. Such preparation to counter a computer’s comprehensive opening book was anticipated in a 1962 science fiction story by Fritz Leiber. In his story, deep computations of a chess machine dominate play during a grandmaster tournament, but a human underdog still wins by playing for an opening variation misclassified as advantageous in Modern Chess Openings.
The chief difficulty with rote play results from incomplete information. An opening library contains only moves; the type of strategy appropriate for that position is omitted.
Just as a human player who plays off a memorized opening can be suddenly left at sea when the opponent fails to oblige with the remembered line, the computer upon being thrust on its own is likely to find the position at variance with the balance mandated by its evaluation function. A player who lacks a feel for the positional forces is prone to thrash about after deviating from the main variation of a memorized opening; brute-force programs also tend to waste moves upon leaving the book. Some have even been observed to “unmake” the previous move because the resulting position, however playable, did not satisfy the program’s criteria for balanced force. Unless additional information, perhaps in the form of adjustments to the evaluation function, can be provided, book variations must be very carefully selected to fit the program’s style of play.
Compilers of opening libraries for brute-force programs have found that a comprehensive encyclopedia is not necessarily an asset. Thompson and Condon invested long hours to enter hundreds of thousands of positions into their opening book, and complained that the
machine played the favorite line of every opponent. It lost five tournament games by accepting inferior positions in the opening, and won only one by springing an opening trap.
The prepared variation is an extension of an individual’s opening repertoire. Most grandmaster games are published sooner or later, and part of a top-rank player’s preparation for a contest is the careful study of an opponent’s games to discover flaws, and to work out improvements. If a game can then be directed along the path of the prepared variation (and this is often possible, for opponents tend to feel comfortable following previously successful lines), the preparer has advance knowledge of pitfalls and is thoroughly familiar with the situation. Sometimes the opportunity to employ a prepared line can be long in coming. In one celebrated case, Frank Marshall saved a variation (the Marshall attack) for a decade before he could spring it on Capablanca in 1918. Capa well understood he was faced with a prepared variation but, supremely confident of his abilities to find his way through the coming complications, accepted the challenge and ultimately won.
Games played by computers are published, too, and are also subject to analysis by potential opponents (or their trainers). Indeed, programmers at computer chess tournaments are often observed updating their machines’ opening books to include variations just played by potential opponents and discussed by the masters present. RIBBIT’s prepared variation in the deciding game of the 1974 championship led CHESS 4.2 into an inferior position; unable to recover, CHESS 4.2 lost both game and title.
In 1975, CHESS 4.4 retook the championship. Its play was much better than that of CHESS 4.2, chiefly because the program had been transplanted to the faster CDC CYBER 170 computer. Each doubling of speed increases playing strength by about 100 points.12 Slate and Atkin did not just rely on increased computing power, but continued to refine brute-force technique.
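The figure of about 100 points per doubling of speed can be checked with one line of arithmetic. The sketch below takes the values given in the chapter's note on this point (Ken Thompson's estimate of roughly 250 rating points per ply, and a branching factor of six) and adds the assumption that search time grows by a factor of b with each extra ply; the symbols b, \Delta d and \Delta R are introduced here only for illustration.

```latex
\Delta d = \log_{b} 2 = \frac{\ln 2}{\ln 6} \approx 0.39 \text{ ply},
\qquad
\Delta R \approx 250 \times 0.39 \approx 97 \approx 100 \text{ rating points}.
```

In words: with b = 6, twice the speed buys a little under four-tenths of a ply, and four-tenths of 250 points is close to 100.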
For CHESS 4.5, they added a transposition table similar to the one pioneered by Greenblatt in MacHack. It employed a programming trick called hashing to provide the computer with a sense of deja vu, of instant recognition of previously evaluated positions. Instead of examining a (possibly very large) table entry by entry to match the current position, a “hash function” converts the
12 By comparing full-width search programs with various fixed depths, Ken Thompson found that a ply of search depth corresponds to about 250 rating points. With a branching factor of six, doubling the speed produces an increase of about 100 points. Above the expert level (2000), tactics are less important, and speed increases provide diminishing returns.
coded position to an index that points directly to the table entry (if present). Slate and Atkin also expanded the killer table to hold two entries for each ply level, the first and second most popular killers, so that an established killer move with a record of successful refutations would not be too hastily replaced.
Now computers began to compete seriously in human tournaments. In the summer of 1976, CHESS 4.5 won all its games in a five-round Class B Swiss-system tournament held in California. The average rating of its opponents was 1735. Then it won the Minnesota State Open chess championship (but, as specified by the rules, the program was not eligible for cash prizes). In 1977, CHESS 4.6 achieved another first by conducting a simul (a simultaneous exhibition) against ten human players, including several of expert strength, winning 8, losing 1, and drawing 1. CHESS 4.6 also won from the other side of a simul, as one of 44 opponents of U.S. Champion Walter Browne.
Increased participation in human events necessitated further refinement of machine play. It became apparent that chess programs need to recognize situations in which a draw can be claimed, and must be provided with rules for seeking or avoiding a draw. If a drawable position is evaluated as dead even, a program will automatically play for the draw when alternatives are unfavorable, and play for the win as long as it reckons winning chances. Since a dead draw is rarely an “equal” result (rating points are still transferred, and programmers are eager for their programs to increase their ratings), a non-zero score is usually assigned to positions that allow a draw to be claimed. If a draw is valued at a pawn-and-a-half, then the program would prefer a draw to a pawn advantage. (A club player taking part in a simul against a grandmaster might well consider a draw almost as good as a win, and be delighted to take a certain draw over a likely loss.)
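The effect of the score assigned to a claimable draw comes down to a single comparison, which can be sketched as follows. This is an illustration rather than code from CHESS 4.X: the function name, the centipawn scale (100 for one pawn), and the rule that ties go to playing on are assumptions made for the example.

```python
def prefers_draw(best_playing_on: int, draw_value: int) -> bool:
    """Return True if claiming the draw scores higher than the best continuation.

    Both scores are in hypothetical centipawns from the program's point of view
    (100 means one pawn ahead). Ties go to playing on, which is an assumption.
    """
    return draw_value > best_playing_on


# Draw valued as dead even: taken only when the alternatives are unfavorable.
print(prefers_draw(best_playing_on=-50, draw_value=0))
print(prefers_draw(best_playing_on=50, draw_value=0))

# Draw valued at a pawn-and-a-half: preferred even to a pawn advantage.
print(prefers_draw(best_playing_on=100, draw_value=150))

# Draw given a negative value: the program plays on even when a pawn down.
print(prefers_draw(best_playing_on=-100, draw_value=-110))
```

Keeping the draw value as a parameter, rather than a constant, is what lets it be set separately for each opponent before a game.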
A negative value for the drawn position is called a “contempt factor,” for the program will then play for a win even when somewhat behind. Before each tournament game, CHESS 4.X was supplied with a contempt factor based on estimated ratings of the opponent. This value usually represented a deficit of slightly more than a pawn so that even when a pawn down the program would avoid repetitions of position and disdain draw offers.13
In every tournament there is a time limit for a specified number of moves and every tournament player is well aware of the pressure imposed by the relentless clock. Computers must abide by the same
13 Although a program is permitted to offer a draw, this almost never happens: Too many programmers have been embarrassed by draw offers after every single move. In practice, it is the programmer who offers a draw.
rules, and search depth must be controlled to avoid a possible time forfeit. The α-β algorithm brings an additional complication, for the total time required to search to a given depth depends critically on the order in which the branches are examined. With α-β, when the first move examined is indeed the best, most of the search time is expended on its evaluation, and the remaining moves are dismissed rapidly.
A simple means of dealing with the unpredictable variability of computing effort is an approach called iterative deepening, which ensures that a preliminary move is always available, though from a search one ply shallower than the current analysis depth. The position is analyzed first to a depth of one ply, then to two plies, and so on until the time allotted for that move has expired. Although this might seem a waste of time, additional processing overhead is actually slight because the number of static evaluations needed for all prior levels is substantially less than the number required for the deepest level.14 In some endgame positions, iterative deepening, with a large transposition table, has even resulted in less computation than direct deep search.
Principal variation search is a popular refinement of α-β search. The idea is to guess an α-β window that will contain the final value. A search is conducted within these bounds. If the value does indeed lie in the interval, search effort may have been avoided. If not, there is a penalty, for some portion of the tree must be re-searched, with wider bounds. Choose your thresholds . . . place your bet.15
Jonathan Schaeffer’s history heuristic, a more general version of the killer heuristic, tallies every legal move examined during traverse of the move tree, and maintains a record of each move’s power to refute, that is, the α or β cutoff value. The algorithm is often implemented with two 64 x 64 tables (one each for Black and White to cover all from-to square combinations) to keep track of the histories of each move.
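The claim that iterative deepening adds little overhead can be checked by counting. The sketch below is an idealized illustration: it assumes a uniform tree in which a search to depth d costs b to the power d static evaluations, with b = 6 taken from the rule-of-thumb factor quoted in this chapter, and the function names are invented for the example. A real α-β search has no such tidy cost, so the numbers indicate proportions only.

```python
def static_evaluations(branching: int, depth: int) -> int:
    """Leaf positions scored by one fixed-depth search of a uniform tree (idealized)."""
    return branching ** depth


def iterative_deepening_cost(branching: int, max_depth: int):
    """Return (evaluations at the deepest level, evaluations at all shallower levels)."""
    deepest = static_evaluations(branching, max_depth)
    shallower = sum(static_evaluations(branching, d) for d in range(1, max_depth))
    return deepest, shallower


deepest, shallower = iterative_deepening_cost(branching=6, max_depth=6)
print("deepest level:", deepest)
print("all prior levels:", shallower)
print("ratio:", round(deepest / shallower, 2))
```

With these assumptions the final iteration costs about five times as much as all the earlier ones together, so the repeated shallow searches add roughly a fifth to the total.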
The idea is to order the available moves so that historically successful ones are examined first. With a history table, information on the success of moves in causing cut-offs, and thereby speeding evaluation, can be shared throughout the tree, not only from ply to ply, but potentially from game to game. One can imagine nodes having useful information being able to broadcast tips to other nodes about moves worth evaluating first.
14 The ratio is about one less than the effective branching factor of the move tree. According to a rule of thumb for programs using α-β and iterative deepening, each additional ply of search depth costs another factor of six in computation.
15 If it is likely that the first move examined is the best, the remaining moves can be searched with a zero window. If a better move is found, the high cutoff so indicates, but provides no value. If only one move is better, then it is simply selected and its value is not important; if more than one, well. . . .
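The two 64 x 64 tables of the history heuristic are easy to sketch. The version below is a minimal illustration, not Schaeffer's implementation: the class name, the method names, the square numbering from 0 to 63, and the default weight of one per cutoff are assumptions, and actual programs differ in how they weight a cutoff.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (from_square, to_square), squares numbered 0..63 (assumed)
WHITE, BLACK = 0, 1


class HistoryTable:
    """One 64 x 64 table of cutoff tallies for each side, indexed by from and to square."""

    def __init__(self) -> None:
        self.scores = [[[0] * 64 for _ in range(64)] for _ in range(2)]

    def record_cutoff(self, side: int, move: Move, weight: int = 1) -> None:
        """Credit a move that caused a cutoff; the weighting scheme is an assumption."""
        from_square, to_square = move
        self.scores[side][from_square][to_square] += weight

    def order(self, side: int, moves: List[Move]) -> List[Move]:
        """Return the moves with the best cutoff record first.

        sorted() is stable, so moves with equal tallies keep their original order.
        """
        return sorted(moves, key=lambda m: self.scores[side][m[0]][m[1]], reverse=True)


table = HistoryTable()
table.record_cutoff(WHITE, (12, 28))
table.record_cutoff(WHITE, (12, 28))
table.record_cutoff(WHITE, (6, 21))

candidates = [(1, 18), (6, 21), (12, 28), (11, 27)]
print(table.order(WHITE, candidates))
print(table.order(BLACK, candidates))
```

Because the tally is keyed only by side and by from-to squares, a cutoff found anywhere in the tree influences move ordering everywhere else, which is the sharing of information the passage describes.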
Nodes taking the advice reinforce the recommendation of good moves or suggest better. Although nodes are visited and revisited in sequence, one also can imagine all nodes being examined simultaneously, perhaps with the help of a physical computing device for each node, to obtain one picture of an artificial neural net.
Many incidental aspects of chess competition could easily be incorporated in brute-force programs, for example the “weariness factor,” a dynamic function that increases the program’s willingness to draw as a game drags on. But other chess concepts fundamental to human chess play, such as piece cooperation, have proven enormously difficult to describe, and this knowledge has existed almost entirely in intuitive form. Even now, upon observing obvious piece cooperation during machine play, it is startling to realize that this cannot be attributed to perception of pattern, but occurs simply because deep forward search indicated least disadvantage.
In brute-force minimax, moves are discovered through the process of blind, Darwinian trial-and-error characteristic of evolving systems. (The ordering of moves suggested by the best variation at a shallower ply affects only speed.) Only after a move sequence has been generated and the terminal position evaluated can it be determined whether the effort was worthwhile. Brute-force computation has been enormously successful and demonstrates how well chess can be played without ideas, without plans, and without comprehension. But a special form of knowledge called intuition often permits even stronger chess play, and can be achieved without enumeration or calculation.
chapter 6
Human Intuition
A person observing a game of chess for the first time is likely to feel a sense of awe. Completely absorbed, the players scrutinize the board. After a time, one moves a piece; both shift position, and continue their intent study; long pauses between moves are occasionally relieved by brief flurries of exchanges. The players are so concentrated in what is patently an intellectual engagement of some skill, that the observer (quite rightly) hesitates to intrude. Afterwards, in the more relaxed postgame atmosphere, the players may well share their enjoyment with the nonplayer, and explain the movements of the pieces and the idea of checkmate, a bare minimum to illustrate the richness of the game. But the number of things to keep in mind, the complexity of the moves, and the rapidly multiplying rules of thumb seem overwhelming. Unless urged on by strong motivation such as a child’s natural curiosity, the common reaction is to defer further pursuit of this pastime.
Indeed, the human brain seems particularly ill-suited for chess. The apparent need for long, precise calculation and the consequent demands on limited short-term memory ought to be so overwhelming that good play would be impossible. Yet somehow the chess player overcomes these limitations and, after a certain amount of practice, is able to play well beyond human capacity to calculate. This takes place through a process of “pattern recognition” in which the player draws on intuitive experience to discover meaning in a chess position.
The abilities of the chess superstars (the grandmasters) are marvelous to behold. Janos Flesch gave an example of chess intuition he observed at the 1960 Olympiad in Leipzig: A group of five masters who had just spent the night analyzing a complicated adjourned game showed the position to the young Bobby Fischer. After examining the board for five or ten seconds, Fischer shook his head, commented that their proposed move did not fit the tactical situation, and walked away.
Continued analysis failed to reveal a suitable alternative. At length, the returning Fischer reached over the heads of the assembled masters to show his solution. “That is the move!” he remarked and wandered on, leaving the masters to verify that his suggestion did indeed best satisfy the requirements of the position.
Fischer soon became well-known for dazzling displays of intuitive power that brought him the unofficial title of supergrandmaster. In 1970, in the “strongest blitz tournament of all time” held at Herceg Novi in Yugoslavia, he effortlessly outdistanced his opponents in a double round-robin1 grandmaster tournament of five-minute games. In spite of Fischer’s well-known dislike of purely intuitive rapid transit chess (“it kills your ideas”), he used only half his allotted time in superfast play to garner 19 of the possible 22 points. This amazing result belied the widespread belief that top performance in lightning chess requires special practice, for on this occasion Fischer’s victims included the three past world champions Tal, Petrosian, and Smyslov, none of whom could be considered a pushover.
A central puzzle of psychology concerns how humans with their limited resources can learn to handle complex situations competently, and can even attain expertise in such ill-structured environments as chess. One of the first researchers to examine chess expertise was the French psychologist Alfred Binet. During the 1890s, he attempted to understand the mental processes underlying chess play by observing and interviewing proficient players, particularly those adept at blindfold chess. He had ready access to superb players, for at that time the Cafe de la Regence in Paris was the center of world chess.
One of his first observations was that chess players do not perceive chessmen as having any particular form, but view them rather as symbols characterized by their individual moves and by their significance at a particular point of the game.
A player examining the chessboard hardly notices, and later will be unable to recall, material aspects of the game such as shape or color of the chessmen. The capable player does not, for example, perceive a Knight as a carved horse’s head on a pedestal, but rather as a piece with certain capacities that serves a particular function in its current position. Nominally valued at about three pawns, a Knight can be appraised quite differently depending on its potential. Its utility might be reduced if it is awkwardly placed, ineffective, or vulnerable; its worth might be enhanced if it controls critical space or is poised to participate in a decisive attack. The skilled player invariably perceives a chess piece in terms of its significance to the present course of the game.
1 In a double round-robin tournament, each participant meets every opponent twice, playing one game with White and one with Black.
Binet also learned that the great chess players often have no exact visualization of chess positions. When recalling past games they often omitted details, especially isolated moves that did not fit the idea flow. This would be curious indeed if the game were being recalled move by move since one tends to remember, not forget, misfits. He realized that they were reconstructing the details of past games from remembered ideas and plans, and recollecting their general conduct. During blindfold games, Binet had noticed a hesitation of several seconds after being told the opponent’s move, and decided this could only be attributed to a time-consuming reconstructive memory process during which the board position becomes clear and with it the direction of the game.
To any player beyond the woodpushing stage, chess is a game of ideas and plans, not of moves. A game is memorable not because its moves have been memorized, but because the underlying thoughts and plans are remembered. The moves are reconstructed by recalling the ideas and reasoning that took place during play. A master replaying a game does not think in terms of having moved, say, a Knight to a particular square, but rather in terms of the game strategy and specific plans of attack and defense that required the Knight move. Because each move is a consequence of the chess player’s thoughts and impressions, the more a game follows a coherent sequence of ideas, the better its moves are remembered. After his speed-chess triumph in Herceg Novi, Fischer astonished his fans by rattling off the moves of the 22 games he played; for him, this was a trifling feat because grandmaster games are quite coherent, even at rapid-transit tempo.
Binet identified the crucial component of a player’s recollection of a move sequence or a particular position: the association of a precise meaning with these moves or that position.
He found an agreeable explanation by comparing chess playing with literacy, and remarked that a novice player’s difficulty in reproducing details of a game is like trying to remember a line of print composed of incomprehensible symbols. If the meaning of the print is understood, a reader can reconstruct all the letters of a sentence after a single glance. To the experienced player, a game of chess is as meaningful as a literary work. Replaying a famous game is like reciting a poem, for the logical sequence of moves links them together in memory as are verses by their cadence. Recall of text depends as little on paper quality and type font as recall of a game relies on the chessboard being wooden or whether the pieces were Staunton or Regence. And just as a chess player during reconstruction of a game tends to omit moves that do not fit, a reader in recalling the flow of a passage is no longer aware of typographical errors in the original.
Other psychologists found chess an attractive experimental microcosm for the study of human memory and learning. Alfred Cleveland
in his 1907 work “The Psychology of Chess and of Learning to Play It” identified stages of chess expertise in terms of gradual removal of deficiencies as chess concepts take on meaning beyond the overly simple rules of thumb provided to the beginner. He was struck by the beginner’s inability to use even rudimentary chess knowledge in actual play. Rules are helpful only in the most simple situations, and verifying correct application of rules is far easier than choosing the most applicable rule. The beginner finds blunders easy to recognize, once committed, but very hard to avoid. Cleveland decided that the beginner’s difficulties are not due to lack of attention, but to lack of understanding; the tyro concentrates on a single feature of the game because it, of all the mass of impressions, has some meaning. He remarked that a common consequence of this tunnel vision is being surprised by checkmate when only a single move away from checkmating the opponent (Cleveland, 1907).
To Cleveland, the meaning of a chess position encompasses “how-to” knowledge for dealing with it. The player knows a threatened piece must be sacrificed or withdrawn, the attack blocked, or a counter attack initiated, but only after perceiving the full significance of the board situation does the best alternative become apparent. Cleveland noted that when the case is so unfamiliar that experience suggests nothing, the reasoner is reduced to simple blind fumbling, examining one move after another to find one that seems superior. A configuration of chess pieces acquires meaning gradually during a process of loop thinking. A player examines the board, obtains some glimmer of understanding, looks again in the light of that realization, gains further insight, and continues in this manner until confident that the forces and pressures have been understood.
At tournament pace, several examination cycles take place before the player feels sure of a position’s significance, and the way to proceed is clear.
In an attempt to understand chess expertise, the Dutch psychologist Adriaan de Groot conducted a comprehensive study of chess playing behavior in the early 1940s (de Groot, 1946). De Groot was himself a strong player, aware of his own chess intuition, and confident he could find test positions to distinguish skill levels among players. He studied the conscious thought processes reported by his subjects,2 who ranged from Class C players to grandmasters. In individual sessions, he would show his subjects a sequence of quiescent, but unclear, positions and
2 Psychologists will tell you that the underlying causes of a subject’s behavior are not accessible to consciousness and that introspective protocols tend to consist of rationaliza tions. De Groot's use of this approach was to ascertain what his subjects were doing, not why.
HUMAN INTUITION
77
ask them to report their thoughts while choosing a move. He recorded the sessions and transcribed all statements with careful notations of timing. As he had expected, de Groot found that he could reliably match the moves proposed for further examination with his subject’s playing strengths. An example taken from one of his own games, which he referred to simply as “Position A,” has become the most thoroughly analyzed position in the history of chess.
De Groot’s Position A
A player who prefers Bxd5 upon initial encounter with this position is probably a master; de Groot noted that every grandmaster participating in the study had considered this move, while few experts even mentioned it.3

3 De Groot’s assessment of playing strength based on choice of move in test positions does not always work well for brute-force programs. Unlike mediocre players, many mediocre programs prefer the “master” move in Position A, but for the wrong reason, as evidenced by the computed minimax continuations.

During the experiments, de Groot was impressed by the chess master’s rapid grasp of the possibilities in a newly shown position. He observed a vast difference between master and amateur in the amount of time taken to recognize relevant structure and dynamic pressure. The master seems to perceive the critical forces almost instantaneously and can immediately suggest specific, appropriate board action.

The most surprising observation was that this difference is not due to thinking speed. Contrary to the popular belief that strong players carry out deeper and more rapid calculation, de Groot found little difference in reasoning activity between masters and amateurs, whether measured by number of alternatives considered, time spent examining each alternative, or depth of investigation. A master considers about the same number of possibilities as weaker players (sometimes fewer, rarely more), but is very good at choosing the right moves for further attention while club players expend considerable effort analyzing the consequences of inferior moves.

The process of examining potential lines of play and comparing their relative merit is similar for both strong and weak players. At each level of look-ahead, unpromising lines of play are discarded. Less promising lines are kept in reserve, to be examined only if nothing better appears. Players often investigate competing lines of play in parallel, with attention switching back and forth as understanding deepens. De Groot’s players offered a subjective evaluation of virtually every continuation they mentioned: “I can hold this position” or “this seems to win a piece” or “no good . . . my opponent can break through.”

From such evaluative comments, de Groot concluded that the chief reason for the performance difference between master and club player is the amount of specialized knowledge each brings to the act of perception. When shown a new position, a master sees it in a strikingly different way, immediately noticing dynamic balance and important structural features that, even after long study of the position, may well remain invisible to the club player. Only when familiar with the relevant features—whatever they might be—are players able to notice appropriate moves.

De Groot described how intuitive experience guides the skilled player, who perceives a chess position as belonging to some unwritten category and instantly notices specific features that stand out against the background of that category. General techniques of dealing with that type of position spring to mind and, with no apparent effort, attention focuses on appropriate move opportunities.

How, then, can an expert player miss an obvious move?
The reason is simple: A move only becomes obvious when it is suggested by some pattern, some configuration of pieces. De Groot noted that chess perception is distinctly visual. Indeed, players often use visual metaphors to express their perception of the board, stating that they “overlooked” this threat or bemoaning that their opponents “saw” everything.

But not everything can be seen, or foreseen. However capable the player, the combinatorial explosion guarantees that remote potential events cannot all be anticipated. How might a player plan the future course of a game without deep, exact calculation? In chess, one attempts to exploit a weakness without being able to foresee exactly what gain might appear. A skilled player simply proceeds with the confidence that, once an advantage is obtained, every contingency can
be dealt with in order to keep it. The future will be favorable if only the present situation is handled adequately.

To become expert in a skilled activity, the novice starts by consciously applying a few rules of thumb. These rules often merely prescribe, and seldom explain; they serve only to help the trainee start practice. Experience soon reveals how very approximate these precepts are. (After all, an activity completely describable by a fixed set of rules would hardly be considered to require expertise.) Furthermore, close examination reveals the trainee’s rules to be incompatible. Expertise lies not so much in the facile application of the rules of thumb as in the correct assessment of their competing requirements.

During practice, the trainee discovers exceptions by probing the limits of rule application. Refinements to the rules become apparent, as do new exceptions. As experience accumulates, a feeling develops for those cases in which the now highly specialized rules can be applied. By exploring a variety of situations and discovering relationships among the perceived features, the unusual gradually becomes familiar and the trainee gains confidence.

Then all at once a marvellous transformation takes place: An activity that required awkward application of consciously applied rules can suddenly be carried out automatically, and even gracefully. The perceptual process has changed: “know that” has become intuitive “know how.” When noticed, this metamorphosis can astonish, just as when a child realizes that the person running along beside the bicycle to provide support isn’t actually holding on.

Cleveland had also remarked on the perceptual metamorphosis that takes place with gain of expertise. As more chess situations take on meaning, pieces are gradually transformed in the player’s mind from static objects to forces that can be exerted.
The expert no longer sees discrete squares and pieces or even abstract counters, just as the competent reader no longer sees individual letters. The novice’s superficial, literal appreciation of chessboard events has become a perception of large, meaningful patterns that go far beyond any literal representation of a position. The board has become an arena of overlapping zones of significant activity, regions in which certain events are taking place or are about to take place, and in which the impossibility of other events can be exploited. A position is seen in terms of pressures, of “fields of force” that require additional force to be applied in a particular region to maintain equilibrium, or to divert the opponent’s influence.

The expert’s knowledge is oriented around goals, procedures for attaining them, and the conditions under which these procedures are
useful. Expert ability includes not only intuitive recognition of the familiar, but also a sense of the limits of one’s expertise—a feeling for what is not known. Knowing what requires attention, and what cannot be easily dealt with, the expert is able to apportion time and effort to best effect.

Chess intuition is taught the same way most expert knowledge is passed from human to human: by tutorial example. Instructional material such as Reuben Fine’s Basic Chess Endings (1941) takes the form of examples to illustrate some regularity—some concept—that the novice is expected to understand. The chess tutor still offers advice in the form of rules, but these rules are even harder to program than the initial precepts: “it’s nice to have your knights pressing from the flank in this sort of position,” or “when you notice this pattern, consider trying that plan.” A capable teacher comes to know each student’s difficulties and misunderstandings, and contrives sharp examples to illustrate important distinctions.

De Groot remarked on a difficulty common to all teachers: a tendency to overestimate their students’ ability to see some obvious property. A competent chess player seldom recalls the tedious study and hard-won experience required to assimilate some principle, such as undermining a pawn chain by attacking its base, that now springs to mind as obvious. It is all too easy to forget that the ability to notice positional defects, such as lack of piece cooperation, is not innate, but the result of a gradual transformation of perception, through which the apprentice becomes an inhabitant of the master’s world, seeing as the master sees and playing as the master plays.

With the assimilation of know-how into the perceptive process, the player acquires a feeling for which features of a position are significant.
Tactics, all-important to the beginner, soon become subordinate to strategic plans; chess masters are less concerned with the move at hand than with the type of position they are creating.

In common usage, an intuitive move is one made not because of calculation, but because the player has recognized that it will produce a position of a type in which a familiar strategy almost always succeeds. Master play is rich in spectacular examples of intuitive moves. In a 1953 game with GM Averbakh, GM Kotov sacrificed his Queen in the position diagrammed on the next page. Kotov, playing Black, captured the pawn at h3 with his Queen to draw Averbakh’s King into a region in which it could not be protected. No forced mate is in sight, but Kotov felt confident that the resulting position could be won (and was, more than twenty moves later).

Intuitive recognition of a winning situation is not the result of a sequence of conscious, rational steps even though the player could, if requested, supply a detailed justification for the conclusion. De Groot explained that the master’s forward search is motivated not so much by the need to find a plan as to verify the correctness of an already-selected plan. The expert player seeks not just the best move, but also a subjectively convincing argument to support that choice.

Another experiment conducted by de Groot differentiates weak and strong players by their use of short-term memory. He showed his subjects a position perhaps twenty moves into a well-played game for five to ten seconds. When asked to reconstruct the position, a subject, if a master or grandmaster, would almost always set up the pieces perfectly, even though such a position usually contains more than twenty pieces; a player of less than expert strength manages to place about six pieces correctly. One might suppose a master chess player employs some sort of special visual imagery to manage this trick. But when the experiment is repeated with random placement of the same pieces, the ordinary chess player again manages about six pieces while the master’s ability suddenly collapses to the same level.4 The special perception of the master is thus closely linked with the types of positions that arise in well-played games.

4 Unlike the human, a brute-force machine can play just as well in a position composed of randomly placed pieces as it can in a well-played game.

Herbert Simon of Carnegie-Mellon University conducted further experiments in chess perception. He remarked that the talent for reconstructing a position depends chiefly on the number of hours spent staring at chess boards, on how many patterns and their associated playing methods have become familiar. He suggested that expert intuition can be explained as an information process—that of recognition.
Simon realized that a chess player does not see the entire board as a unit, but rather as a collection of familiar groupings. Intuition exploits an ability to recognize significant configurations of pieces that stand out against the background of the board situation. A chunk is a group of elementary items linked in some meaningful way so that the grouping is perceived as a unit, for example, a castled position, or a particular pawn structure. Even a mediocre chess player will view a doubled pawn not as two separate pawns, but as a single entity, and will sense the potential strengths and weaknesses of this configuration. Once a chunk has acquired meaning, it can serve as a symbol that can cue specific ideas and plans when recognized. It also can be used as a building block to construct more abstract symbols.

Cleveland suggested that by determining the span of chess attention during the different stages of learning, it might be possible to observe progressive fusion of elements into increasingly larger complexes. William Chase and Herbert Simon found a more direct method of observing chunks. They varied de Groot’s original method by asking their subjects to set up a copy of a chess position on an adjacent empty board. They observed that a master sets up three or four pieces at a time, pauses, sets up another group of pieces, and continues to build up the position in quick runs. Different masters selected the same groups. This agreement suggested that pieces set up in a group somehow belong together, that is, they are perceived as chunks.

Chase and Simon tried to determine which relationships might bind pieces together. They found that defense, particularly mutual defense, was the primary connection, and that co-attackers of a target often belong to the same chunk. They had expected an attack relation to result in chunking, but discovered that target and attacker rarely appeared in the same group.
In the experiments involving reconstructing positions, errors made by strong players often involved placing a piece on a square adjacent to the one actually occupied, but in which the piece served a function typical for that type of position. This provides additional support for the conclusion that positions are perceived, and remembered, in terms of chunks rather than as isolated pieces. A 1976 study of chunking by Bratko, Tancig, and Tancig found that strong players tended to recall pawn configurations particularly accurately, and that piece locations were remembered relative to pawns. There was a tendency to perceive a position as a variation of a prototype, a corrected version of a known pattern.

Acquisition of chess expertise presupposes a passion for the game. Even a prodigy would have little chance of reaching the Class C level without 1,000 hours of concentrated study and play, and achievement
of Class A proficiency would hardly be possible in less than 3,000 hours of diligent effort. Chase and Simon remarked that a decade of intense study and practice is necessary to sharpen the perception required of a grandmaster. Just as one never masters the violin by practicing one hour a day, one cannot achieve chess mastery by studying chess an hour a day. Practice is not rote memorization of thousands of positions, but becoming familiar with types of positions and the plans and playing methods associated with them. A master’s stock of familiar chess situations is comparable to a literate person’s vocabulary of words, but this knowledge is almost entirely nonverbal.

Because expertise is individual, a chess player’s methods are personal, and there may be wide disagreement about suitability of plans and which move might be best in some particular position. Disputes over correctness of play have fueled the activities of chess analysts for centuries. In all but the most elementary positions, there is no single optimal move. Another Dutch researcher, H. J. van den Herik, found an example when he interviewed several grandmasters to refine his collection of endgame strategies. He hoped to discover clues to how experts codify endgames by examining subgoals mentioned by grandmasters. He showed the diagrammed KNPK position to four grandmasters and requested that they express their first thoughts, then suggest a move for White:
All four instantly identified as initial goal the approach of the King to the Pawn to support its march to the promotion square. Beyond this, opinions diverged sharply. One grandmaster selected Kf8-e8-d8- . . . , another liked Kf8 Kf6, Nh1 Kg5, Kg7 Kh4, Nf2 Kg5, . . . , and a third preferred Ng4 Kg5, Kg7 . . . . Each emphasized his own special (and of
course superior) perception of the chess universe. When asked to remark on the approaches suggested by other GMs, they reacted emphatically with arrogant, disparaging remarks on the inferior abilities of their colleagues. The GMs were not merely being stubborn, van den Herik suggested, for they were, after all, quite ready to modify their assertions if shown a refutation. Even simple endgames may have no single correct strategy; choice of strategy often depends on no more than personal preference or individual playing style.

A grandmaster is guided by what appears interesting. Most moves are uninteresting, for they are irrelevant to the forces of the position. Other moves that club players might find interesting are characterized in GM annotations as “obviously bad” or as “unworthy of consideration.” The recognition of interesting moves is an enjoyable process. GM Yasser Seirawan expressed his delight in exercising his intuitive powers:

The computer is considering every conceivable possibility, but I don’t have to pretend that I’m seeing everything—I make no such pretensions. In fact I discard 99% of all possible variations, but what I’m seeing—almost as instantly as Deep Thought is calculating and reassessing thousands of variations—is the heart, the soul, of that position. And I’m right there! (Seirawan, 1991)
Throughout the history of the game, chess has been the domain of the expert, and could be played competently only after long training of intuition. The novice can stare at a position for hours without ever noticing the winning move that the master sees at a glance. This skilled perception now has a powerful rival in the brute-force computation of this generation’s chess engines, and chess has become a new arena of human-machine competition.
chapter 7
Human Versus Machine
In 1870 the Big Bend Tunnel on the Chesapeake and Ohio Railroad in West Virginia was under construction. One of the two tunneling crews employed the new steam drill while the other relied on human muscle, which happened to include that of John Henry, the best steel-driver on the C.& O. and the only man able to drive steel with two hammers, one in each hand. The steam drill was considered a marvelous invention, but John Henry still “allowed he could sink more steel.” Money was put up and a contest arranged: John Henry was to get a hundred dollars if he could beat the steam drill in a 35 minute race. When time was called, the champion had drilled two holes seven feet deep, while the steam drill had only managed a single nine-foot hole.

This epic performance soon became a poignant monument to man’s last stand against the machine, for John Henry went home with a “queer feeling” in his head and died that night of a burst blood vessel. His martyrdom in laying down his life to show himself and others that he could beat a machine was immortalized in song and John Henry became one of the best-known characters of American folklore—a lone, tragic figure fighting a rear-guard action against the incursion of new technology.

Other last stands against machine takeover were to take place as the feasible arena of human-machine competition moved from the physical to the cerebral. When the first mechanical calculating machine came to Japan, a clerk’s nimble fingers made news by moving the beads of his soroban rapidly enough to out-multiply it. It was nonetheless clear that machines would soon relieve not only human muscles, but also human brains, of tedious and tiresome labor; by the mid-1950s nobody doubted that electronic computers would greatly surpass humans at arithmetic.

It was also widely assumed that chess play was just another form of calculation to be mechanized. The NSS team of Newell, Shaw, and
Simon had, after all, shown how chess knowledge could be embodied in move generators, and few doubted that the computer would soon be beating all comers at chess. Herbert Simon had even ventured the opinion that, if allowed to compete, a digital computer would be the world’s chess champion within ten years.

This view was by no means universal. Hubert Dreyfus, a professor of philosophy who had conducted extensive studies of expert behavior, was a perennial, and vocal, critic of the idea of artificial intelligence. The mimeographed notes that would eventually evolve into What Computers Can’t Do (Dreyfus, 1979) were circulated in 1965 under the title “Alchemy and Artificial Intelligence.” Simon’s over-optimistic prediction was an obvious target, and Dreyfus could not resist the comment that no chess program could play even amateur chess—and with the world championship only two years off! During these two years, the only human-machine game of note was “the Dreyfus affair” in which the professor, a mediocre player, accepted the challenge of a game against a computer. It was a typical amateur encounter, brimming with reckless tactical maneuver and overlooked opportunity, which, to the great glee of the affronted, Dreyfus lost.

The program that achieved this feat was Richard Greenblatt’s MacHack. A demonstration of MacHack in August 1968 at the meeting of the International Federation of Information Processing held in Edinburgh made computer chess a hot topic during the following week’s fourth Machine Intelligence Workshop. One of those present was John McCarthy, a professor of Artificial Intelligence at Stanford University. His chess-playing program had competed in several postal games, which included a match with the Soviet program KAISSA. Upon losing an offhand game of chess to David Levy, the Scottish Chess Champion, McCarthy asserted that within a decade a chess machine would exist that could compute rapidly enough to beat Levy easily and consistently.
Levy, keenly aware that good chess play involves more than computation, countered that it was unlikely that any computer could beat a player of his stature in a serious match. This difference of opinion swiftly led to the celebrated wager: Levy suggested a sum of £500. Donald Michie, now Professor of Machine Intelligence at Edinburgh, joined the discussion and expressed his desire to take half the bet. Soon the two professors had backed with £500 the proposition that by the end of August 1978, Levy would be beaten by a computer program in a match of no fewer than two games to be played under tournament conditions. Two more professors of AI who were optimistic about the prospects of a chess machine, and therefore eager for a piece of the action, soon joined: Seymour Papert and Ed Kozdrowicki, each in for £250. This raised the total above
Levy’s annual salary. Michie later increased his wager to £500 and laid a second of £500 that Levy’s defeat would be through a program written under Michie’s direction.

Even if a computer program could have been developed within the decade that could play at master level, the computer scientists made a bad bet. Every tournament player knows that, in an even position, it is considerably easier to avoid being beaten than to win. In a two-game match, Levy would need only one point out of two to collect his wager, while his electronic opponent would have to score one and a half points to defeat him. Still, during the ensuing decade, opinions favored first one side, then the other, and each party could enjoy the exciting uncertainty of a good wager.

With the outcome ten years off, both sides had ample opportunity to prepare. The computer scientists encouraged (and vigorously contributed to) the development of the algorithms discussed in the chapter on brute-force chess play. Levy, finding himself in a position of carrying out a John Henry rear-guard action, had to become expert in the behavior—one might even say psychology—of brute-force chess machines, which exhibit a set of strengths and weaknesses noticeably different from human players.

Chess players exhibit a great variety of individual playing styles. The late World Champion Tigran Petrosian adopted a martial-arts approach to chess—waiting for a foolhardy attacking move and then applying suitable counteraction to toss the off-balance opponent out of the ring. Former Champion Anatoly Karpov also has a penchant for inviting enterprising excursions, which he can deftly thwart. Current Champion Garry Kasparov delights in showing his combinatorial prowess with swashbuckling attacks and flamboyant sacrifices. Victor Korchnoi is known for his control of space and exact defense. Fischer’s hallmark is that of constant pressure, out to crunch every opponent, every time.
Mikhail Tal relishes opportunity to enter tactically complicated positions in which he can discover unexpected sacrifices that severely tax an opponent’s ability to find a refutation. To Mikhail Botvinnik (1970, 1984), chess play is an expression of his art, which lies in creating positions in which normal relative values cease to exist.

Levy observed that search-based computer programs also exhibit distinctive playing-styles. Early type B programs such as CHESS 2.0 displayed all the flightiness of the human patzer: Oblivious to the obvious, they made aggressive-looking moves that could be easily parried, and failed to consider quiet moves that change the character of a position. In one respect, the selective search programs outshone the patzer: They would not, through inattention, hang a piece, that is,
leave it under attack and under-defended. Although not prone to simple oversights, early programs with a rigid search horizon often played egregiously. When faced with the loss of a piece, even the weakest player will look around for some compensation; programs would, until the loss took place, steadfastly interpolate delaying moves, and often give up additional material.

When no obvious tactical move could be discerned, computer play resembled that of a human tyro attempting to follow a memorized set of precepts—observers could often guess which strategic rule of thumb had prevailed. In his commentary on CHESS 3.6, Hugh Alexander remarked that the program was very fond of doubling its opponent’s pawns. Since its evaluation function scores this feature as a defect, if no better move turns up, the opponent is sure to be saddled with doubled pawns. Alexander went on to describe the program’s style in very human terms:

It strikes me altogether as a pawky sort of player; never does anything much, but plugs steadily along. We all know players like this in the club; members for many years, slightly despised by the younger players, they sit stolidly at the board outlasting their opponents and bringing home one dismal win after another. I’m sure 3.6 is a pipe smoker. (Alexander & Birdsall, 1973)
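
The fondness for doubling an opponent's pawns can be illustrated with a small sketch of a static evaluation function. It is not the evaluation used by CHESS 3.6, whose details are not given here. The piece values, the penalty weight, the dictionary board representation, and the names `evaluate` and `doubled_pawns` are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical piece values in pawn units; not taken from CHESS 3.6.
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0, "K": 0.0}
DOUBLED_PAWN_PENALTY = 0.5  # assumed weight, for illustration only


def doubled_pawns(position, pawn):
    """Count the surplus pawns of one colour on files that hold more than one."""
    files = Counter(sq[0] for sq, piece in position.items() if piece == pawn)
    return sum(n - 1 for n in files.values() if n > 1)


def evaluate(position):
    """Score a position from White's point of view, in pawn units.

    `position` maps squares such as "e4" to piece letters:
    upper case for White, lower case for Black.
    """
    score = 0.0
    for piece in position.values():
        value = PIECE_VALUES[piece.upper()]
        score += value if piece.isupper() else -value
    # Doubled pawns count against whichever side owns them.
    score -= DOUBLED_PAWN_PENALTY * doubled_pawns(position, "P")
    score += DOUBLED_PAWN_PENALTY * doubled_pawns(position, "p")
    return score


# Equal material in both positions; in the second, Black's pawns are doubled.
healthy = {"e1": "K", "e8": "k", "a2": "P", "b2": "P", "a7": "p", "b7": "p"}
doubled = {"e1": "K", "e8": "k", "a2": "P", "b2": "P", "b7": "p", "b6": "p"}

print(evaluate(healthy))  # 0.0
print(evaluate(doubled))  # 0.5
```

With material level, the half-pawn bonus is the only thing separating the two scores, so a search that finds nothing better will steer toward the second position. This is the behavior Alexander observed.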
Full-width searching brought a change in playing style. Quiet moves that contribute to control of space, although still misevaluated, were no longer overlooked. The limited amount of chess knowledge included in a program—a measure of material balance and a few easily applied rules of thumb—sufficed for excellent tactical maneuvers within the search horizon. Strategic play remained shallow. Computers still made no long-range plans, nor did they make speculative sacrifices to obtain a position that, although too remote for exact calculation, would very likely be favorable. The machine would play a plodding, accurate game, never giving up material unless checkmate or material gain were within its search horizon and always accepting sacrifices when no disadvantage could be predicted. Brilliant moves by a search-based computer were the result of calculating all the way to the end of a sequence that starts with an unlikely move.

Through his observation of algorithmic play, Levy acquired another form of intuition: a sense of which types of position a program can play well and which types it is likely to misplay. Its chief strength was infallibility at short-range tactics; it never missed a chance to snatch material. The thoroughness of brute-force tactical search seemed quite beyond human capability. (De Groot had observed that a human considers a mere fifty positions on average when choosing a move and never as many as a hundred, and surmised this might reflect some essential limitation of human processing ability.) Chess literature is rich with examples of the world’s best players occasionally committing simple tactical oversights, which a program would quickly exploit. Levy saw this pattern many times: A club player pitted against a machine arrives at a good middlegame only to lose material through some tactical error, and then must endure a rapid exchange of pieces to an ending in which the machine’s material advantage is too great to be overcome even with perfect endgame technique.

Levy also noticed that no chess program seemed to be concerned with sequences of moves—of combinations. A good player makes plans and follows a plan consistently over several moves, even to the disregard of temptations that might otherwise dominate one’s thoughts. There is good reason for this single-mindedness: One of many ways in which a player can lose a game is to vacillate, to make moves in pursuit of one goal, then to switch to another plan, and again after a few moves to seek yet another objective. A common result of such thrashing about is that the player who remains faithful to a single plan is able to strike home first and decide the game.

A program does not play a coherent game of chess. Instead, it chooses one move at a time based on the current position and . . . vacillates. The master’s usual rationale for a move is that it fits the flow of the game; the computer’s game seems to lack such continuity. Even when the opponent responds as anticipated, the two-ply extension of the horizon will, more often than not, result in a jump in the calculated value of the position. Erratic jumps in successive evaluations produce inconsistent play.

Levy found machine endgame play particularly appalling.
In the endgame, subtle positional considerations and specific chess knowledge are all-important: It is the endgame that separates the master from the club player. A master giving a simultaneous exhibition often makes no attempt to obtain early advantage, but simply exchanges pieces quickly to head for the endgame; even with no advantage, the master’s immense knowledge and superior ability will almost always prevail.

Computer endgame play was even worse than that of a human amateur. On a board with reduced piece count, the amateur readily notices opportunities to apply chess knowledge. One might suppose that a smaller branching factor would allow the brute-force computer deeper search and therefore even stronger play than in the middlegame. Not so. Particularly in endgame play, chess machines reveal their rudimentary sense of direction: They move aimlessly, mistime pawn
pushes, miscalculate the ideal King position, and allow themselves to be easily outplayed. The traditional motto of the patzer, “always give check, it might be mate,” might well be their only rule for playing the endgame. Many of the early programs, Levy saw, could only effect mate by accident.
In a brute-force program, all chess knowledge is contained in the evaluation function. While effective in tactical situations, this function is poorly attuned to the endgame. The reason is simple: An estimate of the value of a position in units of pawns is incommensurable with the true endgame values of won/drawn/lost. With no way to evaluate accurately, a program would not, for example, give up the advantage of the exchange for a winning pawn endgame, unless the pawn conversion were imminent, that is, within the search horizon. Apart from vague positional knowledge, such as the value of centralization, little of the procedural knowledge needed for the conduct of specific endgames was available. This lack could be remedied only through specialized endgame programs, which programmers (who were not endgame experts) found hard to provide.
By means of another wager, Levy tested his contention that even simple endgames are quite difficult to program. At the 1973 ACM tournament, he challenged the programming teams to write a routine that could correctly play the KRPKR ending, that is, with an extra pawn in a King-and-Rook against King-and-Rook endgame, to win whenever possible. One team bet $100 that they would produce the required program at the following year’s tournament, and quietly paid when the time expired. An offer to renew the wager found no takers. During a 1974 visit to Moscow, Levy revived the wager with Dr. Arlazarov (a co-creator of KAISSA) for the stake of a dozen bottles of the national beverage.
A full year later, Levy received confirmation that GM Averbakh, a leading endgame specialist, had examined KAISSA in direct play to verify that the program could correctly play a KRPKR ending; Arlazarov soon had a case of Scotch as proof that Levy could lose a wager.
Another weakness Levy observed in machine play was its inappropriate allocation of time. The rhythm of a tournament game appears somewhat irrational to the uninitiated. Because surely a competent player can move rapidly in a thoroughly familiar opening, a casual observer is likely to be surprised at the slow play. This is not a ploy intended to “out-psych” the opponent, but is simply an adjustment of pace to a careful rate of play. (There is always a subliminal awareness of past hasty moves that left one staring for hours at a lost position.) Good players do not allocate their time uniformly. A player
may spend ten to twenty minutes examining a critical position while selecting an appropriate plan and then, provided the opponent responds as expected, quickly play the following few moves. One might suppose that, in a game between experienced players, the rationed time would be carefully conserved so there would never be need to hurry. Still, just before time control, players usually have fewer minutes than moves and, to the great delight of the spectators, games often end in a mad scramble of moves and clock punching.1 There is a perfectly reasonable explanation for this seemingly odd behavior. In chess play, as in most activities, people tend to slow down when the cost of making a mistake goes up. During the middlegame, special care must be taken to avoid error because one’s opponent has ample time to exploit inaccurate play; when pressed by the urgency of imminent time control, an opponent is less able to find a way to take advantage of a misstep.
While following its opening book, a chess program makes each move instantly. Upon reaching the end of the prestored variation, it allocates its available time more evenly among the moves. Only when there is no real choice (there is but one legal move, or an already-computed obligatory recapture) will the machine move immediately. Because the same amount of time is spent on difficult decisions as on easy moves, a program tends to play erratically.
Especially in closed positions characterized by patient maneuvering, a tournament player short of time may repeat moves in order to reach time control with a playable game. Because most programs do not retain the variations calculated at earlier positions, a repeated position must be entirely re-evaluated, which grants the opponent welcome respite.
As part of his study of computer chess behavior, Levy played speed chess games with CHESS 4.5. He soon noticed a new feature.
For amusement, Slate and Atkin had “humanized” their program with a collection of brief messages, which from time to time would startle a user with a machine-generated comment printed along with the machine’s move.2 Vague enough to fit many situations, these comments introduced a psychological element to the play. An unexpected “Be Careful!” during a game can be unnerving, and more than one disconcerted opponent has puzzled whether “Oh, you had that” was a congratulatory message on the recently stolen pawn, or a jab regarding a missed and now no longer possible opportunity. When the program won, it printed a polite “Thank you for an enjoyable game”; when it lost, a pouting “Game is over.”
Through his observation and study of machine chess play, Levy soon became the leading expert on computer style, and how to counter it. He refined his anticomputer play as faster computers and better programs became available, and remained perennially successful in holding off challenging chess programs. The strategy was simple: “Do nothing, but do it well.” This means, he explained, concentrating on long-range planning, avoiding tactical engagements, and trying to reach a playable endgame, for here brute-force programs are particularly weak. Solid positional play is important; if material is dropped in a tactical free-for-all, it is unlikely to be recovered through a simple trap or swindle. Since most programs are materialistic and sacrifices are usually accepted, sound positional sacrifices are well worth looking for (and particularly enjoyable to discover). A player should develop threats slowly and subtly to take full advantage of the machine’s limited horizon. Levy also favored early play of an unusual move to take the machine out of its opening library quickly.
Although one might suppose Michie and Levy adversaries because of their opposite positions on the famous wager, they have been longtime enthusiastic collaborators in machine chess. This comes as no surprise when one considers that the most enjoyable and satisfying chess games are, after all, those played against an opponent who provides an opportunity to create one’s best game.
1 A surprising number of tournament games end on the move after time control when there is opportunity to reflect on the hopelessness of the resulting situation. Although he rarely allowed himself to be caught in time scrambles, Bobby Fischer complained about games being decided by gross blunders during this confusion. He proposed a new way of timing games and patented a chess clock based on his idea (US Patent 4884255). Under his scheme, each player starts with an initial ration of time; the clock counts down in the usual manner, but as each move is made, an interval of two minutes is added. A player’s stock of remaining time increases or decreases depending on speed of play, but can never fall below two minutes per move.
A worthy opponent collaborates in a creative process and, in a truly epic engagement, is but one of many collaborators. To produce the best possible contest, Michie arranged for computer sparring partners to help Levy prepare for the challenge. One game in December 1977 with Donskoy and Arlazarov’s KAISSA was a truly international engagement: an Englishman playing in Canada against a Soviet program running on an American computer in California. This game, too, showed the success
2 Many chess programs have been equipped with a facility for gratuitous comments. A favorite story concerns a prank on a chess programmer, who was nonplussed to observe his program play a weak move and after ten seconds type “Sorry . . . I wasn’t paying attention. . . . May I take that back?”
of Levy’s “doing nothing, but doing it well; the program will then dig its own grave.”
Certain that his program would be the one to challenge Levy, David Slate spent six months preparing a major revision (CHESS 5.0) for the encounter. The new version was written entirely in FORTRAN for portability with an eye to running the program on the Cray-1 supercomputer. Arrangements were complete for an August 26th, 1978 match between David Levy and CHESS 5.0 to decide the wager when an unexpected challenger appeared. After a full decade of no public competition, and very little published work on machine chess from MIT, Richard Greenblatt appeared with a new version of MacHack, the program that had inspired the Great Wager, and which now aspired to settle it.3
The new MacHack employed an add-on “chess-oriented processing system” (CHEOPS), based on a microprogrammed machine designed for brute-force tactical analysis at a rate of 150,000 positions a second. Greenblatt achieved this rapidity by using the scheme discussed in the chapter on custom-built chess engines: an 8 x 8 hardware chess array to generate legal moves. A similar device had proven its utility in the previous year’s World Computer Chess Championship when minicomputer-based Belle held its own against giant mainframe computers. Although CHEOPS was a formidable brute-force engine in itself, Greenblatt did not use it as a stand-alone chess machine. Instead, it served as an adjunct to the MacHack heuristic program with its chess-specific knowledge. CHEOPS would carry out three-ply exhaustive searches (extended by capture chains) from nodes within MacHack’s game-tree to guard against imminent tactical threats. Running as a background process, it also evaluated the root position for comparison with the heuristic program’s selection.
CHEOPS had the responsibility of alerting its teammate to changes in material, and it also had veto power over any move MacHack might select for strategic reasons that CHEOPS reckoned tactically unsound.
Levy agreed to shoehorn a two-game match with MacHack into his schedule, which granted the machine a slim chance to beat him (to do so, it would have to score at least one win and one draw). The first encounter took place at MIT three days before the scheduled encounter with CHESS 5.0. Lacking any knowledge of MacHack’s opening book
3 This was not the only surprise appearance of 1978. After years of avoiding public performance, another well-known chess recluse, Bobby Fischer, also played a (predictably one-sided) game with the improved MacHack/CHEOPS.
or playing style, Levy played the familiar Dragon Sicilian (about which he had written a book). Each of MacHack’s opening moves was played instantly, a sure sign that it was still following its book. Just as in a game with a human master, Levy found himself under self-generated psychological pressure. He became increasingly uneasy that the preparer of MacHack’s opening library might have included some refutation of his published analysis, and labored to create a new, out-of-book position. This brought immediate relief, for the machine went into a long calculation. There had been no prepared variation. Levy’s “do nothing, but do it well” prevailed once again. As often happens in human consultation games, the components of the MacHack/CHEOPS computer team seemed unable to agree on a plan. Somewhat aimless play permitted Levy to post his pieces ideally for a decisive attack, and the game was soon over. Since the first-game loss made a match win impossible, Greenblatt suggested a friendly return game at a faster pace. Levy agreed, and the second game was played with a thirty-minute time control. MacHack lost this one, too, and boosted Levy’s confidence in his ability to win the final match.
CHESS 5.0 was not ready and the famous wager had to be settled through an encounter with CHESS 4.7. Levy reported the outcome in the November 1978 Chess Life and Review. “The Toronto match was scheduled for six games. I therefore needed three points to collect my wager; my opponent needed three and a half to make me famous” (Levy, 1978). He wore a dinner jacket for the occasion. Levy played at a specially designed electronic board that sensed his moves magnetically and illuminated the path of the machine’s replies. The program ran offsite, on the fastest available commercial computer, a Control Data Cyber 176 in Minnesota.
The first game of the match provided Levy with a “horrible shock.” During the opening he followed his plan, avoiding tactical engagement while gradually expanding control on the Queenside. Then he misjudged the effect of a knight sacrifice; indeed, he accepted it gladly, for it appeared unsound. Suddenly CHESS 4.7 had a strong attack and, even more unnerving, delivered a psychological broadside with the message: “That was easy.” Levy saw he had a losing position, and began to sweat in his sartorial splendor. Still, a losing position is not always lost. Drawing on his extensive anticomputer experience, he gave up the exchange so that CHESS 4.7 would find itself ahead in material and would seek the exchange of Queens. In the ensuing complications, Levy was able to steer a careful path between immediate threats, barely holding the position with a three-pawn deficit. Towards the endgame he achieved material equality, and even found winning chances that required CHESS 4.7 to parry exactly. It did and,
for the first time, a program had drawn with an International Master under tournament conditions.
The next day, game two went according to Levy’s plan. He took CHESS 4.7 out of its opening book on the third move, induced the program to play a strategically poor exchange and, avoiding tactical engagement, won easily. After a five-day break, the match resumed and again at ease, Levy did nothing and did it well, pulling CHESS 4.7 out of its opening book at move two, and waiting until weaknesses developed in the position. Even psychological ploys failed CHESS 4.7: the program’s “Be careful!” was not even communicated to Levy until after the game.
Levy thus went into the fourth round with a comfortable lead, needing only a single draw in the next three games to win his bet. Confident of victory, he abandoned his do-nothing strategy and played the Latvian gambit in an attempt to beat the program at its own game of sharp tactics. In the ensuing rough-and-tumble, he made several inaccurate moves and then committed one outright blunder. That was enough. Soon CHESS 4.7 had three passed pawns and an irresistible attack. Computer chess fans could finally celebrate a machine win over a master!
Having failed to outcompute CHESS 4.7, Levy resumed his strategy of quiet waiting in the fifth game and soon directed play into a won endgame. During this game the computer “crashed” twice; by the second time, its position was already hopeless and David Slate resigned game and match for his program.
Levy soon learned that winning a wager is not the same as collecting. Three of the professors paid off, Michie on the spot in cash, but Kozdrowicki welshed, pleading that his impending house purchase would leave him financially strapped. Ten years later Levy still had not been paid.
Although pleased with the outcome of the Great Wager, Levy expressed regret that the target programmers had aimed at for so long had now been taken down.
He decided to extend the original challenge, and offered a prize of a thousand dollars to the designers of the first chess machine to beat him in a match. This offer was increased to $5000 by Omni Magazine. At the end of 1978, Levy made another $1000 bet, this one with Dan McCracken, a prolific author of books on computing, that he would not be beaten before 1984. Machine enthusiasts still targeted the top players, and sought breakthroughs in both software and hardware. Some four decades after their work at B.P., Donald Michie and Jack Good continued the longest sustained active collaboration in the development of machine chess with another approach that might better imitate a human expert’s behavior during a game. Minimax assumes accurate evaluation and
perfect play, while in real games inaccuracy and error are unavoidable. Michie and Good modified the minimax scheme to account for the uncertainty in the real world’s imperfect information and imperfect play by associating a probability of selection with each branch of the move tree. If the probabilities can be estimated, positions might be evaluated by utility better than by minimax score.4
The likelihood of selecting a particular branch depends on the player’s ability to discern which move would be appropriate in that position. Michie proposed a measure of discernability based on playing strength and depth of tree, and showed in an example how different moves would be chosen against different opponents. Levy remarked that such a calculation could adjust play to reflect an opponent’s playing style. If individual preference is taken into account, a game could be directed into lines uncomfortable to the opponent, and therefore, from a psychological viewpoint, more suitable. Playing styles would emerge that exhibit all the variety found in top players. Such an approach, Levy suggested, might encourage positional play against a Tal-like opponent and impose a tactical game on a Petrosian. Although this idea seemed well worth developing, like many other promising ideas for improving chess software, it was neglected in favor of the very successful (and hence alluring) hardware add-ons such as CHEOPS.
In 1980, Professor Edward Fredkin, who had become wealthy through his successful computer-related enterprises, endowed a foundation for encouraging machine chess. Besides offering prizes for achieving certain milestones in computer chess, the Fredkin Foundation sponsored competitions to test the strength of top chess machines against well-prepared masters and grandmasters. These competitions showed how much machine ratings had inflated when based on play against humans who did not understand machine style.
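The contrast between a minimax score and the probability-weighted utility attributed above to Michie and Good can be illustrated with a minimal Python sketch. It is not their formulation: the two-ply tree, the scores (in pawns), and the branch probabilities are invented for illustration, and the names Node, minimax, expected_value, best_move, and make_tree are hypothetical. The sketch assumes that the probability of each opponent reply has already been estimated somehow; how Michie derived such probabilities from playing strength and depth is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    score: float = 0.0                      # static evaluation, read at leaves only
    children: List["Node"] = field(default_factory=list)
    probs: Optional[List[float]] = None     # assumed chance the opponent picks each child


def minimax(node: Node, maximizing: bool) -> float:
    """Classical minimax: the opponent is assumed to find the best reply."""
    if not node.children:
        return node.score
    values = [minimax(c, not maximizing) for c in node.children]
    return max(values) if maximizing else min(values)


def expected_value(node: Node, maximizing: bool) -> float:
    """Probability-weighted variant: at opponent nodes, take a linear
    combination of branch values weighted by the selection probabilities."""
    if not node.children:
        return node.score
    values = [expected_value(c, not maximizing) for c in node.children]
    if maximizing:
        return max(values)                  # our own choice is still the best branch
    probs = node.probs or [1.0 / len(values)] * len(values)  # uniform if unspecified
    return sum(p * v for p, v in zip(probs, values))


def best_move(root: Node, evaluate: Callable[[Node, bool], float]) -> int:
    """Index of the root move with the highest value; the opponent replies next."""
    values = [evaluate(child, False) for child in root.children]
    return max(range(len(values)), key=values.__getitem__)


def make_tree(trap_probs: List[float]) -> Node:
    # Move 0: a solid move; either reply leaves a small edge.
    solid = Node(children=[Node(0.2), Node(0.3)], probs=[0.5, 0.5])
    # Move 1: a trap; the correct reply refutes it, the natural reply loses a piece.
    trap = Node(children=[Node(-0.5), Node(3.0)], probs=trap_probs)
    return Node(children=[solid, trap])


weak_opponent = make_tree([0.3, 0.7])       # assumed to find the refutation 30% of the time
strong_opponent = make_tree([0.95, 0.05])   # assumed to find it 95% of the time

print("minimax choice:", best_move(weak_opponent, minimax))
print("vs weak opponent:", best_move(weak_opponent, expected_value))
print("vs strong opponent:", best_move(strong_opponent, expected_value))
```

With these made-up numbers, minimax always prefers the solid move, while the weighted evaluation sets the trap only against the opponent assumed likely to fall into it. This is the sense in which different moves would be chosen against different opponents.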
In tournament play against amateurs, machines tend to perform an entire class better than the ratings achieved in play against other machines. The effect is probably due, not to intimidation, but to fewer tactical blunders and fewer missed opportunities to grab material. A player familiar with machine strengths and weaknesses can better steer a game into strategic channels to defeat a higher-rated program. Advice on how to play against computers soon appeared as a new set of chess maxims, such as “if the Queens remain on the board, the machine has many more alternatives to compute, and search is shallower.” At the 1982 Fredkin tournament, four candidate masters who had studied machine style
4 E.g., by calculating a linear combination of branch scores weighted by their probabilities.
were able to beat the four top computer programs by playing for Rook endings, the good against bad Bishop, and open middlegame positions.
Understanding of machine style quickly spread to club players and amateurs when inexpensive chess microcomputers became available. With selectable search extent, the personal chess machine can provide a good game over a wide range of playing strengths. Best of all, it is always ready to play. With such willing opponents, many younger players have much more experience with machine chess than with club chess. A quiet revolution in the acquisition of chess skill has taken place. The beginner who uses a chess computer as a sparring partner quickly realizes the importance of making plans, for the player who relies solely upon computation is likely to lose. Before long, even grandmasters could profit from readily available brute-force power: Anatoly Karpov used a Fidelity microcomputer-based machine as an analysis tool to identify tactical flaws during his opening preparation for his 1990 World Championship rematch.
Since 1982, computers have been allowed to play in the U.S. Open. Each must have an attendant to move the pieces and press the button on the clock. Human players have the option of refusing a pairing with a computer.5 Sponsors of electronic participants must pay the usual entry fees, but revised USCF regulations forbid awarding any prizes to nonhuman participants (or their trainers), except for prizes specifically designated for machines. Telephone connections suddenly became important at tournament sites to allow data links to remotely located computers, and soon the best machines were competing in open tournaments.
The quality of machine play rose rapidly as new chess hardware pushed search rates beyond 100,000 positions per second. Belle reigned for two years as World Computer Champion and reached master level.
Just when it seemed certain that custom-built hardware was essential for searching deeply enough for brute-force master play, a program named Cray Blitz running on a general-purpose supercomputer took the 1983 world computer championship with a convincing four wins and one draw. After this success, its authors Bob Hyatt and Bert
5 Tournament organizers outside the United States have not always recognized a right of refusal to play against computers, which seems to conflict with the FIDE ruling that no title norm is considered valid if any game in the (normed) event was played by computer. The inevitable happened at the 1990 Luxembourg Open when English IM William Watson achieved his final GM norm, but was denied the title because one of his wins included a game with a computer. FIDE eventually bestowed the title (Watson had, after all, maintained a 2500 rating for half a decade), but warned that this action could not be considered a precedent.
Gower of the University of Southern Mississippi and Harry Nelson of Cray Research challenged Levy for the $5000 prize.
Levy had not played a serious game of chess for more than five years and was concerned he might disgrace himself against a machine capable of eight- and nine-ply searches. But he had, after all, offered the prize, and felt he could not long postpone this showdown while getting back into practice. Besides, settlement of the wager with McCracken was overdue, and it was unsporting to remain unbeaten by simply not playing. Don Beal arranged a four-game match as an adjunct to the fourth Advances in Computer Chess conference scheduled for April 1984 in London. U.S. Master Danny Kopec agreed to act as Levy’s second. Not only active in over-the-board play, Kopec was also expert on computer chess play. He arrived in London three days before the match for intensive training with Levy; for their preparation, they played speed games and discussed strategy and openings.
Brief as it was, the preparation was sufficient. Playing Black in the first game, Levy took the machine out of book immediately. Soon he had created a position that a human master would consider inferior, but which a program would likely misplay. Sure enough, Cray Blitz misevaluated the situation, and ceded its advantage. Then the computer got into time trouble because communication delays over the voice link to Minneapolis had not been accounted for. The game ended when Cray Blitz, in an already-lost position, discovered that its time had expired; its instantaneous play brought near-instantaneous loss.
Cray Blitz forfeited the second game. Its computer was down, and could not be restarted for an hour and a half. The game was played as a friendly encounter, that is, having no influence on the match score, with a shortened time control.
Levy won this game by creating a blocked pawn formation, for he knew that without open lines, a program has difficulty in creating play whereas the strong human has little trouble preparing an advantageous breakout through slow maneuver.
After the second game, the Cray Blitz team modified their program to dissuade it from accepting blocked positions. Any program change entails the risk that what worked before might work no longer, and a defect soon became apparent. Cray Blitz successfully avoided a locked pawn structure, but made an all-too-human blunder, a decidedly unsound sacrifice, and was in trouble. A few moves later the computer crashed; during the wait for the engineer the game ended by time forfeit.
Levy played a relaxed fourth game. Once again, he reached a position that would be inferior against a strong human, but was difficult for a program to assess correctly. He offered to swap Queens when he
knew that Cray Blitz, having the advantage, would accept. In this position, however, the machine’s superiority lay in keeping the Queens on the board; once it relinquished this advantage, its game grew steadily worse and presently the programmers resigned. Levy suggested that the score probably would have been the same even without hardware problems, simply because Cray Blitz is unable to create the type of position in which it plays best.
Levy’s 4-0 shutout of Cray Blitz suggested that a well-prepared human still had a substantial edge over a chess computer. But soon, dedicated brute-force chess engines built with custom hardware raised machine play to a level that surpassed all but the top masters. By the end of the 1980s, machines playing at tournament pace were capable of full-width, ten-ply search guaranteed to miss nothing that might bring material advantage within this purview. Very few masters can hope for sustained play without the slightest lapse. On the other hand, there are many good, and even winning, moves that a master will play with hardly a thought, but which even the most capable machines will disregard.
In 1982, Danny Kopec and Ivan Bratko published the results of their Bratko-Kopec Experiment, a comparison of human and machine performance on a set of problem positions. Half of the problems allowed tactical solutions that brought material gain, defended resourcefully, or improved objective positional features. In each of the remaining problems, an advantage could be obtained through a pawn-lever, a type of positional move in which a pawn offers itself for trade under conditions that lead to improvement in one’s own pawn structure or damage to the opponent’s.6 For both problem categories, the higher a player’s rating, the more problems were solved. When the same problems were presented to computer programs, only the scores on the tactical problems showed any parallel with rated playing strength.
The application of a pawn-lever was quite beyond most programs. Their scores on lever problems seemed almost due to chance; few programs solved more than half, and failures were uniformly distributed. Deep search, it seems, does not always produce best play. If a player is not overwhelmed by tactical
6 The ideas behind pawn-lever moves are particularly well explained in Hans Kmoch’s book Pawn Power in Chess (1959/1990), which has become almost essential reading for the club player who aspires to successful play against machines.
7 In a 1990 repetition of the Bratko-Kopec experiment, Tony Marsland found that even the deep-searching Hitech and Deep Thought solved no more than half the lever positions.
computations, even a little knowledge of how to recognize and assess critical pawn formations can swing the course of a game.
At the slower pace of postal chess, the human player is less prone to tactical error and a brute-force opponent is less formidable. In a 1988 encounter, IM Mike Valvo played a two-game correspondence match with Deep Thought via electronic mail. He sent along his move-by-move comments to provide the chess players following the games on the (semipublic) network with his appreciation of the proceedings. The contestants both had ratings near 2500, based on over-the-board play at tournament speed. While a more rapid pace probably would have favored DT, the time rate of three days per move benefited Valvo. At the search horizon, the time needed for each additional ply is so great that brute-force machines derive little benefit from the extra time. Valvo won both exciting games.
Valvo noted that the machine displayed surprising tenacity when defending a losing position, and gave up ground very slowly. He was also surprised that his intuitive sacrifices were so successful, and suggested that the most effective strategy against the brute-force chess engine lies in seeking ways to give up material in return for long-range positional advantage. He cited an example from the second game:
Valvo recalled: “When the ideas behind Ba6 began to flow for me, I ‘knew’ it was the move I wanted to make . . . the more I looked, the better it felt. . . . Still, it was not without a great deal of trepidation that I mailed off this piece sacrifice” (Valvo, 1989). This stroke decided the game. Deep Thought examined continuations to a depth of 35 plies, and found nothing better than the thrust b4 forking Queen and Knight. The machine hung on for another 34 moves before the Deep Thought team decided further play was hopeless.
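The remark that extra thinking time buys a brute-force searcher little additional depth can be made concrete with a rough calculation. The sketch below assumes that search time grows as b to the power of the depth, where b is an effective branching factor; the values of b tried here, the three-minutes-per-move tournament pace, and the assumption that the machine would actually use the whole three days are illustrative guesses, not figures from the text. The function name extra_plies is hypothetical.

```python
import math


def extra_plies(time_ratio: float, branching: float) -> float:
    """Additional search depth gained when thinking time is multiplied by
    time_ratio, assuming search time is proportional to branching ** depth."""
    if time_ratio <= 0 or branching <= 1:
        raise ValueError("time_ratio must be positive and branching greater than 1")
    return math.log(time_ratio) / math.log(branching)


TOURNAMENT_MINUTES = 3.0          # assumed: roughly 40 moves in two hours
POSTAL_MINUTES = 3 * 24 * 60.0    # three days per move, as in the Valvo match

ratio = POSTAL_MINUTES / TOURNAMENT_MINUTES

for b in (4.0, 6.0, 8.0):         # assumed effective branching factors
    print(f"b = {b:.0f}: about {extra_plies(ratio, b):.1f} extra plies "
          f"for {ratio:.0f} times the thinking time")
```

Under these assumptions, more than a thousandfold increase in time yields only a handful of extra plies (roughly four when b is taken to be 6), which is consistent with the observation that the slower pace favored the human player.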
GM Robert Byrne travelled to Carnegie-Mellon University to assess thoroughly the play of Deep Thought. Like Valvo, he was impressed by the engine’s remarkable tenacity at defense. He commented on DT’s apparent difficulty at forming a plan, and on the nonhuman playing style that results. Often, he noted, DT would follow “second-rate” strategies by placing some pieces on less-than-optimum squares, but even with slight initial misplacement, DT was able to make good use of the position it created. Unlike the human, who embarks on risky, perhaps incorrect, or even sacrificial maneuvers to avoid being squeezed into a cramped position (and the intuitively sensed eventual loss), DT would remain passive. In a curious turn-about, its strategy had almost become “do nothing, but do it well” as it waited, so to speak, for the opponent’s tactical misstep.
In October 1989, the first confrontation between reigning human and computer world champions took place. Garry Kasparov and a six-processor version of Deep Thought capable of examining more than two million positions per second competed in a two-game match under a 90-minute-per-side time control. Kasparov prepared carefully for the match, analyzing some 50 of DT’s games. He decided he had no need to adopt a special anticomputer strategy, but would simply stick to his own sharp style. Playing Black in the first game, Kasparov easily sidestepped Deep Thought’s thrusts, achieved equality, and gradually assembled his forces for the break-through. When it came, DT’s immense tactical ability could not prevent loss of a piece and soon, with pawn promotion imminent and every branch of the tree showing a more than four-pawn deficit, Deep Thought resigned. The partisan audience cheered.
With the White pieces for the return game of the now-unloseable match, Kasparov decided to beat DT at its own game.
He adopted the sharp, tactical style of the brute-force chess engine in an attempt to “crush it in the opening.” Despite being outnumbered millions-to-one in positions examined, Kasparov examined the right ones, for he quickly outplayed Deep Thought.8 In the spirit of John Henry, Kasparov declared himself prepared to take on any improved successor for the honor of the human race. He opined that man can always use his brain to come up with something new to beat the machine.
That “something new” must be perceived intuitively, and not computed. The best anticomputer strategy lies in directing the course of the game along lines in which knowledge becomes important and
8 Club players can only marvel at the return game: d4 d5, c4 dc, e4 Nc6, Nf3 Bg4, d5 Ne5, Nc3 c6, Bf4 Ng6, Be3 cd, ed Ne5, Qd4 Nxf3+, gf Bxf3, Bxc4 Qd6, Nb5 Qf6, Qc5 Qb6, Qa3 e6, Nc7+ Qxc7, Bb5+ Qc6, Bxc6+ bc, Bc5 Bxc5, Qxf3 Bb4+, Ke2 cd, Qg4 Be7, Rhc1 Kf8, Rc7 Bd6, Rb7 Nf6, Qa4 a5, Rc1 h6, Rc6 Ne8, b4 Bxh2, ba Kg8, Qb4 Bd6, Rxd6 Nxd6, Rb8+ Rxb8, Qxb8+ Kh7, Qxd6 Rc8, a4 Rc4, Qd7 resign.
computation inadequate. “Do nothing, but do it well” is no longer an effective strategy against the top brute-force chess engines. Levy reformulated Botvinnik’s aphorism to fit machine play:

The art of defeating chess programs lies in creating positions where the program’s evaluation function fails to account for the true relative values as perceived by a human chess master. (Levy & Newborn, 1990)
Still, the tactical precision required to hold off a brute-force machine while creating such positions can be difficult to maintain. Two months after the Kasparov-Deep Thought encounter and more than twenty-one years after establishment of the Great Wager, the long-awaited milestone in computer chess was reached: David Levy lost a match to a computer.

Donald Michie arranged the match. He obtained approval from Omni Magazine, which had put up the balance of the $5000 prize offered in 1978 that extended the original challenge. Except for the four-game shutout of Cray Blitz in 1984, Levy had not played a serious game for a decade. In the month before the match, he played in Blitz and Action Chess9 tournaments. As with the Cray Blitz encounter, IM Danny Kopec acted as his second, and again, just before the match, Levy underwent three days of intensive training. He still had a reputation as the world expert in anticomputer play: British bookmakers favored Levy at three-to-one odds. The match was played in London, at the headquarters of the British Computing Society, though Deep Thought remained in Pittsburgh, and participated via telephone link. Despite the coaching, Levy was no match for the extraordinary middle game strength of Deep Thought; this Human-Machine encounter was a 4-0 wipe-out.

At the very top level of chess, intuition still prevails over brute force. The original John Henry could strike one accurate blow after another in a sustained effort by intuitive10 correction of hammer trajectory. Ever fewer chess players are able to sustain one accurate move after another to outplay the steam drills of our century.

9 Action Chess is played under a time limit of half an hour for each player for the entire game.

10 Muscle coordination during exercise of a complex motor skill is also largely intuitive; indeed, if conscious attention is paid to the details of an ongoing activity, that activity is likely to be disrupted. One is reminded of the diversionary trick of asking, say, a golfing partner “Do you inhale or exhale when you start your swing?” and observing the next stroke go awry when attention is focussed on respiration. Expert users of a complex motor skill will advise: “Don’t think, act. If you pause to think what you are doing, you screw up.”
chapter 8
Custom-Built Hardware
Bell Laboratories has a long tradition of drawing together enthusiastic and talented people, and of providing an atmosphere that invites collaboration. In such an environment, unexpected scientific and technological advances are apt to pour forth. Some, such as the transistor, are spectacular enough to attract immediate worldwide attention; other significant collaborations, for example, the discussions between Alan Turing and Claude Shannon on machine chess, may go unnoticed for decades. A quarter of a century after Turing and Shannon, Bell Labs brought together Ken Thompson and Joe Condon in a collaboration that would produce a dedicated chess engine.

From the era of electromechanical logic machines until well into the first decades of electronic computers, circuitry had been built piecemeal from components that were individually installed, individually tested, and subject to individual failure patterns. Since an increased number of interacting components usually results in an even more rapidly increasing failure rate, designers soon encountered complexity barriers, beyond which all designs appeared inherently unreliable. Some barriers were overcome. A few devices with large numbers of components could be made to work reliably by building them with replicated, standardized, independently testable modules. Flowers made free use of this design principle in his colossal achievement at B.P.

Another jump in the complexity of logical circuitry took place when vacuum tubes were supplanted by transistors as active elements of computing circuits. Since they needed less power (and cooling), computing modules could take the form of printed circuit cards to which individual resistors and transistors were hand-soldered. The cards plugged into a cross-wired backplane, which provided power and input signals and routed outputs to other modules. The chief design difficulty
was to make the equipment easy to test and maintain by limiting the number of signal paths between modules, thus controlling complexity.

Then came integrated circuits. Layers of material, conducting and semiconducting, capacitive and resistive, deposited on a silicon substrate could contain all the components of a printed circuit card studded with transistor cans. The chief restriction on IC design was the pin-out, the limited number of signal paths that could connect to a single chip. Each IC accomplished what earlier had required an entire printed circuit card; the printed circuit board on which the ICs were mounted now served as had earlier the backplane; and another level had been added to the hardware hierarchy. The complexity now provided by microelectronics made custom circuitry for chess operations practical, and would presently appear as a chess engine named for Bell Labs.

Thompson wrote the first version of Belle as a conventional brute-force program for a general-purpose minicomputer in 1972. It achieved an even score in the 1973 ACM Computer Chess Championship and, after some tinkering, scored 3-1 the following year. Fine-tuning brought diminishing returns and Thompson felt he had reached the limits of the minicomputer’s capability. In brute-force chess competition, one can do only so much with a clever program: The championship is still likely to go to the fastest machine. But before reprogramming for a faster, larger computer, Thompson looked for alternatives. He soon convinced himself that without some intellectual breakthrough, selective search could not be made practical. Well, if it has to be full-width searching, it must be done efficiently. How about add-on hardware to tear through position evaluation? To explore this possibility, he sought the advice of Joe Condon, a hardware designer at Bell Labs. Tasks that choke a serial processor, yet might be easily handled in parallel, are natural candidates to embody in hardware.
Move generation (presenting legal moves in useful order for minimax evaluation) immediately sprang to mind. Condon and Thompson also saw a way to do it. Allen Newell and Ed Fredkin had both commented on the curiosity that, although the number of chess positions is far beyond human capacity to grasp, the number of chess moves is quite modest. Over all possible positions, there are at most 77 “ever-possible” moves to any given square; moreover, the total number of possible moves must be less than 64 × 64 (number of source squares times number of destination squares). In principle, custom hardware could examine a position to identify all legal moves at once, and different circuit elements operating simultaneously might even order them by priority.
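The looseness of the 64 × 64 bound can be checked by brute enumeration. The sketch below counts, on an empty board, every (source, destination) pair that a single piece could ever traverse in one move. It assumes that queen lines and knight jumps together cover every piece geometry (King, Rook, Bishop, and pawn moves all lie along queen lines) and it ignores the fact that promotions to different pieces share one square pair. The function and constant names are illustrative only and are not taken from Belle.

```python
# Count the distinct (source, destination) square pairs that any chess piece
# could ever traverse in one move, and compare with the 64 x 64 upper bound.

QUEEN_DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                    (0, 1), (1, -1), (1, 0), (1, 1)]
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def on_board(file, rank):
    """True when (file, rank) names one of the 64 squares."""
    return 0 <= file < 8 and 0 <= rank < 8


def ever_possible_moves():
    """Set of (source, destination) pairs reachable along queen lines or knight jumps."""
    moves = set()
    for file in range(8):
        for rank in range(8):
            source = (file, rank)
            # Slide outward in each queen direction until the board edge.
            for d_file, d_rank in QUEEN_DIRECTIONS:
                f, r = file + d_file, rank + d_rank
                while on_board(f, r):
                    moves.add((source, (f, r)))
                    f, r = f + d_file, r + d_rank
            # Knight jumps are single steps, never along a queen line.
            for d_file, d_rank in KNIGHT_JUMPS:
                f, r = file + d_file, rank + d_rank
                if on_board(f, r):
                    moves.add((source, (f, r)))
    return moves


if __name__ == "__main__":
    moves = ever_possible_moves()
    print(len(moves), "geometric moves out of", 64 * 64, "square pairs")
```

Under these assumptions the count comes to 1792 pairs, well under half of the 4096 allowed by the bound, which is why one dedicated circuit per move is a conceivable design.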
In 1976, Condon built a hardware move generator and connected it to the minicomputer as a peripheral device. Thompson rewrote his program to carry out a full-width search. They entered the new Belle in the 1977 World Computer Chess Championship as a clear underdog against the giant mainframes. Despite being hopelessly outclassed in computing power, their hybrid managed a respectable tie for fourth place.

Encouraged by this initial effort and convinced of the promise of chess-specific hardware, they prepared a second machine for the 1978 ACM Computer Chess Championship. Condon designed and built two additional peripheral devices, one to evaluate static positions and the other to act as transposition memory for rapid recognition of previously evaluated positions. The minicomputer orchestrated the operation of the three peripherals and controlled the α-β algorithm. This combination enjoyed immediate success: Belle swept the championship 4-0.

In the final round of the championship tournament, Belle played Black against Blitz, a brute-force program that in a few years, after transplantation to the Cray supercomputer, would itself win the computer chess championship. This encounter showed the power of Belle’s extensive opening library coupled with its ability to look far ahead. Hans Berliner described the game as the most brilliant yet played by a computer and suggested that if Mikhail Tal or Bobby Fischer had played it, the game record would be making the rounds of the chess journals.

North American Computer Chess Championship
Washington, D.C. 1978

     White: Blitz   Black: Belle
 1.  e4             e5
 2.  Nf3            Nc6
 3.  Nc3            Nf6
 4.  Bb5            Nd4
 5.  Bc4            Bc5
 6.  Nxe5           Qe7
 7.  Bxf7+          Kf8
 8.  Ng6+           hg
 9.  Bc4            Nxe4
10.  0-0

Blitz exhausted its opening book early on and gobbled a pawn on the sixth move. Belle was still following its opening library and developing effectively. Blitz selected an aggressive-looking, but unsound, seventh
move; the situation worsened with the eighth; and after its tenth move Blitz was a piece down for the meager compensation of pawn and Black King exposure. The resulting position might well have come from a set of chess problems, and indeed, if presented as a problem, a human expert would have little trouble finding the solution. Capturing the h2 pawn and checking with the Queen is a common sacrificial pattern. In over-the-board play, however, the expert might well go astray in detailed calculation and, seeing a White capture with check, overlook the devastating reply. Belle’s deep search simply found that the initial move leads to material gain in all variations:
10.  . . .          Rxh2
11.  Kxh2           Qh4+
12.  Kg1            Ng3
13.  Qh5            gh
14.  fg+            Nf3++
Black’s twelfth move is the kind amateurs dream of finding over the board. It forces White to capture with check, but allows a check-blocking double check that mates, as beautiful a finish as one could hope for. The delighted Ken Thompson commissioned a custom-made T-shirt showing the final position and game score, and wore it proudly to the following year’s championship.

Belle was not only successful in tournaments. In 1945, Fred Reinfeld had published Win at Chess, a collection of 300 chess problems. To pique interest (and, perhaps, sales) he noted that a chess master could be expected to find the right line of play for about 90 percent of the
positions. The book enjoyed almost immediate popularity as chess players rose to the challenge. Even a beginner feels that, given sufficient leisure to study positions known to permit decisive moves, surely the right moves can be discovered. Still, it is not difficult to find positions so unusual that expert players misjudge them (recall the difficulty of reconstructing positions composed of randomly placed pieces), and few players attained the ninety percent score without peeking at the solutions. Besides providing entertainment for a generation of human players, Reinfeld’s collection of problems became a standard test for computer chess programs. When tested with this collection, Belle missed the best line in only 20 cases, and readily found the two known cooks (refutations of the proffered solutions or unsuspected alternative solutions) reported since publication. Condon and Thompson were astonished, and elated, to find that Belle had uncovered an additional seven cooks in the problem set!

In 1979 Belle achieved an official USCF rating of 1884 against human competition. Yet, successful as this version of Belle was, one limitation could not be remedied by fine-tuning the software. The hardware generated moves in strict order, square by square. Since the order of examination rarely coincided with likely move priorities, α-β efficiency was seriously impaired. In 1980, Condon and Thompson decided to retire the machine and to design a new version for the fall World Computer Chess Championship.

The new machine contained even more chess-specific hardware. The move generator consisted of 64 transmitter circuits and 64 receiver circuits associated with the chessboard squares. Transmitters advertised the presence of pieces on their squares; their signals enabled each receiver to recognize attacks on its square. Because moves could be examined in order of capture value, search was much more efficient, for the number of α-β cutoffs increased significantly.
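The effect of move ordering on α-β cutoffs can be seen in a toy experiment. The sketch below, which is not Belle's algorithm, searches the same small random game tree twice: once with children in their given order, and once with every node's children sorted best-first. The sort uses the true minimax values, an oracle no real engine has; Belle's capture-value ordering was only a heuristic stand-in for it. The tree shape, the leaf scores, and all function names are assumptions made for illustration.

```python
import random

INFINITY = float("inf")


def build_tree(depth, branching, rng):
    """Uniform game tree as nested lists; leaves are integer scores."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]


def minimax(node, maximizing):
    """Plain full-width minimax value of a node."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_first(node, maximizing):
    """Copy of the tree with each node's children sorted best-first
    for the side to move (oracle ordering, for illustration only)."""
    if isinstance(node, int):
        return node
    children = [best_first(child, not maximizing) for child in node]
    children.sort(key=lambda child: minimax(child, not maximizing),
                  reverse=maximizing)
    return children


def alphabeta(node, maximizing, alpha, beta, visited):
    """Alpha-beta search; visited[0] counts the leaves actually evaluated."""
    if isinstance(node, int):
        visited[0] += 1
        return node
    best = -INFINITY if maximizing else INFINITY
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # cutoff: remaining siblings cannot matter
            break
    return best


def leaves_visited(tree):
    """Return (root value, number of leaves evaluated) for a full-window search."""
    visited = [0]
    value = alphabeta(tree, True, -INFINITY, INFINITY, visited)
    return value, visited[0]


if __name__ == "__main__":
    tree = build_tree(depth=4, branching=4, rng=random.Random(1))
    print("given order:", leaves_visited(tree))
    print("best first: ", leaves_visited(best_first(tree, True)))
```

Both searches return the same root value; only the number of leaves evaluated differs, and the best-first tree never needs more of them than the unsorted one.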
Condon and Thompson completely redesigned the hardware evaluation function, too. They added a set of 64 circuits for recognizing pins and potential discovered attacks as well as signaling the square control exercised by attackers and defenders. Circuitry for sensing pawn structure signaled the presence and location of passed, blocked, isolated, and backward pawns, and identified open and half-open files. Finally, the designers transferred control of the α-β search from the host minicomputer to custom hardware.

Condon and Thompson built the hardware with nearly 1700 MSI (Medium-Scale Integrated) chips, fitted into a 5.5-cubic-foot box weighing some 125 pounds. The new machine was, as they had hoped, superfast: The custom hardware scored some 160,000 positions per second, far more than any other chess engine, and did so in the right
order. Belle easily won the 1980 World Computer Chess Championship and the 1980 and 1981 ACM Championships. Its rating surpassed 2100, and there was hope that by the end of 1981, Belle would reach the 2200 rating of a national master.

Belle’s anticipated rapid rise was interrupted by an unexpected setback: The machine became a political detainee. Federal agents thwarted an attempt to take Belle to Moscow for an exhibition. If this custom-built, special-purpose computer were allowed to visit the Soviet Union, secrets of American technology might be revealed. Despite assurances that all Belle could do was play chess (and not nearly well enough to challenge top Soviet players), the officials remained unyielding. Belle was impounded and held for some weeks.

After the detention, the machine was released to Bell Labs in time to prepare for the 1982 North American Computer Chess Championship. Thompson was sufficiently convinced of the utility of a comprehensive opening book that he typed the entire Encyclopedia of Chess Openings into the opening library. His effort was rewarded when Belle won its crucial game against Nuchess by springing a recondite opening trap used by Morphy in a game played in the Cafe de la Regence in 1858. This win gave Belle the closely contested tournament on tiebreak. In October 1983, Belle attained an official USCF rating of 2203 to become the first machine with the title of National Master.

With the ascendancy of Belle, the way to attain top performance in machine chess seemed clear: through full-width, brute-force search using custom-designed hardware. Improvements were desirable, for a 2200 performance is still well below that of a grandmaster. To surpass Belle and, perhaps, approach grandmaster play, new hardware was needed. Hans Berliner applied newer microelectronic technology to special-purpose chess hardware, and soon made Carnegie-Mellon University once again a center for machine chess.
An over-the-board chessmaster whose job commitments kept him off the tournament circuit, Berliner had turned to correspondence chess in the 1950s and was soon totally absorbed. During the next decade he attained the rank of postal GM and in 1968 won the Fifth World Correspondence Championship with the overwhelming, and still unequaled, score of 14-2. What is left after such attainment? In over-the-board play, he felt he could expect no better than fifth or so in the U.S. championship. Another world correspondence championship would cost about two thousand hours of intense analysis at four hours a move. He decided on graduate school and earned a Ph.D. at Carnegie-Mellon in 1975 with a dissertation on computer chess. He stayed on as senior research scientist; one of his projects was overseeing the creation of a Backgammon
machine that could, at times, achieve world-class performance. What, colleagues wondered, kept Berliner from chess?

He was indeed designing a chess engine. Berliner had intended to build hardware to carry out a modification of minimax in which a range of values is assigned to each position, thus providing a measure of evaluation uncertainty that might guide the search. The more he studied possible architectures, the more attractive the potential strength of a full-width α-β tree searcher became, and Berliner decided to build a Belle-like machine.

Belle’s construction with individually connected MSI chips required great design effort and laborious trouble-shooting. The even more complex operations Berliner had in mind suggested a need for custom-designed IC chips. An integrated circuit is composed of layers of conducting, semiconducting, and insulating materials, deposited in a complicated process involving photolithography and etching. Expensive masking of VLSI (Very Large Scale Integrated) circuits could be justified for a microprocessor or a cryptographic device, but chess was just not worth doing. Still, a chess engine is well suited for implementation in VLSI, for functions such as move generation and position evaluation can be carried out with limited communication between chips. When Berliner found he could use the governmental support provided to Carnegie-Mellon University by DARPA (Defense Advanced Research Projects Agency) for fabricating custom experimental chips, the challenge of high-technology hardware became irresistible, and Hitech was born.

The first chess function to be embodied in VLSI circuitry was move generation. Following Belle’s successful lead, Berliner treated move generation as a pattern recognition problem: selecting from some 3600 possible features (about 1800 legal moves for each side)1. The idea was to examine a board position with the help of a specialized circuit for each ever-possible move.
Each circuit would signal whenever its assigned move was possible. As in Belle, the trick was to use some kind of priority encoder to order the moves from likely-best to likely-worst. Each chip must keep track of the moves that have already been evaluated.

1 For ease of coding, the Hitech circuitry actually employs 2^13 = 8192 possible features: six bits each for source and destination, and a bit to specify side-to-move.

One of Berliner’s graduate students, Carl Ebeling, designed the custom hardware for move generation. Since the technology provided by DARPA was too coarse to allow the entire move generator to fit onto a single chip, he found it necessary to partition the circuit, and assigned one chip to each square of the board. To avoid specifying a
custom circuit for each square, Ebeling designed a universal chip, a single circuit that could be externally hard-wired to specify its location in the chessboard array. By replicating one circuit of some 15,000 circuit elements, design and fabrication costs could be kept within acceptable bounds. Laid out on a tray, these processors operate simultaneously during examination of a position, each one generating an ordered list of the pieces that can move to its square. Ebeling used programmable logic arrays to carry out a priority calculation in parallel. In this process, each chip broadcasts the priority of its best untried move to all chips and compares its ranking with those offered by other chips; the chip with the highest priority move identifies itself.

While Ebeling worked on hardware, Berliner busied himself with the evaluation function. He had seen how adding even a little bit of chess knowledge to the evaluation function can greatly improve a machine’s game, and he sought a flexible way to apply specific knowledge when relevant. He was also intrigued by the possibilities of parallelism, that is, of simultaneously detecting not only elementary board features, but also significant combinations of features that ought to be considered when calculating a position’s value. In a serial machine, each positional feature and each feature combination must be recognized by a separate sequence of computational steps, a time-consuming process that forced programmers to choose between quality and quantity. Berliner devised an architecture for evaluation that used layers of hardware recognition circuits to combine signals produced upon detection of elementary features.

Eight global-state recognizers provide continuity in the evaluation of certain slowly changing board features.
During the middle game, for example, the vulnerable King should be kept away from the action, but after exchange of heavy pieces, the King becomes a strong piece that is most effective in the center. The boundary between midgame and endgame should not, Berliner argued, be clear-cut. Instead, once danger of exposure has lessened, the King ought to migrate toward the center of the board to be in best position for the endgame. A global-state recognizer can rate King position as favorable when either material is abundant and the King sheltered or material is scant and the King centralized.

Hitech’s recognizers could distinguish significant piece configurations within a position, that is, chunks. Berliner employed three levels of detection and combination in his general recognizers: The first detected elementary features, a second combined the signals produced by the first, and a final stage of mapping produced a value based on the second-level signals together with signals from the global-state recognizers. Final evaluation consisted of adding the numbers supplied by
the general recognizers. The entire process took less than one microsecond, the settling time of the recognizers. Specialized chess knowledge such as pawn structures and endgame situations could be incorporated in the evaluation when relevant, and Hitech’s evaluation function could easily adapt to the particular chess situation.

Adaptation was effected by the controlling computer, which served as an oracle that specified which features are relevant in the region of search. Before initiating a search, the master computer would download tables to the evaluation hardware, which configured the recognizers to detect particular feature combinations and specified their weights. Most of the programming consisted of writing rules based on specific chess knowledge to determine the likely importance of, say, a bad Bishop2 at the current stage of the game.

Hitech made its tournament debut in May, 1985. The machine proved a formidable opponent: It was not only superb in short-range tactics, but it was also able to avoid strategic liabilities such as inferior pawn formations. The special-purpose hardware enabled Hitech to generate and evaluate nearly 200,000 potential positions per second or more than 30 million during the three minutes spent in selecting a move at tournament speed. As Berliner fondly pointed out, this is a greater number of alternatives than a human chess player examines in a lifetime.

Hitech’s tournament record was impressive. In October, the machine swept the North American Computer Chess Championship tournament. Hitech’s knowledge of pawn structure proved decisive during its encounter with the champion; it saddled Cray Blitz with an awkward doubled pawn, which after exchange of heavy pieces became a lost endgame. The new North American Computer Champion scored 5.5-2.5 in the Fredkin Masters to bring its rating above 2300. Soon Hitech was playing in the large Open tournaments against senior masters, and by March it had played fifty tournament games.
Its rating soon stabilized at about 2350, not only higher than that of any other chess program, but also higher than that of 99 percent of all USCF-rated tournament chess players. Three years of hardware improvements and fine-tuning of software brought the rating above 2400 to make Hitech the first electronic Senior Master.

2 A bad bishop is one whose scope is limited by the presence of same-color pawns on its squares.

The end of the 1980s saw the rise of an even more formidable custom chess engine: Deep Thought. In mid-1985 a Carnegie-Mellon graduate student, Feng-hsiung Hsu, decided that a move generator could be constructed on a single VLSI chip. Although without faculty sponsorship, Hsu was able to make use of the same governmental support for fabricating custom experimental chips that had spawned Hitech. He spent half a year compressing a Belle-type move generator into a single circuit that could satisfy the fabrication constraints; after another anxious four months, Hsu received the first copy, which he connected to a small computer for initial test. To his delight, the circuit not only functioned correctly but processed at rates as high as two million moves per second, a ten-fold increase over Hitech!

In mid-1986, Hsu was joined by Thomas Anantharaman, another CMU graduate student who had just written a brute-force chess program. They replaced the software move generator with a makeshift link to Hsu’s custom hardware, which boosted the program’s search rate five-fold to some 50,000 nodes per second. Since this speed was competitive with the best chess engines, Hsu and Anantharaman decided to enter their machine in the 1986 North American Computer Chess Championship. But the tournament was only seven weeks off. They enlisted two more graduate students, Murray Campbell and Andreas Nowatzyk, to help improve the evaluation function and to replace their provisional link with hardware better able to exercise the move generator chip.

With time so short, some compromises were necessary. The search engine, for instance, ignored castling and detection of position repetition, for these could be handled by a parallel, shallow search conducted by the host computer. The new hybrid, which they named ChipTest, was only partially tested when the tournament started. It lost the first two games. Between-round improvements and easier pairings in the last three rounds permitted a rally, and ChipTest finished with an even score.

During the tournament, Hsu made an interesting observation: Brute-force programs were playing into forcing lines that clearly extended well beyond their search horizons.
Because the combinatorial explosion of full-width search imposes strict limits on depth of search, the outcome of a long sequence of forcing moves cannot be predicted. In such situations, it is pure chance that determines which competitor will reach the better position. Exactly this difficulty had been pinpointed by Shannon nearly four decades earlier as the chief drawback to full-width, fixed-depth search: misevaluation of nonquiescent leaf nodes.

Hsu’s solution was also the same as Shannon’s: to extend the search tree selectively so that forcing lines are examined further. An elementary form of this principle already enjoyed wide use in brute-force programs for certain forcing moves. Not counting a move that evades a check or certain capturing moves as a ply of depth automatically
[Diagram: Problem 213 of Win at Chess]
extends search along these lines, and does so without undue proliferation of branches. But Hsu wanted their program to recognize a greater variety of constraining situations, in which it is worthwhile to investigate whether success or disaster lies just beyond that horizon.

Hsu remarked that constraint is quite apparent when a piece is trapped. As search deepens, and the potential loss is detected, horizon-effect moves are interpolated to postpone loss. But fewer and fewer interpolations can be found, and finally no move is able to save the piece. Since other moves met earlier loss, the sequence of moves up to the final fall was, in effect, forced. To probe forcing continuations more deeply, Hsu devised the “singular extension” algorithm. The idea was to imitate the human chess player, who derives information about the nature of a move sequence from the search itself. A move is considered singular if its value is much better than all the alternatives, that is, when it appears almost forced, and it is worthwhile to explore further to obtain a more exact value.3

3 In his 1959 paper, A.L. Samuel suggested that selective search be guided by α-β difference. He observed that when this difference is small, both sides have nearly agreed on the score, and since little advantage to either side can be expected from further exploration, it is better to search a part of the tree with greater α-β margin. He claimed that this approach often provided savings of a factor of two.

While Hsu tuned the microcode controlling the lowest level of hardware, Anantharaman added the singular extension algorithm to ChipTest’s host computer. During self-play, a version with the new algorithm outplayed its sibling by as much as 3-1, which suggests an improvement of nearly 200 rating points. When tested on the 299 problems of Win at Chess (Reinfeld, 1945) that have solutions, the
enhanced ChipTest found all but two and, in one tour de force during a (nominally) eight-ply search, discovered a forced mate 35 plies deep from the position of Problem 213 in little more than a minute.4

The ChipTest team prepared carefully for the 1987 North American Computer Chess Championship, which had the strongest field ever, including an improved version of the World Computer Champion Cray Blitz. Hitech was not entered, but Berliner made the machine available as a sparring partner; the ChipTest engine won 11 of their 16 games. The 4-0 sweep of the championship tournament was hardly a contest, for ChipTest was much faster than any other machine, generating and evaluating between 400,000 and 500,000 positions per second. In the third round, ChipTest played Black against Cray Blitz and quickly showed the strength of the singular extension algorithm: After an aggressive move by Cray Blitz, ChipTest settled into an extended calculation and continued with a sharp line that led to a swift ending, which it had followed well beyond its opponent’s horizon.

At first, the connection between the Hitech and ChipTest teams had been cordial and supportive. Some of the Hitech code, heavily modified, had, with Berliner’s consent, been used in ChipTest. But the collaborative atmosphere changed after ChipTest won the 1987 NACCC. When ChipTest was invited to play in the 1987 American Open, Berliner objected. Hitech was playing in the same tournament and, after all, the two engines did share some code. The ChipTest team acquiesced and did not enter, but the rift had been established and now the groups were rivals.

The break occurred just as the new champion was about to be discarded. As in the development of Babbage’s Difference Engine, the process of designing and implementing ChipTest had revealed so many possibilities for improvement that the designers preferred to build a completely new machine.
It is almost a tradition that the first version is built to be thrown away (and occasionally the second and third as well). ChipTest hardware could run at a higher speed, search could be improved, and, as Cray Blitz and a few other chess engines had shown, the search space can be divided among several processors to examine more tree. Most important, ChipTest’s success had brought valuable recognition: Hsu now had a faculty advisor at Carnegie-Mellon University who could obtain seed money. The Deep Thought project was underway.
⁴ The forced mate in eighteen is: Rxh7+ Kxh7, Qh5+ Kg8, Rxg7+ Kxg7, Bh6+ Kh7, Bg5+ Kg7, Qh6+ Kf7, Qf6+ Kg8, Qg6+ Kh8, Bf6+ Rxf6, ef Qe1+, Kxe1 Nc2+, Kf1 Ne3+, fe Rd7, Qe8+ Kh7, Qxd7+ Ne7, Qxe7+ Kh6, Qg7+ Kh5, Qg5++.
The Deep Thought chess engine was built around a pair of custom-built processors, each of which included a VLSI chip to generate moves. These were mounted on a single printed circuit board along with an additional 250 integrated circuit chips that controlled the search and evaluated positions. Each processor could evaluate at a rate of a half million positions per second by adding values retrieved from tables. Besides material balance, the evaluation function could weigh some 120 board features, including center occupation, mobility, pawn structure, King-protection, and Rook control of files. A special program provided off-line learning by automatically fine-tuning the weights associated with board features according to the moves played in a training set of GM/IM games. A two-processor engine, controlled by a main program running on a SUN workstation, could exhaustively examine a middlegame tree to a depth of ten plies in three minutes.

In pursuit of its goal of encouraging machine chess, the Fredkin Foundation established prizes for certain milestones, which included $10,000 for the first machine to achieve grandmaster level performance, and $100,000 for winning the World Championship. (Condon and Thompson had received $5,000 when Belle became a National Master.) The Foundation also sponsored competitions to test the strength of top chess machines against well-prepared masters and grandmasters. With an eye toward the grandmaster prize, Hsu considered the 1988 Fredkin Masters Open an ideal arena for the debut of Deep Thought.

Preparation of the machine for its first competition was even more of a last-minute rush than had been the case with ChipTest: The day before the tournament found the circuit board still being wire-wrapped. The hardware could only be driven at half speed. Because the evaluation function was not yet tuned, play was based entirely on deep search augmented by (equally poorly tuned) singular extensions.
During the very first game, the Deep Thought team was embarrassed when their machine played the same space-grabbing, antipositional rook pawn advance essayed by Turing’s paper machine over three decades earlier. Still, its purely tactical deep search was sufficient: the machine scored four and a half of six points against master opposition to earn a USCF provisional rating of 2599/6.⁵

⁵ An unrated chess player obtains an established rating only after a statistically significant number of games have been played. During the first twenty rated games, a provisional rating is calculated by averaging the ratings of the opponents (after adjusting 400 points upward for a win, or downward for a loss). This procedure is also used to calculate a player’s performance rating for a single tournament.

In November 1988, Deep Thought took part in the Software Toolworks Open Championship held in Long Beach, California. The machine scored six and a half points in eight rounds against stiff competition for a “personal best” performance rating of 2745, over 100 points better than its previous record, and raised its rating to 2550. Along the way, Bent Larsen acquired the distinction of being the first international grandmaster to lose a tournament game to a computer under regular time control.

There was more to celebrate. The tournament result brought Deep Thought well above the Fredkin grandmaster norm, defined as a performance rating above 2500, sustained against qualified opponents over 25 consecutive games. Professor Ed Fredkin’s $10,000 Intermediate Prize went to the Deep Thought team. Even though it has proven it can play like one, Deep Thought cannot be considered a grandmaster. For the granting of that title, FIDE rules exclude all games in which a computer participates.⁶

The next goal was the World Computer Chess Championship, an event held every three years since 1974. Unlike the World Championship, with candidate tournaments and candidate matches leading up to the main event between Champion and Challenger, the machine version is decided by a single Swiss-system tournament. The drawback to this arrangement became clear at the very first WCCC when the two strongest programs, KAISSA and CHESS 4.0, did not meet. The top contenders did meet at the Sixth World Computer Chess Championship held in 1989 in Edmonton, Alberta, in an eagerly awaited matchup of supercomputer and dedicated chess engine.

Since winning the title six years earlier, defending World Champion Cray Blitz had held off the custom-designed opposition. The program had evolved with its supercomputer host to take advantage of the ever-increasing capacity and speed, and the Cray Blitz programmers had found a way to apportion the search among the processors to keep them all busy.
Now running on the Cray Y-MP with eight processors capable of some 1.3 giga-instructions per second, Cray Blitz was able to conduct a nine-ply, full-width search at tournament speed. It was not enough. Deep Thought’s deeper search outclassed all competition; the machine beat both Cray Blitz and Hitech in a 5-0 sweep to become the new World Computer Champion.

⁶ In a 1988 reply to a USCF inquiry about computer rating, Professor Lim Kok-Ann of FIDE explained that, according to the official view, a game played by a computer is a game played by a human with machine assistance; since obtaining assistance is a violation of the Rules of Chess, no such game can be rated.

Possession of a championship title guarantees being put to the test again and again. Deep Thought now competed in the rarefied fraternity of World Champion contenders, where its deep search was still no match for deep knowledge.

Nor was Hitech out of the running. In November 1990, Hans Berliner marked two full decades of entering machinery in national computer chess tournaments. His entry had been the favorite at the first ACM tournament in 1970. Now, in the 21st NACCC, Berliner’s improved Hitech was seeded second to Deep Thought, which had won their game in the previous year’s tournament. The fourth-round encounter of the arch rivals was closely watched, for in recent meetings, Hitech had emerged from the opening with an edge, but each time had dithered to let Deep Thought take over initiative and game.

This time Hitech outplayed Deep Thought, using positional knowledge to mount a successful Kingside attack while DT’s heavy pieces were out of play on the Queenside. Soon, all that remained was winning the won game. But what a won game! Just as two opposing grandmasters collaborate in the creation of a chess masterpiece, the two tactical engines collaborated to produce a game quite alien to the styles of human masters. At one point, Hitech had three pieces en prise; according to deep tactical computation, none could be taken safely:
Hitech-D.T. after 25. Ne5
Tournament Director Mike Valvo threw up his hands: “Only computers can reach such positions!”

Despite the win over Deep Thought, Hitech did not win the tournament. The 21st NACCC used a “sudden death” time control: each machine had two hours to complete the entire game. In the final game, after a marathon 145 moves in a drawn position, Hitech’s flag fell. Because the opposing program still had sufficient material to mate, the game could not be adjudicated a draw (a rule that Berliner himself had urged be adopted); Hitech forfeited the game on time.
The struggle for computational supremacy through custom chess hardware continues, with particular emphasis on knowledge that can be included in a parallel evaluation function. Still, pure brute force may yet outrun improved evaluation. Hsu believes that a giganode-per-second search rate will be attained in a few years, which corresponds to some 14 plies of exhaustive search. He plans to compress Deep Thought into a very efficient and very fast single chip, which will then be replicated a thousandfold. He exuberated in (Hsu, 1990): “Within five years, some machine’s rating will be in an unheard-of region.”
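The 14-ply estimate can be checked roughly against figures quoted earlier in this chapter. The sketch below is only a back-of-the-envelope illustration, not Hsu's own calculation. It assumes that the two-processor engine ran both processors at the quoted half million positions per second, that the same three minutes per move applies, and that the effective branching factor of the search stays constant as depth grows. The function and constant names are invented for the example.

```python
import math

# Figures taken from the text.
SECONDS_PER_MOVE = 3 * 60          # "ten plies in three minutes"
KNOWN_DEPTH = 10                   # plies reached by the two-processor engine
GIGANODE_RATE = 1e9                # Hsu's projected positions per second

# Assumption: both processors run at the quoted half million positions/second.
DEEP_THOUGHT_RATE = 2 * 500_000


def effective_branching_factor(nodes: float, depth: int) -> float:
    """Branching factor b such that b ** depth == nodes."""
    return nodes ** (1.0 / depth)


def reachable_depth(nodes: float, branching: float) -> float:
    """Depth d such that branching ** d == nodes."""
    return math.log(nodes) / math.log(branching)


# Infer the branching factor from the known ten-ply search ...
branching = effective_branching_factor(
    DEEP_THOUGHT_RATE * SECONDS_PER_MOVE, KNOWN_DEPTH
)
# ... then ask how deep a thousandfold faster machine could search.
projected = reachable_depth(GIGANODE_RATE * SECONDS_PER_MOVE, branching)

print(f"effective branching factor: {branching:.2f}")
print(f"projected depth at 1e9 positions/second: {projected:.1f} plies")
```

Under these assumptions the effective branching factor comes out a little under seven, and the projected depth falls between 13 and 14 plies, in line with the figure quoted above. A thousandfold gain in speed buys fewer than four extra plies.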
chapter 9
Computable Subgames
At the 1977 meeting of the International Federation for Information Processing, Ken Thompson of Bell Labs demonstrated a remarkable chess program. With no lookahead whatsoever, he claimed, it could play the King and Rook against King and Queen (KRKQ) endgame perfectly. Thompson invited two International Masters who were present, Canadian Champion Lawrence Day and former World Correspondence Champion (and later chief architect of Hitech) Hans Berliner, to show their prowess at winning the Queen side of the endgame. They were delighted to accept. They knew well that, apart from a few anomalous starting positions, the Queen side can always force a win, although endgame books caution that the process can be rather complicated.

The process did prove complicated for Day and Berliner. Even though every position encountered in their attempt was a win, to their embarrassment and annoyance they were unable to find a winning line. Again and again the program managed to weasel out of seemingly tight spots. The machine conducted its defense in a manner so strange and counterintuitive (one is tempted to say alien) that the IMs repeatedly failed to find the right continuation.

Reuben Fine summarized KRKQ in his Basic Chess Endings (1941): “In order to have drawing chances Black must keep his Rook near his King, for otherwise a check will capture the Rook. The basic winning
idea is to force Black into zugzwang, so that he will have to move his Rook away from his King.” Implicit in this advice is the assumption that separation of King and Rook will lead the defender onto quicksand where an alert opponent can force a skewer attack or fork that fells the Rook. Yet Thompson’s program repeatedly separated King and Rook without, however, leaving any obvious opportunity to collect the Rook through a series of checks.
The masters’ failure was particularly baffling, for management of the endgame has long been the mark of the chess master. It is an area in which chess knowledge is clearly superior to blind search. In stark contrast to the middlegame with its tactical complexities, the endgame is one of positional subtlety in a looking-glass world where chess pieces can assume unusual values. The King, which during the middlegame was vulnerable and had to be carefully protected, suddenly emerges as a powerful piece that can move into the opposing pawns as a cat among pigeons. The outcome of an endgame often depends on the ability of one side to lose a move, and zugzwang (the obligation to move, even when one would rather not) becomes a powerful force.

One might suppose that the reduced forces and greatly diminished number of possible positions make exhaustive computation feasible for many endgames. Unlike the middlegame, in which a position can only be assessed with the help of an approximate (and therefore not always reliable) evaluation function, an endgame position can in principle be assigned an exact trinary value. Every position is won, drawn, or lost. Minimaxing could, in principle, determine this status if the game tree can be extended far enough, but even when restricted to a simple subset of chess positions, the combinatorial explosion usually makes minimaxing impractical. The blocked-pawn position from a 1901 game illustrates the subtle precision often needed for endgame play:
Lasker-Reichhelm

Reuben Fine used this position in his Basic Chess Endings to illustrate the notion of far opposition, in which keeping an odd number of moves between the Kings maintains a decisive zugzwang. With this principle in mind, straightforward analysis shows that White-to-move can win with Kb1, and Black-to-move can draw with Kb7.
Though only King moves are possible, which severely limits the branching factor, brute-force minimaxing to the point where a pawn falls and the game is effectively decided is beyond the capability of most chess computers. (Cray Blitz was able to find the White-to-move solution in slightly more than a minute by examining more than four million nodes; its million-entry transposition table substantially shortened the search.)

But look at the position again. Only the Kings can move. If either King penetrates to the opponent’s board half, the game is decided, and if neither can, it is a draw. Before penetration, each King maneuvers in his own backfield, ranging over fewer than thirty squares; the number of configurations is only a few hundred, and the number of positions (with specified side-to-move) twice as great. Now, let’s see . . . if the White King can get to b5, it must be a win. Therefore, if the White King is at c4, and the Black King is on ranks seven or eight, then Kb5 wins. Wait a minute! Since we’re only talking about a few hundred positions, tops, why should I break my head? Let a computer do it.

In its simplest form, retrograde enumeration (or maximin) is a process of constructing a table that includes all legal positions that can occur in a particular endgame. Starting with terminal positions, which might include those in which a reduction in material results in a known outcome, a brute-force backward search computes best-move predecessors to already-categorized positions and enters them in a table until the status of every legal position has been ascertained. Furthermore, not only the status, but also the best move to maintain that status can be included in the table. This was the idea underlying Thompson’s program.
It used a precalculated look-up table containing every possible KRKQ configuration (some four million entries) along with the best moves, that is, the moves leading to the quickest win for the Queen side and maximally postponing Rook side defeat. The frustrated Day and Berliner were naturally curious about the program’s strategy, but the explanation was a disappointment. The program’s slippery behavior was simply the result of its selecting the longest path, however twisting, to inevitable loss. Although Thompson’s machine embodied perfect knowledge of the KRKQ endgame,¹ it was not expressible in familiar strategical and tactical concepts understandable by human chess players. Donald Michie described such play as “Martian chess,” for no human explanation of the program’s behavior seems possible save for the unilluminating “Here’s the table with all the moves.”

¹ In practice, the perfect play of his endgame database has been sufficient to wrest a draw about 80% of the time from human masters. Even grandmaster Walter Browne lost a wager to Thompson on his first attempt, but was able to recoup it later.

Former World Champion Botvinnik explained the irrelevance of database information to his own endgame preparation:

In 1954, in the Amsterdam Olympiad, I was playing Minev of Bulgaria. We reached an endgame with Queen and pawn against Queen. I analyzed the position until two in the morning, and established a simple rule: the White King has to be on the same rank as the Black King, or on an adjacent one. Once I had found this rule, everything was clear to me when I went to resume play in the morning. Minev wasn’t aware of the rule. All he knew was the very complicated analysis by Keres. He was armed with stacks of positions and variations. And Minev very quickly lost. In a case like that, Keres’ analysis is as much use as a computer’s. A chess master needs to know rules. (Keene, Levy, & van den Herik, 1988)
Just so. A chess player’s endgame representations must be far more economical than a table with thousands of entries, for even masters flounder when confronted with a Martian strategy that has no compact description. Endgame knowledge, too, is largely intuitive; a player notices meaningful structures, which call to mind ways to create configurations the player feels competent to handle. Endgame treatises and compendia build intuitive competence by illustrating important configurations with examples, and by exposing the student to a multitude of endgame situations that show which chunks in what combination are significant, and how they are meaningful.

Still, no set of examples seems representative enough to provide more than marginal understanding, and the individual methods of endgame play that develop during assimilation of published traditional lore are often less than optimal and occasionally erroneous. Worse, much of the authoritative endgame knowledge on which these methods are based turned out to be deficient, and some widely believed claims were just plain wrong. Only when brute-force computation was applied to chess endgames did the extent of the deficiency become apparent.

King-and-Rook against King is the simplest generally winnable endgame (this is the chief reason Torres y Quevedo selected it for automation). In his Basic Chess Endings, Fine (1941) devotes half a page of text to KRK, and offers a diagram of a position claimed to require sixteen moves to mate. He shows how to constrain the lone King to force it to an edge for the mate, and warns of stalemate possibilities. Fine’s compact guidelines are quite sufficient for the beginning chess player, who quickly grasps the principles involved and thereafter has little trouble winning KRK endgames.
Yet Fine slipped in his exposition of even this simple endgame. In an early brute-force computer tabulation of endgame positions, Clarke generated an exhaustive table of legal KRK configurations (which, after suppressing duplications due to symmetry, contained some 28,000 different positions), and calculated optimal move sequences by retrograde enumeration. His table revealed that with best play by both sides, mate can be forced from the position diagrammed in Basic Chess Endings in only 15 moves. Moreover, mate can be accomplished from any legal KRK position in no more than 16 moves, and not 17, as Fine had claimed.

This is a rather trivial discrepancy in a trivial endgame, especially since Fine was concerned with explaining correct play rather than optimal play. Optimal play is rarely necessary in winnable endgames against a lone King; indeed, in an over-the-board game, correct play that assures the win is often preferable to seeking the shortest path to mate. But in more complicated endgames, exact play may be essential to win. Enumeration of winning pathways could, perhaps, provide clues to a fresh understanding of how to conduct an endgame.

Not all endgames are candidates for retrograde enumeration. To be at all useful, a database must contain information that can affect the outcome of a game. The KBKN endgame, for example, is preordained. Except for anomalous starting positions with a cornered King blocked by its own piece, it is dead drawn and is not worth the effort of a table. Replace the Bishop by a Rook, however, and the endgame is no longer trivial. KRKN, too, is often thought of as a drawn game, yet sustained over-the-board play good enough to reach a draw is often difficult, and even endgame specialists have misjudged some KRKN positions.

KRKN encompasses a set of positions small enough to allow exhaustive enumeration.
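The table sizes quoted in this passage (some 28,000 KRK positions and about 1.7 million KRKN placements) can be roughly reproduced with a short count. The sketch below is an illustration only; it is not known to be the method Clarke or later database builders used. It assumes one common symmetry reduction for pawnless endings: the White King is confined to the ten-square triangle a1-d1-d4, and when it stands on the long diagonal the Black King is mirrored to the lower side of that diagonal. Side to move and checks are ignored, and a few residual duplicates remain when both Kings stand on the diagonal. All function names are invented for the example.

```python
from itertools import product


def king_pairs() -> int:
    """Count placements of two non-adjacent Kings, reduced by board symmetry.

    Squares are (file, rank) pairs with values 0..7.
    """
    count = 0
    for wf, wr in product(range(8), repeat=2):
        # White King restricted to the triangle a1-d1-d4.
        if not (wf <= 3 and wr <= wf):
            continue
        for bf, br in product(range(8), repeat=2):
            # Kings may not share a square or stand next to each other.
            if max(abs(wf - bf), abs(wr - br)) <= 1:
                continue
            # White King on the a1-h8 diagonal: keep the Black King on or
            # below that diagonal to remove the remaining mirror image.
            if wf == wr and br > bf:
                continue
            count += 1
    return count


def placements(extra_pieces: int) -> int:
    """King pairs times the ways to drop further pieces on free squares."""
    total = king_pairs()
    free = 62
    for _ in range(extra_pieces):
        total *= free
        free -= 1
    return total


print("king pairs:", king_pairs())
print("KRK placements:", placements(1))
print("KRKN placements:", placements(2))
```

With these assumptions the count gives 462 King pairs, 28,644 KRK placements, and 1,747,284 KRKN placements, which agrees with the rounded figures in the text. The legal-position counts for each side to move are smaller, because positions with the side not on move in check must be discarded.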
There are, after discarding duplications resulting from reflections and rotations, about 1.7 million distinct ways to place two Kings, a Rook, and a Knight on a chessboard. When side-to-move is specified, this number is reduced by those cases in which the King of the side not on move is in check. With Rook side to play there are some 1.3 million legal positions and with Knight side to play nearly 1.6 million. Although 3 million positions are too many for a person to enumerate accurately, the complete KRKN subgame can be readily analyzed by even a small computer.

An endgame database can be thought of as a two-part table: one section for all positions with White-to-move (WTM), and a complementary portion covering the same piece configurations with Black-to-move (BTM). A legal move connects a position in one table to its successor in the other, and a winning strategy specifies a path for each
winnable position that alternates between the WTM and BTM tables to arrive at a winning terminal position.²

Construction of a database starts with the terminal positions. These include checkmates and stalemates as well as the immediate capture of material without compensation that reduces the problem to another endgame with known outcome. For KRKN, winning terminal positions for the Rook side (suppose it is White) are all with Black-to-move. In these won-in-zero positions, Black is either checkmated, or the Knight, having just been captured on a square unprotected by its King, is absent.

Retrograde enumeration starts by examining all not-yet-classified WTM positions. Any that allows a legal move to a won-in-zero position is marked as won-in-one, and the move is entered in the table. Then a pass through the unresolved BTM positions will flag as won-in-two those which must move (zugzwang) to a won-in-one position. Alternation between the tables continues until no additional winnable position appears in either table. The remaining unresolved positions are drawn.

In an optimal database, the move selected from each WTM position must lead to a minimum-length win; since the WTM cases are processed in order of increasing path-length, it suffices to choose the shortest already-decided alternative. For BTM positions, however, an additional complication arises. An optimal move by the underdog is one that affords maximum delay; selecting the most remote WTM successor must be deferred until the path-lengths of all alternatives have been determined.

One might wonder, once this computer-produced knowledge has been encapsulated in a table, just how accurate it might be. After all, programming lapses are notorious, and a program for database generation is complicated enough to harbor plenty of bugs. Moreover, because alternative paths to a win may be equally long, independently generated tables are apt to disagree on the best lines.
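The alternating passes described above can be sketched on a toy game. The code below is a minimal illustration, not a reconstruction of any actual endgame program. The position names (`w1`, `b0`, and so on), the move graph, and the functions `retrograde` and `verify` are all invented for the example; a real database would generate positions and moves from the rules of chess. The sketch also includes a one-pass consistency check of the kind the text goes on to describe.

```python
def retrograde(wtm_moves, btm_moves, won_in_zero):
    """Return (wtm, btm) tables mapping position -> (plies_to_win, best_move)."""
    btm = {pos: (0, None) for pos in won_in_zero}
    wtm = {}
    ply = 1
    while True:
        new = {}
        if ply % 2:
            # Attacker to move: one move into a position won in ply - 1 suffices.
            for pos, succs in wtm_moves.items():
                if pos in wtm:
                    continue
                for s in succs:
                    if s in btm and btm[s][0] == ply - 1:
                        new[pos] = (ply, s)
                        break
            wtm.update(new)
        else:
            # Defender to move: lost only if every move leads to a won position;
            # the best defense is the successor that delays the loss longest.
            for pos, succs in btm_moves.items():
                if pos in btm or not succs:
                    continue
                if all(s in wtm for s in succs):
                    best = max(succs, key=lambda s: wtm[s][0])
                    new[pos] = (wtm[best][0] + 1, best)
            btm.update(new)
        if not new:
            return wtm, btm  # everything still unresolved is a draw
        ply += 1


def verify(wtm_moves, btm_moves, wtm, btm):
    """Single pass: every entry must agree with the entries of its successors."""
    for pos, succs in wtm_moves.items():
        won = [btm[s][0] for s in succs if s in btm]
        if pos in wtm:
            depth, move = wtm[pos]
            if move not in succs or move not in btm:
                return False
            if btm[move][0] != depth - 1 or min(won) != depth - 1:
                return False
        elif won:
            return False  # a winning move exists but the position is unmarked
    for pos, succs in btm_moves.items():
        all_won = bool(succs) and all(s in wtm for s in succs)
        if pos in btm and btm[pos][0] > 0:
            depth, move = btm[pos]
            if not all_won or move not in succs:
                return False
            longest = max(wtm[s][0] for s in succs)
            if wtm[move][0] != depth - 1 or longest != depth - 1:
                return False
        elif pos not in btm and all_won:
            return False
    return True


# Hypothetical toy graph: "w" positions have the attacker to move, "b" the defender.
WTM_MOVES = {"w1": ["b0", "b2"], "w2": ["b1"], "w3": ["b3"], "w4": ["b4", "b1"]}
BTM_MOVES = {"b0": [], "b1": ["w1"], "b2": ["w2", "w3"], "b3": ["w3"],
             "b4": ["w1", "w2"]}

wtm_table, btm_table = retrograde(WTM_MOVES, BTM_MOVES, won_in_zero={"b0"})
print(wtm_table)
print(btm_table)
print("consistent:", verify(WTM_MOVES, BTM_MOVES, wtm_table, btm_table))
```

In this toy graph the attacker at `w4` prefers the shorter win through `b1`, while the defender at `b4` chooses `w2` because it delays the loss longest. Positions `w3`, `b2`, and `b3` never enter either table and are therefore draws.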
Fortunately, correctness can be readily shown with a single pass over the table entries. A verification program simply checks that for each entry, the position that results from playing the specified move is no worse than the others reachable by legal moves, and, if a win, that the win is one ply closer.

² The distance-to-win in plies is not necessarily distance-to-mate. It is often convenient to measure not to mate, but to change in material, which transforms the endgame into another, simpler endgame with its own exhaustive table. The different measures give rise to conflicting definitions of optimal, and depending on the criteria selected, different play and different depths-to-win will be calculated. Optimal, it seems, is not always best. In KBBKN, for example, some positions allow a mate in two (surely best) or an optimal-to-conversion Knight capture in one. Another example of disagreement over what might be optimal can be seen in KQRKQ. When faced with a choice of exchanging Queens, Black will likely think it best to avoid exchange in the hope of obtaining a perpetual check, rather than swap into a lost KRK endgame that prolongs the mate at the cost of drawing chances.

With all KRKN positions correctly classified, serious lapses in the body of chess knowledge became apparent. Tabulation of the results provided several surprises, not only for masters and grandmasters, but even for the much more knowledgeable endgame specialists. The KRKN endgame is not the dead draw that had been widely supposed: White can force a win from over 51% of the WTM and 87% of the BTM positions.

How, one wonders, could a master ever have considered KRKN a draw? Part of the answer is that statistics often mislead, and these are based on the total number of legal positions, rather than on those positions likely to occur in play. Furthermore, the selective memory of the master tends to recall drawn positions that required some struggle as typical of KRKN endgames, and tends to dismiss immediate wins as trivial. Still, many winning positions are not easy to recognize, and some have been spotted only with the help of a database. To illustrate just how tricky KRKN analysis can be, consider the WTM position taken from a continuation in Fine’s Basic Chess Endings:
Fine implied that this position cannot be won, for after the natural Kb6 Nd8, White is held to a draw, which he substantiated by appeal to a previous example. What GM Fine missed, and what a KRKN database shows, was that the unnatural (because it gives Black a free check for no apparent reason) Kc6! wins. It wins for the very good (Martian) reason that, even with best defense by Black, the Knight will soon fall without compensation to give White a winning KRK endgame.
The lapses revealed by the first exhaustive tables raised the disturbing suspicion that the traditional endgame lore (from which a player’s expert knowledge is derived) might be everywhere riddled with error. It is still the handling of the endgame that most clearly distinguishes masters from club players. A master giving a simultaneous exhibition will often push on towards the endgame as rapidly as possible, for even when no advantage has been obtained, a master can almost always outplay a club player in the endgame. The new databases exposed some surprising deficiencies in published knowledge, revealed opportunities not suspected by master players, and completely solved some endgames not fully understood even by endgame specialists.

The availability of perfect endgame knowledge soon affected tournament practice, and even influenced changes in the rules of chess. In the early 1970s Boris Spassky had welcomed the results of exhaustive endgame computation with the comment that if he had an adjourned game and a good program, well, he might feel a bit lazy and use the computer as a second. The first (admitted) use of a computer for adjournment analysis took place during the 1975 USSR Zonal tournament at Kiev, when grandmaster Bronstein’s seconds telephoned the KAISSA programmers to seek advice on his adjourned Queen and pawn against Queen endgame with Tzeshkovsky. A printout was soon on the train from Moscow. According to Bronstein, the sequence of moves calculated by the computer was so beautiful he would never have found it himself, and he won the game shortly after resumption of play. (Later analysis revealed that KAISSA had disregarded a stalemate opportunity towards the end of a critical variation, and that the game should have been drawn. Or maybe not: GM Bronstein still felt there was a win.)

Rapidly proliferating databases of four- and five-piece endings overturned ever more traditional knowledge. In 1851, a study by J. Kling and B.
Horwitz became the classic endgame text for generations of chess masters. Their section on minor-piece endgames noted that two Bishops against Knight wins unless the weaker party can obtain a position similar to that diagrammed on the next page. This configuration, stated to be drawn, became known as the Kling-Horwitz position. For more than a century, most masters knew this was the position to aim for (or the position to prevent) in the KBBKN endgame. Ken Thompson’s exhaustive table not only showed that King and two Bishops can normally defeat King and Knight, but provided another bombshell: The Kling-Horwitz position is also a forced win. The longest KBBKN maximin (forced winning variation against best defense) requires 66 moves to mate, and on the way passes a few rare and beautiful zugzwang situations.
Kling-Horwitz position
The exact knowledge of maximin path-lengths provoked the hottest dispute in the world chess body, Federation Internationale des Echecs (FIDE), since its formation. The altercation concerned proposed changes to the 50-move rule, by which a game is considered drawn if no irreversible move has occurred in the last hundred plies. This rule has been a chess tradition for some four centuries. It appears in Ruy Lopez’ code, not, it seems, because of any theoretical understanding, but simply because protracted endgames were disagreeable to the professional player who made a living from playing chess for stakes.

The discovery of additional classes of winnable endgames that from some starting positions require more than 50 moves to mate sparked lobbying for a new rule. After long argument and extensive politicking, the FIDE General Assembly decided at the end of 1984 to extend the drawing limit to 100 moves, but only for the KRBKR (King, Rook, and Bishop against King and Rook) endgame. Professional players groaned. World Champion Anatoly Karpov complained that now one has to suffer for 100 moves when defending with a Rook against Rook and Bishop, just because some people who are working with computers have announced that they have found some positions which the computer can win in 62 or 65 moves. His view is shared by many; after all, there is little likelihood of near-longest-path positions occurring in tournament play. Longest-path starting positions for KRBKR are so unnatural, with pieces unreasonably out of play, they could have been reached only through impossibly poor previous play.

The outcry over this rule change forced FIDE to reconsider. Additional database results brought demands (and a host of new proposals) for further tinkering with the 50-move rule. In November 1988, after lavish expenditure of effort on a rule that might apply to one game in
10,000, the President of the FIDE Rules Commission announced a compromise and a final (?) rule. The 50-move limit was retained, but a specific list of exceptions was appended: the endgames KRBKR, KNNKP, KQKBB, KQKNN, KBBKN, and KQPKQ (with the pawn on the seventh rank) are subject to a 75-move drawing limit.

With five-piece endings, computer-generated databases penetrated and completely mapped regions in which correct play is beyond the ability of even a grandmaster. This does not mean that GMs are unable to play endgames such as KRBKR or KNNKP. Indeed, they have little trouble maintaining an advantage in over-the-board play against equally fallible opposition. But even after long and careful preparation, a grandmaster playing against a database from a position perhaps 30 moves away from the win is likely to go astray sooner or later, and to relinquish all advantage.

As Day and Berliner noted, the moves specified by an endgame database, though unusual, provided very effective defense. Still, the table also contains moves that counter this defense. Through study of the database, one ought to be able to recognize purposeful sequences of moves that press on to the win, and formulate some systematic approach to the conduct of even a complicated endgame. In an attempt to find a compression of the KRBKR database that could be couched in strategic terms, Grandmaster Edmar Mednis undertook an analysis of the optimal longest-path defense provided by Thompson. He was able to find meaning in the Martian movements prescribed by the table entries and to explain the strategy and tactics in terms of chess concepts understandable by any competent player.

Even specialists with voluminous endgame knowledge can be overwhelmed by the complexities revealed through exhaustive enumeration. John Roycroft, a life-long endgame scholar and editor of the magazine EndGame, had mastered KRKN after studying its database and in 1980 demonstrated his move-perfect accuracy in play against it. When Thompson generated a complete look-up table for KBBKN, Roycroft accepted the challenge of formulating a method of practical play and, if possible, attaining complete mastery of this new territory. He had set himself a formidable task, for the number of positions in KBBKN is nearly 100-fold greater than KRKN. In his 1987 article “Expert Against Oracle,” Roycroft described the results of his months-long analysis of examples supplied by Thompson. He identified five distinct phases in the progression along a longest-path optimal solution; a Kling-Horwitz position is reached at the end of the second, and after the third, the Black King has been forced into the open by way of one of four “Kling-Horwitz exits.” The next stage “takes White some twenty-three moves, not to be found in any book and characterized at
times by excruciating slowness and mystery” to produce a position with the Black King at the edge of the board. In a final stage the Knight falls. During this experiment, he observed that most classical chess concepts such as mobility and center control are unhelpful in the KBBKN domain, and found it worthwhile to formulate seven new descriptive concepts.

Roycroft tested his hard-won knowledge by playing against Thompson’s database under near-tournament conditions, using a clock, but allowing himself the luxury of analysis on a separate board. Given ten positions, he won eight and abandoned two. In a second part of his experiment, he continued his study using the database as a training aid, and discovered still more relevant patterns. Confronted with another ten problem positions, he again won eight, using slightly less than twice the number of moves required for optimal play. Roycroft surmised that as a result of his effort, it is likely that future advice for winning KBBKN will be to aim for the Kling-Horwitz position, since the winning method from that point will be well charted.

Database builders, most prominently Ken Thompson, generated still more tables, and had soon completely mapped all interesting five-piece endings. The long optimal paths of KRPKR were exceeded by KQPKQ mating distances of more than 100 moves (which, because they include irreversible pawn moves, have no effect on further exceptions to the 50-move rule). The new chess knowledge encompasses some curious and counterintuitive results. Thompson showed, for example, that the longest optimal path to pawn conversion in the KQP(a)KQ endgame (King, Queen, and a-pawn versus King and Queen) starts with the pawn already well advanced on a5. His database also includes several unusual BTM positions in KQPKQ endgames that oblige Black to capture into a lost subgame.

These tables were called upon during the first round of the 20th North American Computer Chess Championship in 1989.
David Levy, the Tournament Director, needed to adjudicate an 80-move game that had arrived at a KQPKQ position. Since a database is considerably less fallible in such positions than any grandmaster, and much quicker, Levy telephoned Thompson for computer adjudication. The answer (won for the superior side) arrived in time for the next pairing.

Each complete table for a five-piece ending contains several hundred million entries. With current technology, exhaustive enumeration of endgames with as many as seven pieces seems feasible, but this complexity has not yet been attempted. In contrast to the forward-search chess engines, no custom hardware has been built for brute force in reverse, and all endgame databases are generated by programs running on general-purpose computers. Cray computers seem particularly
suited to this task, for the 64-bit word length and array operations lend themselves to rapid manipulation of chessboard patterns, and programmers enjoy finding clever ways to exploit their parallel computing power. Lewis Stiller exploited the even greater parallelism of the Connection Machine with its 65,536 processors to establish that the KRBKNN endgame is a win for the superior side. In four hours, his program examined some 10^11 positions to find a longest-path-to-simplification of 223 moves.

By embedding the results of brute-force calculation in a look-up table, complete solutions of (terminal) endgames³ have been derived that permit perfect play transcending human comprehension. These look-up tables already augment the powers of the forward-search engines. As long as this information can only be absorbed through rote memorization of myriads of special cases, it remains useless to the over-the-board player, but endgame specialists are striving to transform the raw data into new chess knowledge that will benefit future students of the endgame.
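The idea of filling a look-up table by working backward from terminal positions can be sketched on a toy game far smaller than any chess ending. The example is an illustration only and makes no claim about how Thompson's or Stiller's programs were written: the game (a pile of counters from which a player removes one or two, the player unable to move having lost), the function name `build_table`, and the table layout are all assumptions made for the sketch.

```python
# Toy illustration of an exhaustively enumerated look-up table.
# Assumed game (not from the text): a pile of counters; a move removes
# one or two counters; the player who cannot move has lost.

def build_table(max_pile):
    """Map each pile size to (outcome for the side to move, optimal length in plies)."""
    table = {0: ("loss", 0)}              # terminal position: no move available
    for pile in range(1, max_pile + 1):
        successors = [table[pile - take] for take in (1, 2) if take <= pile]
        # A successor that is lost for the opponent is a win for the mover.
        winning = [plies for outcome, plies in successors if outcome == "loss"]
        if winning:
            # The winning side follows the shortest path to the end.
            table[pile] = ("win", 1 + min(winning))
        else:
            # The losing side puts up the longest-path defense.
            table[pile] = ("loss", 1 + max(plies for _, plies in successors))
    return table

table = build_table(12)
for pile in (3, 4, 12):
    print(pile, table[pile])
```

Once the table exists, perfect play requires no search at all: the side to move simply looks up each successor and picks the entry that is best for it, which is the sense in which such tables permit play without understanding.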
3 Endgame databases on CD-ROM are available at cost through Ken Thompson at Bell Telephone Labs.
chapter 10
Machine Learning
“Four score and seven years ago our fathers brought forth on this nation a new, uh, . . .” The schoolchild stopped, reddened, studied the floor, and finally sat down. Realizing the uselessness of imperative, the teacher called up the next pupil, suppressing a muttered remark that some folks just can’t seem to learn. The child, not perceiving any significance in the four score and eight words of the Gettysburg Address, fully shared this view. Without meaning, the passage is as hard to remember—and as little worth remembering—as a beginner’s chess game.

Storing and retrieving raw information without error, while difficult for a human, is a trivial exercise for a machine. Editing its memory a few bits at a time, a von Neumann computer lays down long error-free swaths of data at astonishing speed. But learning is more than just recording: It is the acquisition of potentially useful knowledge that fits into—and modifies—one’s view of the world. Human memory is particular. To assimilate a passage of text, a person must impose individual meaning upon it; sometimes by inventing a crazy mnemonic. To the child obliged to memorize it, the meaning of the Gettysburg Address lies not in the content of its phrasing, but in the rhythm of its lines. Recitation depends on previously discovered pattern—a result of intuition. The haphazard discovery of meaning so characteristic of learning in organisms seems quite unlike the inhuman precision of digital machinery.

In one important sense, no computer ever learns. If a properly functioning finite-state computing machine finds itself in the same state as on some previous occasion, it will take the same action. When driven by a fixed algorithm with the same data, a machine’s play does not improve with experience. Until very recently, most chess programs were strictly deterministic; a duffer could beat even a very capable program by simply repeating the moves of some previous game, confidently relying on the machine following the same succession of internal states.¹

Both Babbage and Shannon suggested using a random element to relieve the monotony of predetermined behavior. But variety need not be random. The chess player remembers past games and tries plausible alternatives to unsatisfactory continuations. Shannon proposed a remedy for the machine’s inability to profit from its mistakes: fine-tuning the evaluation function by adjusting weights based on results of play. Turing also suggested that one might write programs based on different playing strategies and adopt the one giving the most satisfactory results.

Learning of any kind in chess machines was a long time coming. Since performance rose rapidly with hand-crafted evaluation functions and faster hardware, none of the four types of learning observed in humans (rote, advice-taking, induction, and analogy) was necessary for improving machine play, and authors of chess programs paid scant attention to learning possibilities.

In 1959 the IBM researcher Arthur L. Samuel published the results of the first phase of his “Some Studies in Machine Learning Using the Game of Checkers.” He set out to examine ways a computer might be programmed to learn from experience, adjusting its behavior to avoid repeating mistakes. The limited environment of a game is ideal for a study of learning procedures. Chess was the natural candidate, but he was interested in learning techniques and opted for the simplicity of checkers. He played quite poorly, thought the game trivial, and even imagined writing a program that could beat the world champion. He thought his ignorance an advantage: Without expertise, there is no temptation to program in specialized knowledge.

Samuel followed Shannon’s recommended type B strategy for the design of his checker program. It searched four-ply, not counting exchanges, and pruned by α-β.
He used a linear polynomial based on a set of significant board features for evaluation, with a separate test for inability to move, which marks the end of a checkers game.

He first experimented with rote learning—recording the scores of every board position in a transposition table. But few positions could be retained, most were not worth remembering, and it was difficult to decide which positions to discard. Still, his program gradually built up
1 During the 1982 Fredkin Incentive Match (in which four experts were pitted against four chess programs), Nuchess played Black twice in a row. Because the opening book had not been modified between rounds, the second round opponent repeated the first 24 moves of the preceding game to obtain a piece advantage and an easily won position. The ploy just succeeded. The difference in elapsed time (and hence in search extent) nearly resulted in selection of different moves.
a repertoire of good openings as it found better values for often-encountered positions. No middle-game improvement took place, for positions almost never recur, but in frequently occurring endgames, his program learned to avoid obvious traps and, with a piece advantage, could usually press on to the win. His rote learning was, in effect, the generation of endgame databases and an opening book.

Samuel then investigated learning by parameter adjustment. He selected a set of board features he thought would distinguish all significant positional differences, included the most important in the evaluation polynomial, and maintained the others in a list for potential later inclusion. Ideal weights could not be known in advance (indeed, not even their signs), but if their values change to reward correct evaluations and penalize mistakes, the coefficients might eventually settle, and converge to effective values. Because the terms he selected for the initial evaluation polynomial might not be the most significant, he provided for deleting an irrelevant term when its coefficient dropped below a threshold, and replacing it with one of the reserves. In short, the program should automatically make the ad hoc adjustments to the evaluation function that programmers were doing with their fine-tuning.

At first, his program adjusted weights after every move while playing both sides of the game. The coefficients oscillated erratically as the program attempted to adapt to an ever-changing standard. He remedied the difficulty by using two versions of the program, one of which would adapt while the other employed the same evaluation parameters for the entire game. The winner’s coefficients were preserved as the best standard of play; whenever the variable version amassed a predetermined number of losses, its evaluation function would be shaken up by setting the largest coefficient to zero.
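Learning by parameter adjustment can be sketched in a few lines. The text does not give Samuel's actual formulas, so everything specific here is an assumption made for illustration: the feature names, the training scores (imagined as values backed up from a deeper search), the learning rate, and the simple error-proportional update. Replacement of weak terms from a reserve list is not shown.

```python
# Sketch of a linear evaluation polynomial with adjustable weights.
# The update rule, feature names, and data are illustrative assumptions.

def evaluate(weights, features):
    """Score a position as the weighted sum of its feature values."""
    return sum(weights[name] * value for name, value in features.items())

def adjust(weights, features, target, rate=0.05):
    """Nudge each weight so the static score moves toward a better estimate."""
    error = target - evaluate(weights, features)
    for name, value in features.items():
        weights[name] += rate * error * value
    return error

weights = {"piece_advantage": 0.0, "mobility": 0.0}

# Hypothetical training data: (feature values, score from deeper analysis).
samples = [
    ({"piece_advantage": 1, "mobility": 2}, 1.4),
    ({"piece_advantage": -1, "mobility": 1}, -0.8),
    ({"piece_advantage": 2, "mobility": -1}, 1.8),
]

for _ in range(200):
    for features, target in samples:
        adjust(weights, features, target)

print({name: round(value, 3) for name, value in weights.items()})
```

In this sketch the training scores are fixed, so the weights settle. When the target itself comes from the program's own changing play, as in Samuel's first experiments, the coefficients have no stable standard to converge toward, which is the oscillation described above.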
In spite of the public relations potential for the novel application of their product, IBM did not approve of games on company time and especially not computer play of any game that might be considered intellectual. IBM corporate leaders supposed that customers would feel threatened by the prospect of thinking machines, and considered talk of artificial intelligence bad for business.² The only company recognition of Samuel’s program was as a hardware test vehicle that would run continuously for long periods. Still, this recognition sufficed for all the (third shift) computer time he might desire. Every night, in the
2 A revisionist 1984 press release announced that IBM had been active in artificial intelligence research since the 1950s and referred to “a computer program that learned to play checkers at an advanced level,” but did not identify the program’s author.
assembly hall of the IBM 704, he ran variants of his program on as many as four machines at a time.

It might seem quite straightforward for two programs to play against each other, tallying the results. One can even imagine leaving a machine with a learning program on overnight, and returning in the morning to find it playing at expert level. Samuel discovered why this is unlikely to occur: There is no simple way to assign credit and blame—to decide which moves are good and which bad. Despite his best efforts, credit was awarded to spectacular moves rather than to the earlier moves that had made them possible, and blame was apportioned to the best-possible moves in hopeless positions.

In games as complex as chess, or even checkers, mistakes on both sides are inevitable. Depending on the type of blunder and its effect on the evaluation procedure, a learning program may well acquire more bad habits than good ones. Samuel lamented that an opponent’s bad play would beguile his program into replacing terms in the scoring polynomial too often. He never found an effective way to assign credit or blame, and finally side-stepped the credit assignment problem by treating position evaluation as a pattern recognition task.

A good player of checkers or chess knows a multitude of specific rules that apply in particular situations. The linear polynomial often fails to handle exceptions because no adjustment of weights can deal with feature combinations not linearly separable. Samuel tried using logical combinations of simple terms to account for nonlinear relationships, but combinatorial explosion thwarted every effort. He observed that the linear polynomial approximation suffices only because minimax evaluation subdivides the possible positions into separately scored discrete cases, and concluded that he needed some mechanism for easy subdivision by case that would distinguish types of positions.
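The limitation can be made concrete with two hypothetical yes/no board features. The sketch below is not Samuel's procedure: it uses a plain error-driven weight adjustment for a linear scorer with a threshold, and the feature encoding, epoch count, and function name `train` are assumptions. It finds weights for the rule "good only when both features are present," but no weights exist for "good when exactly one feature is present," because that combination is not linearly separable.

```python
# Two hypothetical binary features and an error-driven weight adjustment
# for a linear scorer with a threshold (an illustrative assumption).

def train(samples, epochs=50):
    """Adjust weights after each misjudged sample; return weights and error count."""
    w1 = w2 = bias = 0
    for _ in range(epochs):
        for (x1, x2), wanted in samples:
            got = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            step = wanted - got            # 0 when correct, +1 or -1 when wrong
            w1 += step * x1
            w2 += step * x2
            bias += step
    errors = sum(
        wanted != (1 if w1 * x1 + w2 * x2 + bias > 0 else 0)
        for (x1, x2), wanted in samples
    )
    return (w1, w2, bias), errors

# "Good only when both features are present": linearly separable.
both = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
# "Good when exactly one feature is present": not linearly separable.
exactly_one = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

weights_both, errors_both = train(both)
weights_one, errors_one = train(exactly_one)
print("both present:", weights_both, "errors:", errors_both)
print("exactly one: ", weights_one, "errors:", errors_one)
```

However long the second case is trained, at least one of the four samples remains misjudged, which is why Samuel looked for a way to subdivide positions by case rather than to tune a single set of weights.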
In his 1967 paper, Samuel introduced a new evaluation function based on a “signature table.” The idea was to group parameters according to recognizable strategic concepts. As in his earlier scheme, selected features were assigned values based on the position. Center control, for example, might be rated at -1, 0, or 1 (corresponding to the classification deficient/balanced/advantageous), based on occupation of specified squares by checkers of given colors. These numbers, taken in groups of three or four, served as indices into arrays, each cell of which represented a signature—a particular combination of feature values—and contained a weight. Further weighting and indexing of secondary arrays produced a final value at the apex of a three-tiered hierarchy.

To determine the proper weights for this grand scheme, he took as training material 250,000 positions from published master games together with the moves that led to wins or draws. For each position, the
program evaluated all alternatives; weights were adjusted slightly whenever the highest-scored move disagreed with the book move. Because no forward search was required, learning by example proceeded hundreds of times faster than learning during play. Moreover, the difficulty of determining excellence could be avoided, for the book moves almost always represent best play. Finally, because nonlinear interdependencies among board features were better accounted for, the signature table evaluation function was more accurate than the linear polynomial. After training with some 170,000 examples, he tested his signature table evaluator with new positions and found that nearly two times out of three the program would select an acceptable move with no look-ahead at all.

Two decades later, machine-assisted fine-tuning played a central role in developing Deep Thought’s evaluation function. DT’s chess knowledge included tables for piece placement, open and closed files, blocked pawns, and passed pawns. Tables of weights (and even databases containing best moves in all four-piece endings) appropriate to the type of position could be downloaded from the controlling computer to the chess engine. Andreas Nowatzyk wrote a program that, like Samuel’s, would tune the evaluation parameters automatically according to the moves from a training set of more than nine hundred GM/IM level games. Hsu remarked that their team was perhaps the only one at the sixth WCCC that could not cite the exact numeric values of their machine’s evaluation parameters.

Samuel considered the inability to generate new parameters—to induce rules from examples—a major defect of his program. Although he had made his own list of board features as inclusive as possible, he knew significant oversight was likely, for he was not a good checker player and the checker experts were quite unable to express their immense knowledge of the game as rules.
Still, the expert assessing a position seemed always to have applicable rules at hand, since conclusions could be supported by citing specific positional features, usually with the help of specialized idiom.

The language used by chess annotators to describe the progress of a game is just as specialized as the jargon of other sports announcers. Positions are described as open, closed, balanced, or dead drawn. A move might be characterized as safe, sharp, forced, strong, double-edged, or inaccurate; a combination called brilliant or unsound. These nuances hint at the range of inductive categories distinguished by strong players in evaluating a position. The descriptive language of chess has evolved to permit easy expression of classes of positions that arise during play and analysis; it reflects the experience of those who have contributed to the lore of the game.

Grouping observations into categories is an important part of intellectual endeavor; science depends on formulating general laws from particular cases. Inductive reasoning and its part in scientific discovery have long intrigued philosophers, who somehow seemed more concerned with seeking some rational basis for it than with discovering how animals do it or how it might be carried out in a computer. Some consider induction an expression of probability: If two phenomena always appear together, and never separately, one supposes the likelihood of future association to be great. Damon Runyon quipped that the race is not always to the swift, nor the battle to the strong, but that’s the way to bet. Understanding how humans make good guesses was left to the psychologists.

In the early 1960s, Earl Hunt pioneered in developing algorithms to mimic the inductive process. He wanted to design a program that could learn to recognize membership in classes, where the classes are known only through their samples. In Experiments in Induction, he commented:

An intelligent device, including man, can be thought of as something capable of adjusting to its environment. To make such adjustments, the device must continually be classifying slightly different states of the environment as equivalent or not equivalent. [Any intelligent device must have] a capacity to develop classification rules from experience. (Hunt, Marin, & Stone, 1966)
The Linnaean taxonomy of biota comprises the best-known formal system of classification. The Kingdom Animalia is broken into phyla, which contain classes, which in turn are further partitioned into all the divisions and subdivisions biology students wrestle with. Since it is strictly hierarchic, with mutually exclusive categories at each level (no species can belong to more than one genus), the entire system can be represented as a tree.

Hunt had noted that when examining a collection of objects, people tend to group them according to observable attributes such as color, size, or texture. Furthermore, these groupings are hierarchic: One attribute dominates, while others are secondary. Depending on the ranking of the attributes, a set of objects can be described by mutually exclusive hierarchies; that is, objects might be arranged first by shape and then by color, or the other way around. He also knew that in experiments in which presented objects are identified as belonging, or not belonging, to some unrevealed category, the performance of subjects trying to guess the status of further objects does not improve gradually over many trials, but appears to leap suddenly from chance to perfection. Learning occurs not through slow parameter adjustment while practicing correct responses, but by revising assumptions after wrong guesses. The hierarchy is suddenly restructured.
He described each object by enumerating the values of its discrete attributes. After selecting a concept—a partitioning of the universe into classes—he presented object descriptions along with their stated class memberships to a computer program called an induction classifier. The classifier generated a compound rule for categorizing the objects in the training sample by a simple procedure: Select a root attribute to partition the set of training samples; if no partition contains a mixture of instances, the job is done; otherwise repeat the process, recursively, until all positive and negative cases have been separated. In the resulting decision tree, nodes correspond to attributes, and branches to their values.

Hunt illustrated his idea with an example of a universe of four-letter words (no, not that kind), describable by four attributes, the letters, and showed how his algorithm generates a decision tree from a set of precategorized samples. Given that the words JRTK, QPQZ, BRYL, XVQM, PTQW, FRKV, and NSQK belong to a class, while JVBK, QPWY, NTYD, MRQM, JCTK, BRQW, and LRQW do not, the following decision tree might be generated:

Is Q in the 3rd position?
    yes: Is R in the 2nd position?
        yes: Word does not belong to the class
        no: Word belongs to the class
    no: Is R in the 2nd position?
        yes: Word belongs to the class
        no: Word does not belong to the class
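The partitioning procedure and the four-letter example above can be sketched as follows. This is a minimal illustration, not Hunt's program: the yes/no form of each test, the "fewest mixed samples" measure used to choose a test, and all function names are assumptions. The tree it builds may differ from the one printed above, since many trees are consistent with the same training sample; the printed tree is encoded by hand as `BOOK_TREE` for comparison.

```python
# Training sample from the text: four-letter "words" and class membership.
POSITIVE = ["JRTK", "QPQZ", "BRYL", "XVQM", "PTQW", "FRKV", "NSQK"]
NEGATIVE = ["JVBK", "QPWY", "NTYD", "MRQM", "JCTK", "BRQW", "LRQW"]
SAMPLES = [(w, True) for w in POSITIVE] + [(w, False) for w in NEGATIVE]

def candidate_tests(samples):
    """Every (position, letter) test that actually divides the sample."""
    tests = {(i, word[i]) for word, _ in samples for i in range(len(word))}
    return sorted(
        t for t in tests
        if 0 < sum(word[t[0]] == t[1] for word, _ in samples) < len(samples)
    )

def mixture(samples):
    """Number of samples in the minority class (0 means nothing is mixed)."""
    members = sum(label for _, label in samples)
    return min(members, len(samples) - members)

def build_tree(samples):
    """Partition recursively until no branch mixes members and non-members."""
    if mixture(samples) == 0:
        return samples[0][1]                      # leaf: True or False

    def split(test):
        pos, letter = test
        hit = [s for s in samples if s[0][pos] == letter]
        miss = [s for s in samples if s[0][pos] != letter]
        return hit, miss

    # Assumed selection rule: prefer the test leaving the least mixture.
    best = min(candidate_tests(samples),
               key=lambda t: sum(mixture(part) for part in split(t)))
    hit, miss = split(best)
    return (best, build_tree(hit), build_tree(miss))

def classify(tree, word):
    """Follow yes/no branches until a leaf is reached."""
    while not isinstance(tree, bool):
        (pos, letter), yes_branch, no_branch = tree
        tree = yes_branch if word[pos] == letter else no_branch
    return tree

# The tree printed in the text, with positions counted from zero.
BOOK_TREE = ((2, "Q"), ((1, "R"), False, True), ((1, "R"), True, False))

tree = build_tree(SAMPLES)
print(tree)
print(all(classify(tree, w) == label for w, label in SAMPLES))
print(all(classify(BOOK_TREE, w) == label for w, label in SAMPLES))
```

Both trees classify every training word correctly, yet they need not agree on a word outside the sample, which illustrates why some principle for preferring one consistent tree over another is needed.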
The resulting rule reflects only one generalization of the training sample (exchanging the tests, for example, would produce a different, but equally correct tree). With increasing number of attributes, there is a combinatorial explosion of possible partitionings consistent with the training sample. To produce the simplest hypothesis that accounts for the known facts, some kind of minimalist principle—a sort of Occam’s Razor—can be used to generate the least complicated decision tree that can correctly classify the training objects.

Because he published in a different academic subculture, more than a decade elapsed before Hunt’s ideas found their way into computer science. The gulf between disciplines was bridged in 1978 during a graduate course in Artificial Intelligence taught by Donald Michie. The class exercise was to write a program that could determine whether a KRKN endgame position is lost in two plies for the Knight’s side. Michie specified a performance that excluded brute force, for the idea was to synthesize a decision rule by induction from examples. But every proffered solution, when tested against the database, contained
at least one error resulting from some incorrect assumption about the problem.³

Ross Quinlan, the graduate student who had helped Hunt in his work a decade earlier, happened to be auditing the class. He saw how one might, in steps, generate a rule that grows to cover all three million KRKN positions. After choosing a set of attributes he thought would suffice to distinguish all cases, Quinlan started with a working set of examples, which he called a “window,” and used the Hunt algorithm to generate a decision tree that could classify these instances. Since a decision tree is logically equivalent to a conditional expression in a programming language, the algorithm produced, in effect, a program that could test additional samples. The program ignored correct classifications, which merely confirmed the current rule, but maintained a list of refutations. After several counterexamples had accumulated, the original tree was discarded, the exception cases were added to the window, and a new tree was built. A few iterations of this process produced a complete, running program which correctly classified the entire space.

The program seemed to capture the essence of intuitive learning: It would form a guess at the category to which each sample belonged and, if wrong, would modify its classification rule to account for the correction. Quinlan improved the Hunt algorithm by adding a minimal entropy criterion for selecting the attribute to be tested at each level, and went on to lost-in-three-ply. This determination is much more difficult: A chess master can recognize lost-in-two cases at a glance, but might spend half a minute on lost-in-three and just occasionally get it wrong. Quinlan’s program quickly constructed a compact decision tree that could identify all cases but, alas, this representation bore only the faintest resemblance to known chess concepts.
Chess masters who examined the tree asserted it made no sense at all; indeed, it was even less comprehensible than the KRKN database and completely useless for explaining lost-in-three.

Another drawback to using decision trees for representing inductive categories became apparent. In real life, not all attributes may be observable, yet intuition functions remarkably well with incomplete information. Analogy and intuition often seem to require fuzziness. Because every member of the training set is neatly precategorized, with no “sort of” memberships, the categories defined by decision trees have sharp boundaries. Quinlan’s approach to machine induction
3 The chief reason for failure lay in the human predilection to formulate simple rules that seem plausible, such as “Black is safe when neither King nor Knight is threatened,” but for which a counterexample can be found.
found further application in codifying endgames, and much wider use in the Expert Systems⁴ of the 1980s, but it has not worked well when knowledge is fragmentary; in particular, it is unhelpful in choosing a course of action during a chess middle-game.

The philosopher Ludwig Wittgenstein had already exposed a fundamental objection to the scheme employed by Hunt and Quinlan. He asked what must be known in order to classify an object as, say, a chair. Though people can list attributes that chairs have in common, they do not classify by enumerating sets of attributes; there may well be no set of characteristics that can be applied to all members of the class of chairs, and only to these members. Because concepts are established piecemeal during perception and recognition, they rarely depend on a simple set of necessary and sufficient conditions. Instead, they are embedded within a system of beliefs about the nature of the world and about the behavior of objects in it.

A previously unobserved object, notes Wittgenstein, probably would be called a chair because, as perceived, it bears a close “family resemblance” to other objects that serve as chairs. Chairs comprise a natural family, “a complicated network of similarities overlapping and criss-crossing,” and, he observed, existence of such a network would be sufficient to account for success in identifying the corresponding object or activity (Wittgenstein, 1953). Since the decision tree provides only an awkward representation of a complex of overlapping similarities, the inductive learning observed in organisms would seem to require a network structure.

When Quinlan attempted automatic classification of KRKN positions, the entire branch of machine learning by computational network had fallen into eclipse. In the 1940s, the Chicago neurologists Warren McCulloch and Walter Pitts had developed a hypothesis of how a network might process information.
They showed that an ensemble of interconnected threshold gates could carry out computations and, with sufficient storage, it would have all the power of a Universal Turing Machine. Their design resembled an organic nervous system—a network of similar circuit elements with dense connections between nearby cells—but its biological implausibility made it useless as a model of nature.

4 A misleading term for rule-based systems that display competence in some strictly delineated region, but which behave like a tyro when confronted with borderline situations that require true expertise.

One obvious difficulty with their model was its single-channel information flow: Discrete signal paths (wires) converge on a logic gate, which produces an output on another discrete path. Until very recently, all computer architectures, even distributed systems, were based on individual, nonredundant signal channels. Indeed, in the 1940s, brain activity was explained in terms of an automated telephone exchange. But telephone wires are vulnerable to single-point failures. A severed telephone cable must be exactly respliced, while an organism’s nerve bundles seem to require less exact “wiring” and often continue to function reliably despite damage.⁵

The Hungarian-born mathematician John von Neumann, famed for the idea of the stored-program computer, found a way to make McCulloch-Pitts nets insensitive to single-point failure. In place of the all-or-nothing activation of a single threshold gate, he introduced a statistical element into computation by using simultaneous activity on several data paths to signal each bit of information. Von Neumann was able to show that redundant nets, although built of unreliable components, could be configured to carry out arithmetic or logical operations with arbitrary accuracy. Shmuel Winograd and Jack Cowan soon found another way of building redundant networks of more complicated, neuron-like elements. Their design not only distributed each information bit over many processing elements, but used each element to (partially) represent many bits. These studies of reliable processing with unreliable components gave new plausibility to guesses about how an ensemble of neurons in an organic brain might provide full function even when individual neurons are fatigued, or damaged.

In 1949, psychologist Donald Hebb proposed a mechanism by which a network might learn. He supposed that simultaneous activity at both ends of a neural pathway would strengthen the connection, and argued that a network of adjustable connections would automatically adapt to patterns of signals, and would thereby learn.
Hebb’s rule—the more a connection is used, the better it works—is sometimes expressed the other way around: “use it or lose it.” This notion now guided “connectionists” in designing adaptive networks that learn by modifying “synaptic” couplings.

5 With a suitable coding scheme it is easy to imagine continued transmission of information over a damaged bundle of channels if there is some form of cross-talk between channels. In organic nervous systems, sensory information is encoded in pulse rate or as pulse bursts; both patterns are easily transferred to adjacent channels.

In the 1950s Frank Rosenblatt became the central figure of neural net research through his perceptron, a connectionist device based on Hebbian ideas. He used a McCulloch-Pitts network with adjustable weights. Like the operational amplifiers of the electronic engineer, each computational element forms a sum of weighted inputs, with excitatory and inhibitory inputs of opposite polarity; whenever the sum exceeds a threshold, the gate produces an output. Rosenblatt’s perceptron consisted of two layers of gates: One accepted patterns from a grid of photocells that represented a retina; the second signalled the presence of those patterns that the device had been trained to perceive.

Training was by supervised learning—a teacher would present a series of input patterns and observe the outputs. Correct responses, although welcome, were ignored. The trainer would make small adjustments to the weights of those gates that contributed to wrong answers. Rosenblatt was delighted to find that as the weights were tweaked, the number of misclassifications gradually decreased. His network did seem to learn. And once trained, the perceptron would respond almost instantly, an impossibility for a computer program that must be executed step by step.

Rosenblatt’s connectionist approach seemed an important step toward understanding, and imitating, the biocomputer. In 1959, a team of neurologists that included McCulloch and Pitts verified the inadequacy of the then-popular metaphor of the sequential computer as an “electronic brain.” Their study of organic information processing, “What the Frog’s Eye Tells the Frog’s Brain” (Lettvin, Maturana, McCulloch, & Pitts, 1959), revealed that most of the processing required for recognition of simple objects of interest to the frog, such as flies, is done in parallel within the optic conduit. The brain seems only to receive complete messages important to the frog’s future such as “fly detected” (reward possible) or “duck detected” (danger threatens). Somehow, the frantic activity of neurons in the retina is converted to an unambiguous signal.

Excited by the initial success of his artificial retina, Rosenblatt’s enthusiasm outran his scientific caution.
He exaggerated the capabilities of his perceptron, suggesting that it could carry out more complicated processes than the digital computer and thus represented a fundamental breakthrough in AI. His book did little to substantiate the extravagant claims. Its tone merely irritated other researchers, who were delighted to illuminate the flaws in his device.
The flaws were many. Because the perceptron learned through global reinforcement, it could hardly model an organic nervous system in which neurons do not adapt to global purpose. Because it lacked any internal, symbolic representation of what it perceived, the perceptron could not refer to a perception other than by recapitulating the act. Because it had to be trained, the perceptron could not learn on its own, for it was subject to the same credit-blame problem that Samuel had found insuperable.
A final criticism proved devastating. The threshold gate, a linear device, can only distinguish categories that are linearly separable. In 1969, Minsky and Papert proved
that in two-layer6 networks, with increasing number of inputs (and combinatorially expanding input patterns), the ratio of recognizable patterns to the potential total quickly becomes infinitesimal. Not only can a perceptron recognize “hardly any” patterns, but it is unable to detect certain basic properties of an image, such as connectivity of a region. Such is the intimidating power of mathematical proof that over the following decade, few researchers would even consider working with perceptron-like networks.
The renaissance in development of artificial networks arose from a need to find some way to deal with hard problems, that is, those areas of computing in which algorithms become swamped. Hard problems fall roughly into two classes: optimization tasks involving a fixed problem, such as the celebrated “travelling salesman” problem, in which parallel processing can be used to produce ever-better solutions; and learning tasks, such as inducing general rules from examples, in which classification plays a central role.
Many optimization problems can be solved by “relaxation.” If nodes (pieces of a solution) can be bound together by weighted relationships to form a network, a process of repeatedly updating the nodes and adjusting the weights to reflect the changed connection strengths can result in network activity settling into a stable state that represents the optimal solution. One relaxation technique is gradient descent. To minimize a cost function (a measure of solution goodness), combinations of weights of the best solution found thus far are altered slightly in the hope that a better solution will be found. With hard problems, however, this process is apt to converge on local minima, to get stuck in a depression. This tendency can be counteracted to some extent by randomly jiggling the weights to boost the best-yet solution over the next rise into the (perhaps deeper) valley beyond. 
(Samuel also tried shaking up the evaluation function of his checker player whenever it began to lose consistently.) Since a cost function is analogous to energy and the random jumps resemble thermal noise, this form of relaxation was dubbed a “Boltzmann machine” in honor of a founder of statistical thermodynamics; gradual reduction of jump extent to ensure settling became, inevitably, “annealing.”
A relaxation network can also handle fragmentary data. When presented with a partial pattern of inputs, a trained relaxation network can complete it: With some nodes clamped in fixed states, the weights determine best guesses for the remainder. If the network consists only of input and output nodes, and all of these are clamped,
6 Layers of processing units; some authors refer to layers of weights, and call the perceptron a single-layer network.
nothing interesting happens. But if in addition to these visible nodes, the network also contains an intermediate layer7 of hidden nodes, invisible from outside the network, clamping the interface between network and environment results in the formation of internal representations.8 As if by magic, the Boltzmann machine solves the credit assignment problem for the hidden units, and does so using only locally available information.
Although it represents a substantial advance in unsupervised machine learning, the Boltzmann machine suffers from a severe drawback: Since it depends on chance, the wait for serendipity can be a long one; indeed, there is no guarantee that it will ever learn.
David Rumelhart, Geoffrey Hinton, and Ronald Williams found a way to avoid the slow, erratic learning of the Boltzmann machine by using a procedure originally suggested by Rosenblatt. Their training scheme employed a two-stage computation on a perceptron-like network with a hidden layer. In a forward computation, an input pattern is presented to the network and the responses noted. Then a backward computation uses the difference between actual and desired responses to adjust the weights of the output layer; the changes to these weights are then propagated further back to modify the weights of the hidden nodes (whence the expression back-propagation). A new mathematical proof showed that under certain easily realized constraints, convergence of the back-propagation algorithm is assured.9
In his attempt to improve on a “reading machine” developed at Digital Equipment Corporation, Terrence Sejnowski devised a startling demonstration of the power of back-propagation (Sejnowski & Rosenberg, 1987). The DEC research project had used a set of rules developed by a team of linguists for English pronunciation, together with an extensive dictionary of exceptions, to select sounds that would match a stream of written text. 
Phoneme codes generated by DECtalk were transmitted to a voice synthesizer to produce audible speech. Despite the limitations of purely syntactic substitution of sounds for words (even simple sentences such as “Live animals don’t live in space” present difficulties), the DECtalk designers claimed an accuracy of “about 95%.”
7 It is curious that layers so often come in threes, for example, in Samuel’s signature tables, in Hitech’s evaluation function, and in single-hidden-layer back-propagation networks. One might suppose some diminishing-returns effect at work.
8 The Boltzmann learning procedure requires that hidden units know whether they are in training or freely running. The signal providing this information is, in effect, an attention device.
9 The Soviet mathematician Andrei Kolmogorov proved a theorem on mapping arbitrary real vectors in the 1950s; Robert Hecht-Nielsen’s reformulation proved convergence of back-propagation.
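The two-stage training scheme described above (a forward computation, then a backward computation that passes the output error back to the hidden nodes) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the text or from Rumelhart, Hinton, and Williams: the sigmoid units, the squared-error measure, the learning rate, the network size, and the XOR training set (chosen only because it is not linearly separable). The function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Smooth threshold function used by every unit (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_network(n_in, n_hidden, seed=0):
    """One hidden layer, one output unit; the last weight of each unit is its bias."""
    rng = random.Random(seed)
    return {
        "hidden": [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)],
        "output": [rng.uniform(-1, 1) for _ in range(n_hidden + 1)],
    }


def forward(net, x):
    """Forward computation: return (hidden activations, output response)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
         for ws in net["hidden"]]
    out_w = net["output"]
    o = sigmoid(sum(w * hj for w, hj in zip(out_w[:-1], h)) + out_w[-1])
    return h, o


def error(net, x, target):
    """Half the squared difference between desired and actual response."""
    _, o = forward(net, x)
    return 0.5 * (target - o) ** 2


def train_step(net, x, target, rate=0.5):
    """Backward computation for one pattern: adjust output, then hidden weights."""
    h, o = forward(net, x)
    # Error signal at the output unit.
    delta_o = (target - o) * o * (1.0 - o)
    # Propagate the signal back through the (not yet updated) output weights.
    deltas_h = [delta_o * net["output"][j] * h[j] * (1.0 - h[j])
                for j in range(len(h))]
    for j in range(len(h)):
        net["output"][j] += rate * delta_o * h[j]
    net["output"][-1] += rate * delta_o
    for j, ws in enumerate(net["hidden"]):
        for i in range(len(x)):
            ws[i] += rate * deltas_h[j] * x[i]
        ws[-1] += rate * deltas_h[j]


# Hypothetical training set: exclusive-or, which is not linearly separable.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def train(net, patterns, epochs=5000, rate=0.5):
    """Present every pattern once per epoch."""
    for _ in range(epochs):
        for x, t in patterns:
            train_step(net, x, t, rate)


if __name__ == "__main__":
    net = make_network(2, 3, seed=1)
    train(net, XOR)
    for x, t in XOR:
        print(x, t, round(forward(net, x)[1], 3))
```

Whether a run of this sketch actually settles on a correct XOR classifier depends on the random starting weights and the number of epochs; the sketch shows the mechanics of the weight adjustment, not a guarantee of convergence.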
Sejnowski tried to duplicate the DEC results with a back-propagation net. With a seven-character input window providing context for selecting a phoneme code, a network of some 300 neurons with 18,000 connections trained on a sample text with associated codes for phonemes, stress, and pauses. As in the DEC experiment, Sejnowski transmitted the codes produced by his NETtalk to a voice synthesizer so he could hear its improving performance.
Two simple refinements heighten the drama of Sejnowski’s presentation: He selected a speech synthesizer frequency that would produce a high-pitched voice and chose a training text phrased in the simple English of a child’s essay. The listener has an uncanny impression that a child is learning to speak. When started with random connection strengths, NETtalk produces the la-ba-da one associates with the discovery of voice. This soon turns into a babble which, though unintelligible, consists of word-like groupings of sounds. As training continues, more and more recognizable words appear. After half a day, NETtalk is able to produce the halting speech of a beginning reader, pronouncing accurately enough to be easily understood.
Even more remarkable, NETtalk could read new material (a continuation of text composed by the same child containing fresh vocabulary) nearly as well. The network had somehow absorbed the notoriously erratic rules of English pronunciation only from examples. Sejnowski’s collaborator Charles Rosenberg developed a program to analyze the weights of the trained network and calculate the extent of the “receptive fields” of the hidden layer, that is, the features of the input to which each unit responds. He found regions similar to those neurologists had discovered and mapped in brain tissue that correspond to invariant features abstracted by a perceptual system. NETtalk had spontaneously generated its own cognitive map: a model of vowels. 
Such experiments emphasized but one component of skill acquisition: the training of a more or less fixed network. The other component is perhaps even more important: structural change in the network itself. Although all biological nervous systems change during development, the human brain undergoes an especially dramatic metamorphosis. During embryogenesis, the evolutionarily older structures such as the “reptilian brain” are overlaid by successive accretions of tissue crowned by the primate neocortex; development is marked by migration of great herds of cells; and pre- and postnatal growth of brain tissue involves destruction of vast populations of cells, which wither to be replaced by new growth.
Human learning is not merely the programming of hard-wired neural circuitry. The most important developmental step, language acquisition, is closely linked with significant structural change. Behaviorists continue to dispute the extent to which this trait or that
property might depend on structure or practice, that is, be due to nature or to nurture. The distinction seems even blurrier when it is agreed that all abilities are in one sense innate, and in another, acquired.10
Any change of structure during training that increases the capacity to learn relevant tasks is likely to make the fine-tuning provided by education more effective. But improving software has always been easier than improving the hardware on which it runs. Where this has been possible, such as with the hardware add-ons of Belle, dramatic performance increases often result that could be attained in no other way.
In the 1960s, John H. Holland explored methods for systematically modifying computational devices to allow a better fit with their tasks. His studies culminated in the “genetic algorithm,” a machine embodiment of Darwin’s “descent with modification.”
While working at IBM in the early 1950s, Holland had attempted to model “nerve nets” on a computer, with an eye to predicting the behavior of a small network. As later with Samuel’s checker-playing program, IBM management tolerated the project; Holland was allowed graveyard-shift computer time. His all-night experiments provided fresh inspiration for applying mathematics to biology, but Holland felt the need for more mathematics, and he soon left IBM for graduate school at the University of Michigan. After receiving his PhD in 1959, he remained at the University of Michigan as assistant professor, and eventually rose to professor of engineering and computer science.
During his studies, Holland became interested in cellular automata, a form of parallel computer which consists of a network of processing units laid out in a grid. Each cell in the grid is constrained to one of a fixed set of states. A computational step consists of reassigning every cell’s state according to a local rule; that is, the new state is determined only by the cell’s previous state and the states of its immediate neighbors. 
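A computational step of this kind can be sketched in a few lines. The two-state rule used below is Conway’s “Life,” which is an assumption chosen for illustration and is not named in the text; treating cells beyond the edge of the grid as inactive, and passing the rule only a count of active neighbors rather than their individual states, are further simplifying assumptions. The names `step`, `life_rule`, and `BLINKER` are hypothetical.

```python
def step(grid, rule):
    """Reassign every cell's state at once from the previous grid.

    `rule(state, live_neighbors)` sees only the cell's own previous state
    and how many of its eight immediate neighbors were in state 1.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            live = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Skip the cell itself and anything off the grid.
                    if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                        live += grid[r + dr][c + dc]
            new[r][c] = rule(grid[r][c], live)
    return new


def life_rule(state, live_neighbors):
    """Conway's rule (an illustrative choice): survive on 2 or 3, be born on 3."""
    if state == 1:
        return 1 if live_neighbors in (2, 3) else 0
    return 1 if live_neighbors == 3 else 0


# A three-cell row that flips between horizontal and vertical on each step.
BLINKER = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    grid = BLINKER
    for _ in range(3):
        print("\n".join("".join(".#"[cell] for cell in row) for row in grid))
        print()
        grid = step(grid, life_rule)
```

Because the new grid is built from the old one before any cell is overwritten, all cells change state simultaneously, which is what makes the automaton a parallel rather than a sequential device.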
When displayed on a video screen, regions of initially random states quickly coalesce into quasistable groupings; some configurations oscillate; others glide across the screen, sometimes colliding to produce new forms or to annihilate each other. Some can even replicate. (Indeed, the first design for a self-reproducing machine, by von Neumann, ran on a cellular automaton.)
10 In this holistic view, an assertion that some trait, such as chess ability, depends “20% on nature and 80% on nurture” is meaningless. One is reminded that physicists once disputed the question “does light consist of waves, or particles?” The answer to “is it nature or nurture?” is also “yes.”
In the self-propagating patterns in this cellular microcosm, Holland could not help noticing the resemblance to microscopic creatures competing for living space. He realized that programs also grow, replicate, 
and evolve under human direction. Couldn’t one create conditions under which this process might take place automatically within a population of programs? During his late-night experiments with competing checker-playing programs, Samuel himself had selected the most successful for further modification. The variants he had to choose from, it should be noted, contained the same program steps; they differed only in the parameters of their evaluation functions. Although Samuel did not develop any algorithm for systematically recombining successful feature groups, his program did possess a key property: a concise representation of its particular behavior as an easily modified vector of parameter values.
Holland imagined a collection of programs, each representing a solution to a specified problem, and each expressed as a vector. Now if every vector of that form specified a program, it would be easy to generate new solutions as variants of (and potentially better than) existing solutions. Indeed, a similar process takes place in nature during combination of DNA vectors, which specify developmental strategies used to construct an organism. During sexual reproduction, genetic material from both parents is combined to create offspring which exhibit similarities to each parent, but which are identical to neither. Each organism represents a potential solution to the problem of propagating its kind. Nature provides a strict criterion of success: Offspring are evaluated by the environment; well-adapted individuals survive to produce surviving offspring with greater probability than those less well-adapted. The same Darwinian process could take place in a population of programs if some useful measure of fitness, such as Elo rating, were available.
Holland soon refined this idea for a state-space search into what is now called the Genetic Algorithm. Each possible solution is encoded as a fixed-length vector (for convenience, usually with binary elements). 
Initially, a population of trial solutions is selected at random from the state space, augmented perhaps by a few known solutions. If well distributed over the state space, these solutions provide clues to the overall shape of the space and to the location of better solutions. Each solution is evaluated by means of a fitness function. New solutions are generated by combining parts of the best solutions; these, too, are evaluated by the fitness function. Like a battery of Boltzmann machines operating in parallel, selection and generation of offspring continues until a satisfactory solution is found. When overpopulation threatens, practical constraints dictate that less-fit solutions be discarded.
All he needed was a mechanism to generate new solutions. Genetic recombination during sexual reproduction can account for a great deal
of variety in a species, but it does not offer a satisfactory explanation for novelty. A once-popular doctrine of genetics held that mutation (copy error during replication) was the chief source of adaptive change, yet the mutation rate seemed far too low to match the rate of observed change. Holland realized that ordinary chromosome crossover, in which a piece of one chromosome becomes attached to another, not only occurs routinely in the cell, but offers an explanation that might better account for novelty.
Holland thus selected crossover as the primary genetic operator for producing offspring. Two parent vectors are cut at a randomly chosen point, and the first part of one is spliced to the second part of the other. Since crossover alone does not always provide sufficient variety, Holland included mutation (Boltzmann-like jumps in individual vector elements) as a secondary genetic operator. With a third genetic operator, “refresh,” he was even able to improve on nature by reincarnating the best historical individual into the population.
Although recombination takes place randomly, the genetic algorithm is not a random search. Because parents have been selected in proportion to their fitness, the offspring are also likely to be fit. Holland introduced the notion of schemata (templates matching gene combinations which propagate in a breeding population) and showed that the proportional selection strategy is optimal. It best allocates effort towards reducing the uncertainty that a superior solution has been found. Over successive generations, a population becomes more uniform as successful schemata are reproduced, and all individuals represent good solutions.
Researchers are now contemplating putting several million artificial neurons onto a WSI (Wafer Scale Integrated) superchip to build a “Darwin Machine” (Holland Machine would be more accurate). Its neurons are not programmed individually. 
Instead, neuron groups participate in genetic programming directly in hardware according to rules that are themselves subject to adaptation. Since internal activities and connection strengths are as Martian as an endgame database, these new machines are not so much programmed as educated. This suggests a new variety of chess-playing machine, one based not on computation, but on intuition trained chiefly by example.
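The genetic algorithm outlined in this chapter (fixed-length binary vectors, selection in proportion to fitness, one-point crossover, occasional mutation, and a “refresh” of the best historical individual) can be sketched as follows. This is an illustrative reading of the description, not Holland’s own formulation: the population size, mutation rate, wholesale replacement of each generation, and the toy fitness function used in the demonstration (a count of 1-bits) are all assumptions, and the function names are hypothetical.

```python
import random


def crossover(a, b, point):
    """Splice the first part of one parent to the second part of the other."""
    return a[:point] + b[point:]


def mutate(vec, rate, rng):
    """Flip each binary element independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in vec]


def select(population, fitnesses, rng):
    """Pick a parent with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]


def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.01, seed=0):
    """Return the best vector found and the best fitness after each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        offspring = []
        while len(offspring) < pop_size - 1:
            mother = select(population, fitnesses, rng)
            father = select(population, fitnesses, rng)
            point = rng.randint(1, length - 1)  # random cut point
            child = crossover(mother, father, point)
            offspring.append(mutate(child, mutation_rate, rng))
        # "Refresh": the best historical individual rejoins the population.
        offspring.append(list(best))
        population = offspring  # the previous generation is discarded
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    # Toy fitness (an assumption): the number of 1-bits in the vector.
    best, history = evolve(sum)
    print(best, history[0], history[-1])
```

Because the refresh step copies the best individual into every new generation, the best fitness recorded in `history` can never decrease, although nothing in the sketch guarantees that the optimum is reached.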
chapter 11
Machine Intuition
Although artificial intelligence enthusiasts can mimic the activity of simple ensembles of interconnected nerves, they have been unable to imitate the kind of intelligence necessary to get along in the real world. Even the most elementary information processing done by organisms seems quite beyond the capacities of programmed automata. Real-world intelligence demands skills that include moving about, perceiving space and time, recognizing self and non-self, and making sense of fragmentary input. During everyday activity animals notice the unusual. They are especially quick to sense danger: in a hazardous environment, time may be too short for deliberate thought. An animal’s intuitive knowledge provides almost instantaneous recognition of potential dangers and possibilities of reward, and an ability to anticipate events in an ever-changing world.
Chess play also takes place in a dynamic environment. Like an animal in the wild, the expert player is aware of what is taking place on the board, quickly notices opportunities to seize an advantage, and stays attuned to potential dangers. World Champion Capablanca is said to have remarked: “I know at sight what a position contains. What could happen? What is going to happen? You figure it out, I know it!” Although few players will claim this level of awareness, their chess play is guided nonetheless by what they perceive is likely to happen. The unease engendered by a sense of, say, back-rank vulnerability, can easily save a game when time is too short for a good think, or when exact calculation is impossible.
The trick underlying chess play, and life, is recognizing what is important. Distinguishing relevant from irrelevant input is the chief information-processing activity of an organism, for one’s continuing existence depends on accurate judgement even when significance is disguised, as through protective coloration. 
But coaxing a machine to see importance has proved difficult, so difficult that one suspects that
we must be doing something wrong. Perhaps the problem lies in the presumption that training can take place in isolation, detached from the environment in which the machine must function.
The Massachusetts computer scientist Michael Kuperstein is trying to train an artificial neural net to imitate a baby’s learning. His patented robot INFANT, with video cameras to determine limb position and to locate the objects it manipulates, is set loose in a real-world-like environment to train itself. Like its organic role-model, the robot learns by exploring, by experiencing the consequences of its actions, and by repeating favorable results. It follows no preprogrammed coordinates, but conforms instead to its network’s own sense of relative position, which arises in a kinesthetic linkage between self and world.
Including the environment in the learning process is an increasingly important paradigm. Instead of being guided by an algorithmic set of rules, a learning entity is simply immersed in an environment, such as a chessboard with an active opponent, in which it must discover appropriate ways to behave. Learning and practice are combined; there is no longer any artificial distinction between training and performing.
In the still-young discipline of designing autonomous machinery, Kuperstein’s INFANT project appears overly ambitious. Better techniques are needed for balancing the conflicting requirements of plasticity and robustness, for assimilating new material while retaining the old. Furthermore, connectionist implementations that learn only through weight adjustment of already-established connections adjust slowly to a changing environment, and seem inappropriate for modeling concepts that undergo radical change as exceptions are perceived.
Many of us see the genetic algorithm as a much more promising mechanism for implementing machine intuition. For one thing, the chess player examining game continuations in search of a plan acts out the genetic algorithm. 
Over several generations (analysis cycles) the player examines a population of actions that seem appropriate in that board situation. Less fit, flawed continuations, if not discarded outright, are categorized as “remotely relevant” and rarely reconsidered; those that seem suitable are reexamined. Dangers and opportunities noticed while exploring earlier continuations spring to mind when relevant, so that initially distinct thoughts meet and fuse to produce new understanding.
This is exactly the behavior that an intuitive machine should exhibit. We would like to develop a contrivance that forms its own chess concepts through active exploration, and, like the human chess player, continually discovers new structure in a configuration of pieces. Such a
device does not merely match established templates with a position. Through experiment, it finds new relationships among the elements it examines. The specific situation acquires new meaning, which may have wider relevance. The known acquires new dimension.
But where should one start? Some learning theories suppose a tabula rasa with equally likely rules of behavior. Without seed concepts to build on, concept formation in machines (and, perhaps, people) is likely to be slow. It is easy to imagine the difficulty of learning chess merely by watching games being played, without being told the rules. Still, at least two world-class chess players, World Champion José Raúl Capablanca and senior grandmaster Sammy Reshevsky, did learn chess at a very early age by watching relatives play, and both won their first games against their models. But neither prodigy was a blank slate; both drew on their extensive real-world knowledge to help puzzle out what was taking place on the chessboard.
In an ongoing study of ways a machine might emulate the intuition of grandmasters (who invariably consider only good moves), Robert Levinson at U. C. Santa Cruz built a chess system, “Morph,” that looks only one ply ahead. He used directed graphs to represent attack and defense relationships among pieces and squares and worked out ways to induce generalizations of these relationships as a result of experiencing chess positions. His hybrid system adapts not only through gradual weight adjustment, but also by using an evolutionary algorithm to change its own logical structure, and in the process internalize new concepts.
It is tempting to start, as Samuel did, with a set of atomic concepts corresponding to positional features that (so the beginner is told) are of fundamental importance. But a fundamental concept (for example, center control) is rarely elementary. 
Any approximation to this notion defined explicitly enough to allow testing as a board feature (say, occupation of certain squares by pawns) is likely to be both too precise and too imprecise. It is too precise because alternatives, such as fianchettoed bishops, are excluded; it is too imprecise because occupation still may not guarantee control as, for example, when an occupier is backward on a half-open file.
Let’s try to nail down the term “concept.” A concept is no ossified, changeless formation of data, but an evolving construct that participates in its own formation. Concepts form as things are grouped, and meaning is seen in the juxtaposition. A child knows that two objects, however different, are still similar because they are both “things,” and understands, too, the companion observation that two objects, however identical, cannot really be the same because one can always distinguish, say, the one on the left. To a child, whose creative activity lies in
discovering samenesses and differences, these notions are not incongruous, but are merely instances of the meaning one perceives in groupings of objects. Until language has calibrated the child’s meanings, few of these perceptions agree with an adult’s estimation of similarity. Guided by chance impressions, the child heaps together quite diverse objects and, simply by virtue of its grouping, sees meaning in the perceived unit.1 As objects are added or removed, the meaning changes.
One would also expect an intuitive machine to acquire its concepts through play: by rearranging components just for the sake of experiment, and then reexamining arrangements that seem significant. Like the chess player, children explore variations, reviewing whatever has made an impression. As they examine alternatives, the “what ifs” acquire new significance. Concept formation is particularly noticeable when some problem arises that cannot be solved, when what is known fails to account adequately for the current situation: Wrestling with the problem produces new understanding, and each partial solution a change in conception.
John Holland suggested that this form of knowledge acquisition might be supported by an adaptive, self-organizing classifier system based on message passing. A classifier system accepts message bitstrings from its environment, generates further messages upon finding matches with rule templates, and, after successive cycles of internal message selection, eventually emits output messages that effect changes in the environment. As part of this process, it forms categories and internal representations based on the regularities found in the input, and, by experiment, it discovers appropriate actions. In particular, a classifier system should, even when sparsely reinforced, be capable of learning stage-setting actions and coordinated action sequences. 
This is accomplished by matching active messages, both internal and external, with rule templates: The best matches activate further messages. Selection is based on specificity and “strength,” a measure of usefulness associated with each rule, which acts as specie in the classifier’s internal economy. Strength is transferred from winning rules to those that led to their invocation, and additional credit is distributed to recently active rules upon reinforcement from the environment. The rules are like players in a Monopoly game: A few prosper, some manage just well enough to stay in the game over many seasons of financial fortune, and most go bankrupt. Genetic operators
1 Psychologists call this “syncretism,” a single-word oxymoron borrowed from theology, which sounds more scholarly than, say, “coherent incoherence.”
alter surviving rules to create new, often more specific rules which join the game to compete for invocation, and potential riches.
Most important, a classifier system offers a mechanism for devising and testing competing hypotheses without disturbing capabilities already present. A perennial (and still largely unsolved) problem in concept learning is that of striking an effective balance between rigidity and plasticity. The classifier system avoids the chief difficulty of the Hebbian “use it or lose it” principle, for rules that perform well under specific circumstances do not wither from disuse, but, like the chess player’s skill at handling certain rare types of positions, remain instantly available even after long dormancy. Concepts should vanish only when superseded by a more effective classification.
A century after Torres y Quevedo’s pioneering effort, the KRK endgame is still an appealing microworld for experiment. Now, however, instead of implementing an algorithm that guarantees mate, some of us are seeking to perfect a mechanism that can apprehend for itself the constraints imposed by the presence of the pieces, and can anticipate how these constrictions will change as pieces are moved. The challenge is to induce it to absorb the principles of coordinating action of King and Rook in a dance of zugzwang that confines the lone King to ever-smaller regions, and culminates in mate.
This is achieved with the connivance of an opponent that puts up an active defense. It takes advantage of every lapse, snapping up the Rook if it is left hanging and evading the mating net at every opportunity. But this opponent is more than just an endgame database; it must also serve as trainer. Effective pedagogy requires that the tutor understand not only what the trainee knows, but also what can (and cannot yet) be assimilated. During early training, the trainer supplies feints to show the trainee which forces apply in the current situation. 
As the trainee begins to react appropriately, further lessons illustrate more complicated notions.
Because there are no seed concepts to serve as a basis for building advanced notions, even the simplest understanding of board control will be acquired with difficulty. Already having concepts about barriers that cannot be crossed, a child can quickly perceive that the Rook’s (unblocked) control of a file acts as a moat that constrains the action of the opposing King. The notion of barrier that the child so easily transfers to the chess environment must, somehow, be communicated to the machine. Although complicated concepts can be formed from lower-level concepts, one cannot split concepts ad infinitum into even more rudimentary building-blocks, for when there is no longer sufficient context to support meaning, further decomposition is senseless. The notion of uncrossability of a barrier can no more be broken
down into components than Wittgenstein’s chair, yet, like the chair, it can hardly be regarded as an atomic concept. Indeed, no such thing seems to exist: The moat-notion simply arises through interacting with reality to become part of the trainee’s belief of how the world behaves.
With this approach to artificial intelligence, new directions of inquiry keep springing up. There is, for example, the matter of temporal order. The trainee learns to perceive a sequence of events as part of a familiar unfolding scenario in which certain actions take place. Now if sequences of events and actions are anticipated, expectations arise. With expectation comes the capacity for surprise. Surprise is not just the result of an uncommon, yet plausible, perception, but rather the jolt that occurs with the realization that something not deemed possible has occurred. It is surprise that leads to recognition of mistake, to reexamination of the sequence of events that led to it, and to revision of beliefs about how to act.
Another intriguing area of experimentation is that of training imagination. Although necessary at first, the examination of alternatives need not always take place by moving pieces and retracting moves. With even rudimentary internal representations, one would expect some ability to pose internal “what-ifs.” As simple a mechanism as rehearsing an action before carrying it out can allow a consequence to be anticipated and, perhaps, the action to be suppressed. And the capacity to imagine, like the capacity to perceive, can be trained.
With practice comes familiarity. What must initially be overtly acted out can later take place automatically. The awkward hesitancy of experiment becomes transformed into a more efficient and skilled, but less mutable form, as if interpreted code had been compiled. 
An activity becomes orchestrated, as when the conductor so prepares an orchestra, that once the performance is underway, the musicians are attuned to each other and direction is no longer necessary.
In my own experiments with classifier system learning, initial rapid progress is often followed by a seeming forgetfulness. Tantalizingly purposive activity arises spontaneously; then there are sudden shifts back to undirected aimlessness. But still the potential for previously exhibited goal-directed activity lurks in the background and, having occurred once, it is likely to recur. This unevenness is annoying, but I am encouraged by its resemblance to the inconstancy of a child’s learning, and like to believe that some similar process is at work.
Little is known about what kinds of behavior one can expect, and what kinds one ought to expect. At some level of complication, a classifier system becomes a complex dynamical system in which some activity remains relatively stable, some regions are marked by wild fluctuations, and chaotic behavior occurs at boundaries between different patterns of activity where minuscule differences in state can be greatly amplified. In a complex dynamical system, the only stable behavior is catatonic.
Consider, too, the importance of conversation. Chess proficiency comes not just from practice, but from absorbing the lore of the game from other players, present and past; players improve their understanding by studying texts that explain tested principles of tactics and strategy. But solitary study alone rarely leads to mastery. Chess is, after all, a social activity: At club meetings and tournaments, people pass on their abilities by communicating their experiences and by offering advice. Serious students of the game test their ideas by reviewing positions and games in individual discussion with knowledgeable players.
To participate in the chess player’s culture, an intuitive chess machine must also communicate. How might a machine express the view, say, that in a given position the increasing pressure on f4 is a serious matter? Or offer as reason the danger of line-opening tactical exchanges near the King? Though people describe these notions in natural language, augmented by a specialized vocabulary, communication of chess ideas does not demand such expressive richness. At every tournament one notices players engaged in lively postgame analysis which, in deference to conventions of silence, is conducted chiefly by gesture. A player indicates an alternative course of action, moves a few pieces to show how the expected variation would run, and expresses positional understanding by pointing to significant constraints. During this exchange, received information meshes with beliefs and expectations as the participants discern each other’s meanings. New knowledge modifies existing concepts to refine personal understanding, which will influence future play. An intuitive machine able to express its own chess ideas according to some common symbolic convention could also participate in tutorial dialogue.
By supplying suitable examples to illustrate a noteworthy idea, the machine calibrates its concepts with those of its conversational partners and develops the ability to offer reasonable explanations, perhaps the most important characteristic of intelligence.
Imagine a community of chess-playing machines, a subculture united by common interest and common mode of expression, whose chess lore is transmitted culturally through conversation among its members (which might also include a few humans). In the course of recalling and relating chess experiences, useful concepts acquire symbolic representations (words, gestures, names) that evoke expectations of chessboard events and playing methods. The language used to share concepts evolves: When information is expressed in several ways, the simpler form is more easily applied, and more often employed.
Perhaps the most interesting problem is that of passion. Since exploration implies involvement, not detachment, the intuitive chess machine must be self-motivated, and exhibit a zest for play. Much of the enjoyment of chess play arises as ideas fall into place: Something clicks; understanding comes in a rush; there is a little convulsion of pleasure. For this to happen, ideas must be tried out, if only for the adventure of discovering what happens, and one must be ready to do so even when failure is likely. Freud remarked on the “instinct for mastery” that provokes reliving, and thereby making active, a passive situation. Such motivation requires dedication to the point of passion.
But speculation need not stop here, for if passion can be instilled in the machine, can emotion be far behind? Erik Mueller (1990) hypothesized that it is the emotion accompanying an insight that directs attention. An emotion-driven intuitive machine might well experience regret at an action taken or not taken, elation upon finding a good continuation, or relief that a previously considered action was not carried out. It may be capable of amusement, embarrassment, hope, or worry. And how about consciousness? A conscious machine might even delight in outplaying an able opponent!
Back to reality. No one expects to imbue a chess machine with these qualities. Nor is anyone eager to invest years in training any single machine if it will still lose to brute-force play. The current interest in intuitive machinery is due chiefly to the excitement of the chase, and the expectation that marvelous discoveries lie just around the corner. An ever-changing population of ideas about intuition is undergoing a lively process of crossover and selection. The intriguing unsolved problems, like the game of chess itself, can arouse that passion for mastery, which sooner or later will drive the inspiration of a Turing or Shannon, or the persistence of a Samuel. Organizing this work has been a great pleasure; may it encourage others to join the exciting quest for understanding intuitive processes and their inevitable role in any artificial intelligence.
appendix A:
Chess Notation
A brief guide to decipherment of Abbreviated Algebraic
Squares of the chessboard are referred to by letter-digit pairs which indicate file and rank. The White side of the board is rank one.
Piece names are abbreviated. Examples: K = King, but N = Knight
A square name means a pawn advance thither (if a promotion, the new piece is named). Examples: e4, d8=Q
A letter pair gives from-to of a pawn capture. Example: de
A piece name tells which, followed by whither; an “x” in the middle is spoken as “takes”. Examples: Ka5, Rxe8
In confusing cases, the origin is given, e.g., if the a-file Rook goes to e1. Example: Rae1
Kingside castling (K to g-file). Example: 0-0
Queenside castling (K to c-file). Example: 0-0-0
“+” is read “check”; “++” is mate. Example: Bb4+
[Diagram: chessboard with ranks numbered 1-8 and files lettered a-h]
Annotators append special symbols as commentary:
“!” means “good move”
“!!” means “outstanding move”
“?” means “poor move”
“??” means “awful move”
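As a rough illustration of how the notation in this appendix can be deciphered mechanically, the following Python sketch turns one move into an English phrase. It is only a sketch under stated assumptions: the function name `decipher` and the wording of its output are invented here, the piece letters Q, R and B (which the guide does not spell out) are assumed to follow the usual English convention, and only a file letter is accepted as the origin hint in confusing cases.

```python
import re

# Only K and N are spelled out in the guide; Q, R and B are assumed here.
PIECES = {"K": "King", "Q": "Queen", "R": "Rook", "B": "Bishop", "N": "Knight"}
ANNOTATIONS = {"!!": "outstanding move", "??": "awful move",
               "!": "good move", "?": "poor move"}

PIECE_MOVE = re.compile(r"([KQRBN])([a-h])?(x)?([a-h][1-8])")
PAWN_ADVANCE = re.compile(r"([a-h][1-8])(?:=([QRBN]))?")
PAWN_CAPTURE = re.compile(r"([a-h])([a-h])")


def decipher(move):
    """Turn one abbreviated-algebraic move into an English phrase (hypothetical helper)."""
    tail = ""
    # Two-character marks are listed first, so "!!" is not read as "!".
    for mark, meaning in ANNOTATIONS.items():
        if move.endswith(mark):
            move, tail = move[:-len(mark)], f" ({meaning})"
            break
    if move.endswith("++"):
        move, tail = move[:-2], " mate" + tail
    elif move.endswith("+"):
        move, tail = move[:-1], " check" + tail

    if move == "0-0":
        return "castles kingside" + tail
    if move == "0-0-0":
        return "castles queenside" + tail
    if m := PIECE_MOVE.fullmatch(move):
        piece, origin, takes, target = m.groups()
        who = PIECES[piece] + (f" from the {origin}-file" if origin else "")
        return f"{who} {'takes on' if takes else 'to'} {target}{tail}"
    if m := PAWN_ADVANCE.fullmatch(move):
        target, promoted = m.groups()
        extra = f", promoting to {PIECES[promoted]}" if promoted else ""
        return f"pawn to {target}{extra}{tail}"
    if m := PAWN_CAPTURE.fullmatch(move):
        return f"pawn on the {m[1]}-file takes on the {m[2]}-file{tail}"
    raise ValueError(f"unrecognized move: {move!r}")
```

Calling `decipher("Rae1")`, for example, covers the confusing case in which the origin file has to be named, while `decipher("Bb4+")` shows the check sign being read aloud.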
appendix B:
Torres y Quevedo's Mating Algorithm
Torres’ scheme for effecting mate in the KRK endgame assumes an initial position with the automaton’s White King on a8, Rook on b8, and the opponent’s King on any unchecked square in the first six ranks. His algorithm for moving can be described in programming notation:
if both BK and R are on left side {files a,b,c}
then move R to file h {keep R out of reach of K}
elseif both BK and R are on right side {files f,g,h}
then move rook to file a {keep R away from K}
elseif rank of R exceeds rank of BK by more than one
then move R down one rank {limit scope of BK}
elseif rank of WK exceeds rank of BK by more than two
then move WK down one {WK approaches to support R}
elseif horizontal distance between kings is odd
then {make tempo move with R}
    if R is on a file then move R to b file
    elseif R is on b file then move R to a file
    elseif R is on g file then move R to h file
    else {R is on h file} move R to g file
    endif
elseif horizontal distance between kings is not zero
then move WK horizontally toward BK {keep opposition}
else give check by moving rook down {and if on first rank, it’s mate}
endif
If the opponent’s King is placed on a6, with best delaying tactics mate can be staved off for 61 moves.
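The move rule described in this appendix can be tried out as a small Python function. What follows is a sketch under assumptions rather than a reconstruction of the automaton: the name `torres_move` and the use of strings such as "a8" for squares are choices made here, the left and right zones are taken literally from the braces (files a-c and f-h), and neither the legality of the opponent's replies nor the detection of mate is handled.

```python
FILES = "abcdefgh"
LEFT, RIGHT = set("abc"), set("fgh")
# Tempo moves shuttle the Rook between neighbouring edge files.
TEMPO = {"a": "b", "b": "a", "g": "h", "h": "g"}


def torres_move(wk, r, bk):
    """Return (new_wk, new_r) for White; squares are strings such as 'a8'.

    Hypothetical helper: the branches are tested in the order given in the listing.
    """
    wk_file, wk_rank = wk[0], int(wk[1])
    r_file, r_rank = r[0], int(r[1])
    bk_file, bk_rank = bk[0], int(bk[1])
    gap = abs(FILES.index(wk_file) - FILES.index(bk_file))

    if bk_file in LEFT and r_file in LEFT:
        return wk, f"h{r_rank}"                    # keep R out of reach of K
    if bk_file in RIGHT and r_file in RIGHT:
        return wk, f"a{r_rank}"                    # keep R away from K
    if r_rank - bk_rank > 1:
        return wk, f"{r_file}{r_rank - 1}"         # limit scope of BK
    if wk_rank - bk_rank > 2:
        return f"{wk_file}{wk_rank - 1}", r        # WK approaches to support R
    if gap % 2 == 1:
        # Tempo move; any other file falls into the final "else" of the listing.
        return wk, f"{TEMPO.get(r_file, 'g')}{r_rank}"
    if gap != 0:
        step = 1 if FILES.index(bk_file) > FILES.index(wk_file) else -1
        new_file = FILES[FILES.index(wk_file) + step]
        return f"{new_file}{wk_rank}", r           # keep opposition
    return wk, f"{r_file}{r_rank - 1}"             # give check (mate on rank one)
```

From the stated starting position with the opponent's King on a6, `torres_move("a8", "b8", "a6")` sends the Rook across to h8, which is the first branch of the rule.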
appendix C:
Recursive Programming and the Minimax Algorithm
Recursive programming employs a habit of thinking that, despite years of widespread use, remains alien to most programmers. To write instructions for a recursive process, one pretends that the job is already done, that is, that a program which performs the required function is available for use as a subordinate process. Provided that any subordinate invocation of this process involves a less general task, that is, that a genuine reduction of the amount of work to be done takes place, termination is guaranteed. As if by magic, a complete working program, at first only assumed to exist, turns out to actually exist!
Some inconvenience is associated with recursion. For example, when a recursive process is included in a computer program, care must be taken to ensure that no critical information is destroyed by a subordinate activation. A separate data area must be assigned to each activation to keep track of, for example, which chess position is under evaluation. This saving of variables is usually done with the help of data structures called stacks.1
Recursion also offers some advantages. By means of self-invocation, a rather compact program is able to control an enormously complex process. Turing (1937) used the representational power of recursion in his “Computable Numbers” paper to explore the limits of computability. Some processes are much easier to describe and implement recursively than iteratively. One of these is minimax:
1 Stacks were a central feature of Turing’s Automatic Computing Engine. He referred to the stacking and unstacking of parameters as “burying” and “disinterring” (nowadays “pushing” and “popping”).
recursive function MINIMAX(POSITION,DEPTH);
    {MINIMAX is the name of the process, which requires two inputs: a chess POSITION with white to move, and a number DEPTH indicating the ply level at which evaluation is to take place. The result of this process is the minimax value of the position}
    if DEPTH = 0
    then MINIMAX := EVAL(POSITION)
        {the function EVAL evaluates at the bottom level}
    else begin
        MINIMAX := FINDMOVES(POSITION,MOVES,NMOVES)
            {the move generator finds all legal moves from POSITION; the value produced and stored in MINIMAX is that of a loss, say -100, or zero if stalemate (NMOVES = 0 and no check)}
        if NMOVES > 0 {loop over legal moves}
        then for i := 1 to NMOVES do
            NEWPOSITION := SWAPSIDES(MAKEMOVE(POSITION,MOVE(i)));
                {produces a new position, by making move i in POSITION, and then reversing Black and White sides}
            VALUE := -MINIMAX(NEWPOSITION,DEPTH-1);
                {here comes the magic: assuming that the MINIMAX function is available for use (not quite true at the time this line is written), it is called upon to produce a minimax value for NEWPOSITION (with depth decreased by 1); since this value is with respect to the Black side, its sign is reversed}
            if VALUE > MINIMAX then MINIMAX := VALUE
                {MINIMAX contains the largest value found up to now; in this example, no record is kept of the associated move}
        end do
    end
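The same recursive scheme can be run directly in Python. The sketch below is an illustration under assumptions, not a chess program: to stay self-contained it replaces chess with a toy take-away game (remove one, two or three stones; a player with no move has lost), so `find_moves` and `evaluate` are stand-ins invented here for FINDMOVES and EVAL, and no SWAPSIDES step is needed because the toy position looks the same from either side. The loss value of -100 follows the figure suggested in the listing.

```python
LOSS = -100  # value of having no legal move, as in the "say -100" of the listing


def find_moves(stones):
    """Legal moves in the toy game: take 1, 2 or 3 stones (stand-in for FINDMOVES)."""
    return [n for n in (1, 2, 3) if n <= stones]


def evaluate(stones):
    """Static evaluation at the depth limit; this toy has no positional knowledge."""
    return 0


def minimax(stones, depth):
    """Value of the position for the side to move, searched to the given depth."""
    if depth == 0:
        return evaluate(stones)
    best = LOSS  # remains a loss if there is no legal move at all
    for move in find_moves(stones):
        # The value returned is from the opponent's point of view, so its sign is reversed.
        value = -minimax(stones - move, depth - 1)
        if value > best:
            best = value  # largest value found so far; the move itself is not recorded
    return best
```

With enough depth, `minimax(4, 10)` comes out as a loss for the side to move and `minimax(5, 10)` as a win, which matches the known theory of this toy game (multiples of four are lost positions). Each activation keeps its own `stones`, `depth` and `best`, which is the saving of variables that the appendix attributes to stacks.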
References
Alexander, C. H. O’D. & Birdsall, D. (1973). A book of chess. London: Harper & Row.
Anantharaman, T., Campbell, M., & Hsu, F. (1988, December). Singular extensions: Adding selectivity to brute-force searching. International Computer Chess Association Journal, 11(4), 135-143.
Andric, D. (1970). Blitz Blitz! Chess Life and Review, 25, 308.
Babbage, C. (1864). Passages from the life of a philosopher. London: Longman, Green, Longman, Roberts, and Green.
Bell, A. G. (1978). The Machine Plays Chess? New York: Pergamon Press.
Berliner, H. J. (1978, August). Computer Chess. Nature, 274, 745-748.
Berliner, H. J. (1984). Search vs. knowledge: An analysis from the domain of games. In A. Elithorn & R. Banerji (Eds.), Artificial and Human Intelligence (pp. 105-117). Amsterdam: North-Holland.
Berliner, H. & Campbell, M. (1984). Using chunking to play chess pawn endgames. Artificial Intelligence, 23(1), 97-120.
Berliner, H. J. (1986). Computer chess at Carnegie-Mellon University. In D. F. Beal (Ed.), Advances in Computer Chess 4 (pp. 166-180). New York: Pergamon Press.
Berliner, H. & Ebeling, C. (1986). The SUPREM architecture: A new intelligent paradigm. Artificial Intelligence, 28, 3-8.
Berliner, H., Kopec, D., & Northam, E. (1990, November). A taxonomy of concepts for evaluating chess strength. Proceedings Supercomputing 90, IEEE, 336-343.
Binet, A. (1966). Mnemonic virtuosity: A study of chess players (M. L. Simmel & S. B. Barron, Trans.). Genetic Psychology Monographs, 74, 127-162. (Original work published in 1893)
Boole, G. (1960). An investigation of the laws of thought. New York: Dover. (Original work published in 1854)
Botvinnik, M. M. (1970). Computers, chess and long-range planning. New York: Springer-Verlag.
Botvinnik, M. M. (1984). Computers in chess. New York: Springer-Verlag.
Bratko, I., Tancig, P., & Tancig, S. (1976). Some new aspects of board reconstruction experiments. Third European Meeting on Cybernetics and Systems Research, Vienna, Austria.
Calvocoressi, P. (1980). Top secret Ultra. London: Cassell.
Chase, W. G. & Simon, H. A. (1973). The mind’s eye in chess. In W. G. Chase (Ed.), Visual information processing (pp. 215-281). New York: Academic Press.
Cleveland, A. A. (1907). The psychology of chess and of learning to play it. American Journal of Psychology, 18, 269-308.
Cohen, P. & Feigenbaum, E. (Eds.). (1982). The handbook of artificial intelligence. Los Altos, CA: William Kaufmann.
Condon, J. H. & Thompson, K. (1982). Belle chess hardware. In M. R. B. Clarke (Ed.), Advances in Computer Chess 3 (pp. 45-54). New York: Pergamon Press.
Cowan, J. D. & Sharp, D. H. (1988, Winter). Neural nets and artificial intelligence. Daedalus, pp. 85-121.
Crypton. (1986, June). The machine who would be king. Science Digest, pp. 74-78.
de Groot, A. D. (1965). Thought and choice in chess. The Hague: Mouton.
de Latil, P. (1956). Thinking by machine. (Y. M. Golla, Trans.). London: Sidgwick and Jackson. (Original work published in 1953)
Dreyfus, H. L. (1979). What computers can't do (2nd ed.). New York: Harper & Row.
Dunne, A. (1985, June/July). The check is in the mail. Chess Life, June 1985, p. 369 ff. and July 1985, p. 48 ff.
Ebeling, C. (1987). All the right moves. Cambridge, MA: MIT Press.
Elo, A. (1978). The rating of chessplayers, past and present. New York: Arco Publishing.
Fine, R. (1941). Basic chess endings. New York: David McKay.
Flesch, J. (1982). Schachtaktik für jedermann (Chess tactics for everyman). Stuttgart: Franckh.
Freud, S. (1961). Beyond the pleasure principle. (J. Strachey, Trans.). New York: Norton. (Original work published in 1920)
Frey, P. W. (Ed.). (1983). Chess skill in man and machine. (2nd ed.). New York: Springer-Verlag.
Goldberg, D. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.
Golombek, H. & Hartston, W. (1976). The best games of C. H. O’D. Alexander. Oxford: Oxford University Press.
Good, I. J. (1968). A five-year plan for automatic chess. In E. Dale & D. Michie (Eds.), Machine Intelligence 2 (pp. 89-118). New York: American Elsevier.
Good, I. J. (1980). Pioneering work on computers at Bletchley. In N. Metropolis, J. Howlett, & G.-C. Rota (Eds.), A history of computing in the twentieth century (pp. 31-45). New York: Academic Press.
Hartston, W. R. & Wason, P. C. (1983). The psychology of chess. New York: Facts on File Publications.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley & Sons.
Hilts, P. J. (1982, October). Mind machines. Omni, p. 105 ff.
Hodges, A. (1983). Alan Turing: The enigma. New York: Simon & Schuster.
Hofstadter, D. R. (1979). Gödel, Escher, Bach: An eternal golden braid. New York: Basic Books.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
Holland, J. H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1989). Induction: Processes of inference, learning and discovery. Cambridge, MA: MIT Press.
Holland, J. H. (1989). Using classifier systems to study adaptive nonlinear networks. In D. Stein (Ed.), Lectures in the science of complexity. Reading, MA: Addison-Wesley.
Hsu, F., Anantharaman, T., Campbell, M., & Nowatzyk, A. (1990, October). A grandmaster chess machine. Scientific American, 263(4), 44-50.
Hunt, E. B., Marin, J., & Stone, P. (1966). Experiments in induction. New York: Academic Press.
Hyatt, R. M., Gower, A. E., & Nelson, H. L. (1986). Cray Blitz. In D. F. Beal (Ed.), Advances in Computer Chess 4 (pp. 8-18). New York: Pergamon Press.
Hyman, A. (1982). Charles Babbage: Pioneer of the computer. Oxford: Oxford University Press.
Keene, R., Levy, D., & van den Herik, J. (1988, March). Botvinnik interviewed. International Computer Chess Association Journal, 11(1), 40-43.
Kister, J., Stein, P., Ulam, S., Walden, W., & Wells, M. (1957). Experiments in chess. Journal of the Association for Computing Machinery, 4, 174-177.
Kling, J. & Horwitz, B. (1851). Chess studies, or endings of games. London: Skeet.
Kmoch, H. (1959). Pawn power in chess. New York: David McKay. (Reprinted by American Chess Promotions. Macon, GA. 1990.)
Kopec, D. & Bratko, I. (1982). The Bratko-Kopec experiment: A comparison of human and computer performance. In M. R. B. Clarke (Ed.), Advances in Computer Chess 3 (pp. 57-72). New York: Pergamon Press.
Kopec, D. (1989, September). Interview with Feng-hsiung Hsu, DEEP THOUGHT team leader. Chess Life, pp. 22-24.
Kozdrowicki, E. W. & Cooper, D. W. (1974, August). When will a computer be world chess champion? Computer Decisions, pp. 28-32.
Leiber, F. (1962, May). The 64-square madhouse. Worlds of IF, 12(2), 64-100.
Lettvin, J. Y., Maturana, H. R., McCulloch, W. S., & Pitts, W. H. (1959). What the frog’s eye tells the frog’s brain. Proceedings of the Institute of Radio Engineers, 47, 1940-1951.
Levinson, R. (1991). Experience-based creativity. (Report UCSC-CRL-91-37). Santa Cruz, CA: University of California.
Levy, D. (1978, November). Man Beats Machine. Chess Life and Review, pp. 600-603.
Levy, D. N. (1986). Chess master versus computer. In D. F. Beal (Ed.), Advances in Computer Chess 4 (pp. 181-194). New York: Pergamon Press.
Levy, D. (1990, March). The end of an era. International Computer Chess Association Journal, 13(1), 34-36.
Levy, D. & Newborn, M. (1990). How computers play chess. New York: W. H. Freeman.
Marsland, T. A. & Schaeffer, J. (Eds.). (1990). Chess, computers, and cognition. New York: Springer-Verlag.
Martin, G. R. R. (1972, August). The computer was a fish. Analog, LXXXIX(6), 61-74.
McCorduck, P. (1979). Machines who think. San Francisco: W. H. Freeman.
McCulloch, W. S. & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Mednis, E. (1989). Rook and Bishop versus Rook: The controversial endgame. International Computer Chess Association Journal, 12(1), 30-36.
Menabrea, L. F. (1982). Sketch of the analytical engine invented by Charles Babbage, esq (Translated and annotated by A. Lovelace). In Babbage’s calculating engines (pp. 6-51). Los Angeles: Tomash.
Michalski, R. (1982). A theory and methodology of inductive learning. In R. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning: An artificial intelligence approach. Palo Alto, CA: Tioga.
Michie, D. (1977). King and Rook against King: Historical background and a problem on the infinite board. In M. R. B. Clarke (Ed.), Advances in Computer Chess 1 (pp. 30-59). Edinburgh: Edinburgh University Press.
Michie, D. (1980). A prototype knowledge refinery. In M. R. B. Clarke (Ed.), Advances in Computer Chess 2. Edinburgh: Edinburgh University Press.
Michie, D. (1982). Machine intelligence and related topics. New York: Gordon and Breach.
Michie, D. (1983, March 9). Transcription of a lecture at UCLA.
Michie, D. & Johnston, R. (1985). The knowledge machine: Artificial intelligence and the future of man. New York: William Morrow.
Michie, D. (1986). On machine intelligence. (2nd ed.). New York: Halsted Press.
Minsky, M. & Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
Mueller, E. T. (1990). Daydreaming in humans and machines. Norwood, NJ: Ablex.
Newborn, M. (1975). Computer chess. New York: Academic Press.
Newell, A. & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Quinlan, J. R. (1979). Induction over large databases. (Report STAN-CS-79-739). Stanford, CA: Stanford University.
Quinlan, J. R. (1982). Learning efficient classification procedures and their application to chess endgames. In R. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning. Palo Alto, CA: Tioga Press.
Quinlan, J. R. (1987). Decision trees as probabilistic classifiers. Proceedings 4th International Workshop on Machine Learning. Irvine, CA. pp. 31-37.
Reinfeld, F. (1945). Win at chess. New York: Dover Books.
Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
Ross, P. E. (1991, November). Endless endgame? Scientific American, 265(5), 38.
Roycroft, A. J. (1987). Expert against oracle. In Machine Intelligence 11 (pp. 347-373). Oxford: Oxford University Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing, Vol. 1 (pp. 318-362). Cambridge, MA: MIT Press.
Samuel, A. L. (1959, July). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 211-229.
Samuel, A. L. (1967, November). Some studies in machine learning using the game of checkers II: Recent progress. IBM Journal of Research and Development, 6, 601-617.
Schaeffer, J. (1983). The history heuristic. International Computer Chess Association Journal, 6(3), 16-19.
Seirawan, Y. (1991). Interviewed in I. Drasnin (Director) The chip vs. the chess machine. Nova Video. Boston: WGBH.
Sejnowski, T. J. & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168.
Shannon, C. E. (1948, July & October). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423 & 623-656.
Shannon, C. E. (1950, March). Programming a computer for playing chess. Philosophical Magazine, 41, 256-275.
Simon, H. A. (1985). Some computer models of human learning. In M. Shafto (Ed.), How we know: Nobel conference XX. New York: Harper & Row.
Simon, H. A. & Newell, A. (1958, January-February). Heuristic problem solving: The next advance in operations research. Operations Research, 6, 1-10.
Skiena, S. (1986). An overview of machine learning in computer chess. International Computer Chess Association Journal, 9(1), 20-28.
Spanier, D. (1984). Total chess. New York: E. P. Dutton.
Staunton, H. (1847). The chess-player’s handbook. London: Henry G. Bohn.
Steiner, G. (1974). Fields of force. New York: Viking Press.
Thompson, K. (1986). Retrograde analysis of certain endgames. International Computer Chess Association Journal, 9(3), 131-139.
Turing, A. (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings London Mathematical Society (2), 42, 230-265.
Turing, A. (1953). Digital computers applied to games. In B. V. Bowden (Ed.), Faster than thought (pp. 286-295). London: Pitman.
Valvo, M. (1989). The Valvo-Deep Thought unix mail match. International Computer Chess Association Journal, 12(3), 183-190.
van den Herik, H. J. (1983). Strategy in chess endgames. In M. A. Bramer (Ed.), Computer game-playing: Theory and practice (pp. 87-105). Chichester, England: Ellis Horwood Ltd.
Welchman, G. (1982). The hut six story: Breaking the enigma codes. New York: McGraw-Hill.
Wilson, S. W. (1987). Classifier systems and the animat problem. In Machine learning 2 (pp. 199-228). Boston: Kluwer.
Wittgenstein, L. (1953). Philosophical investigations. New York: Macmillan.
Author Index
A
E
Alexander, C.H.O’D., 26, 29, 33, 36, 43, 88, 163 Anantharaman, T., 112, 163, 165 Andric, D., 163
Ebeling, C., 109, 110, 163, 164 Elo, A., 52, 54, 54n, 55, 164
F Feigenbaum, E., 164 Fine, R., 21, 80, 119, 120, 122, 123, 125,
B Babbage, C., 3, 17, 18, 19, 20, 25, 35, 36, 41, 42, 132, 163 Bell, A.G., 43, 163 Berliner, H.J., 61, 105, 108, 109, 110, 111, 114, 117, 119, 121, 128, 163 Binet, A., 6, 74, 75, 163 Birdsall, D., 88, 163 Boole, G., 20, 30, 32, 163 Botvinnik, M.M., 87, 102, 122n, 163 Bratko, L, 82, 99, 163, 165
164 Flesch, J., 73, 164 Freud, S., 13, 164 Frey, P.W., 164
G Goldberg, D., 34, 164 Golombek, H., 35, 36, 164 Good, I.J., 30, 31, 33, 34, 36, 39, 95, 96,
164 Gower, A.E., 98, 165
C Calvocoressi, P., 163 Campbell, M., 112, 163, 165 Chase, W.G., 6, 82, 83, 164 Cleveland, A.A., 6, 75, 76, 79, 82, 164 Cohen, P., 164 Condon, J.H., 68, 103, 104, 105, 107,
164 Cooper, D.W., 63, 165 Cowan, J.D., 140, 164
D de Groot, A.D., 6, 46, 76, 76n, 77, 77n, 78, 80, 81, 82, 88, 164 de Latil, P., 164 Dreyfus, H.L., 57n, 86, 164 Dunne, A., 164
H Hartson, W., 164 Hebb, D.O., 11, 140, 164 Hilts, P.J., 164 Hinton, G.E., 143, 167 Hodges, A., 39, 164 Hofstadter, D.R., 165 Holland, J.H., 12, 145, 146, 147, 152,
165 Holyoak, K.J., 165 Horwitz, B., 126, 127, 128, 129, 165 Hsu, F., 1l l , 112, 113, 118, 135, 163,
165 Hunt, E.B., 10, 136, 137, 138, 139, 165 Hyatt, R.M., 56n, 97, 165 Hyman, A., 165
169
1 70
AUTHOR INDEX
J
R
Johnston, R., 166
Reinfeld, F., 106, 107, 113, 167 Rosenberg, C.R., 143, 144, 167 Rosenblatt, F., 11, 140, 141, 143, 167 Ross, P.E., 167 Roycroft, A.J., 128, 167 Rumelhart, D.E., 143, 167
K Keene, R., 122, 165 Kister, J., 45, 165 Kling, J., 126, 127, 128, 129, 165 Kmoch, H., 99, 165 Kopec, D., 98, 99, 102, 163, 165 Kozdrowicki, E.W., 63, 86, 165
L Leiber, F., 68, 165 Lettvin, J.Y., 141, 165 Levinson, R., 165 Levy, D., 1, 7, 8, 55, 60, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 98, 102, 122, 129, 165, 166
M Marin, J., 136, 165 Marsland, T.A., 99, 166 Martin, G.R.R., 166 Maturana, H.R., 141, 165 McCorduck, P., 166 McCulloch, W.S., 11, 139, 140, 141, 165,
166 Mednis, E., 128, 166 Menabrea, L.F., 19, 20, 166 Michalski, R., 166 Michie, D., 33, 34, 35, 36, 39, 86, 87, 92, 95, 96, 121, 137, 166 Minsky, M., 141, 166 Mueller, E.T., 156, 166
N Nelson, H.L., 165 Newborn, M., 102, 166 Newell, A., 47, 51, 53, 85, 86, 104, 166,
167 Nisbett, R.E., 165 Northam, E., 163 Nowatzyk, A., 112, 135, 165
P Papert, S., 86, 141, 166 Pitts, W.H., 11, 139, 140, 141, 165, 166
Q Quinlan, J.R., 10, 138, 139, 166
S Samuel, A.L., 9, 10, 113n, 132, 133, 134, 135, 142, 143n, 146, 151, 167 Schaeffer, J., 71, 166, 167 Seirawan, Y., 84, 167 Sejnowski, T.J., 143, 144, 167 Shannon, C.E., 4, 5, 32, 33, 36, 39, 40, 41, 42, 46, 55, 103, 132, 152, 167 Sharp, D.H., 164 Simon, H.A., 1, 6, 47, 51, 53, 81, 82, 83, 85, 86, 164, 166, 167 Skiena, S., 167 Spanier, D., 167 Staunton, H., 56n, 62, 167 Stein, P., 45, 165 Steiner, G., 167 Stone, P., 136, 165
T Tancig, P., 82, 163 Tancig, S., 82, 163 Thagard, P.R., 165 Thompson, K., 69n, 103, 104, 105, 107, 108, 119, 121, 121n, 126, 128, 12 9,1 30 ,16 4,16 7 , Turing, A., 4, 20, 25, 27, 29, 30, 31, 32, 33, 35, 36, 39, 42, 43, 44, 46, 50, 103, 139, 156, 161, 167
U Ulam, S., 45, 165
V Valvo, M., 100, 117, 167 van den Herik, J., 83, 84, 122, 165, 168
W Walden, W ., 45, 165 Wason, P.C., 164 Welchman, G., 27, 28, 29, 36, 168 Wells, M., 45, 165 Williams, R.J., 167 Wilson, S.W., 168 Wittgenstein, L., 11, 139, 154, 168
Subject Index t
50-move rule, 127-128
A ACE (Automatic Computing Engine), 36, 39, 42 ACM (Association for Computing Machinery), 60, 67 Alexander, C. Hugh O’D., 26, 29, 33, 36, 43, 88 Alien style, 117 Alpha-beta, 49, 50, 56, 66, 71, 105, 109, 113, 132 Analytical Engine, 3, 19-20 Anantharaman, Thomas, 112-113 Anticlerical chess, 45 Anticomputer play, 92 Artificial Intelligence Laboratory, 53 Assigning grades, 58 Atkin, Larry, 59, 65-67, 70, 91 AUTOCODE, 42 Averbakh, Yuri, 80
B Babbage, Charles, 3, 17-20, 35, 41-42, 132 Back-propagation, 11, 143 Backplane, 103 Backward pruning, 49 Basic Chess Endings, 80, 119-120, 122, 125 Beal, Don, 98 Bell Laboratories, 4, 32-33, 103-104, 108, 119, 130 Belle, 97, 104-109 Berliner, Hans, 61, 105, 108-111, 114, 117, 119-121 Bernstein, Alex, 47 Binet, Alfred, 74-75 Bletchley Park, 4, 24—26, 29-34, 39
Blitz, 105 Boltzmann machine, 142-143, 146 Bomba, 26-27 Bombe, 28, 33, 35 Boole, George, 20, 30, 32 Boolean algebra, 20 Botvinnik, Mikhail, 87, 102, 122 Bowden, B.V., 42 Bratko, Ivan, 82, 99 Bratko-Kopec experiment, 99 British Computing Society, 102 Browne, Walter, 70, 121 Buenos Aires, 20 Bugs, 63 Byrne, Robert, 101
C Cambridge University, 24 Campbell, Murray, 112 Capablanca Memorial Tournament, 60 Capablanca, Jose Raoul, 69, 149, 151 Carnegie-Mellon University, 47, 108, 111, 114 Cellular automata, 145 Champernowne, David, 39 Chaos, 57 Chase, William, 82-83 Checkers, 132 CHEOPS, 93, 96 Chess culture, 155 Chess playing compared with literacy, 75 CHESS X.X, 59-62, 65-70, 87-88, 91-95 Chinese Museum, 17 ChipTest, 112-114 Chunk, 82, 110 Churchill, Winston, 29, 34 Classifier system, 152-154 Clement, Joseph, 18
171
172
SUBJEC T INDEX
Cleveland, Alfred, 75-76, 79, 82 Clockwork toys, 15 Cognitive map, 144 COKO, 63-64 Colossus, 34-35 Columbia University, 61 Combinatorial explosion, 46, 134 Computable Numbers, 25, 30, 32-33, 161 Computers in US Open, 97 Concept building, 153 Condon, Joe, 68, 103-107, 115 Connection Machine, 130 Connectionism, 11, 150 Contempt factor, 70 Control Data, 59-60 Conversation, 155 Cooper, Dennis, 63 Cowan, Jack, 140 Cray Blitz, 56, 97-98, 111, 114, 121 Creative anarchy, 29 Credit assignment, 134 Crook, Russell, 67 Crossover, 147 Custom chips, 109
D DARPA, 109 Darwin machine, 147 Day, Lawrence, 119-121 De Groot, Adriaan, 46, 76-82, 88 Decision tree, 137-138 DECtalk, 143 Deep Thought, 8, 100-102, 111, 114-118, 135 Denniston, Alastair, 23-26, 29 Dialogue, 155 Difference Engine, 18-19, 42 Differential equations, 30 Discrete mathematics, 30 Dreyfus, Hubert, 86
E Ebeling, Carl, 109-110 Edinburgh University, 36 Electronic components, 31 Elo, Arpad, 53 Elo rating, 54 Endgames KBBKN, 8, 124, 126, 128-129 KBKN, 123 KNNKP, 128 KQKBB, 128 KQKNN, 128
KQPaKQ, 129 KQPKQ, 128-129 KQRKQ, 124 KRBKNN, 130 KRBKR, 127-128 KRK, 3, 21, 122-124, 153, 159 KRKN, 8, 10, 123-125, 128, 137-139 KRKQ, 119, 121 KRPKR, 90, 129 Endgame database, 121-124 Endgame weakness, 89 Enigma, 4, 24, 26-28 Euston Station, 25 Evaluation function, 110 Ever-possible moves, 104, 109 Evolutionary algorithm, 151 Expectation, 154 Expert system, 139 Expertise, 6
F Faster Than Thought, 42 FIDE, 97, 116, 127-128 Fine, Reuben, 21, 80, 119-120, 122-123, 125 Finite difference calculus, 17 Fischer, Bobby, 60, 73-75, 67, 91, 93 Fish, 32-35 Flesch, Janos, 73 Flowers, T. H., 31, 34, 36 Fredkin Foundation, 115 Fredkin, Edward, 96, 104, 116 Freud, Sigmund, 13, 156 Friedman, William, 29 Full-width search, 65-67, 88
G Game Game Game GCCS
adjudication, 61 theory, 39-40 tree, 31 (Government Code and Cypher School), 23-25, 28 Genetic algorithm, 12, 145, 150 GENIE, 64 Gettysburg Address, 131 Glennie, Alick, 42—43 Global-state recognizer, 110 G olf Club and Chess Society, 26 Golombek, Harry, 35-36 Good, I.J., 30-31, 33-36, 39. 95-96 Gorlen, Keith, 59, 65 Gower, Bert, 97-98 Gratuitous comments, 91-92
Great Wager, 86-87, 93-95, 102 Greenblatt, Richard, 53, 55, 61, 86, 93
H Hansen, Ron, 67 Heath Robinson, 34 Hebb, Donald, 11, 140 Hecht-Nielsen, Robert, 143 Henry, John, 85 Herceg Novi, 74-75 Hierarchy, 136 Hilbert, David, 25 Hinton, Geoffrey, 143 History heuristic, 71 Hitech, 110-111, 114, 117 Holland, John H., 12, 145-147, 152 Horizon effect, 44 Hsu, Feng-hsiung, 111-113 Hunt, Earl, 10, 136-139 Huxley, Thomas Henry, 6 Hyatt, Robert, 56, 97-98
I IBM, 133, 145 IBM 704, 134 Incoherent chess, 89 Induction, 135-138 Inductive reasoning, 10 INFANT, 150 Information Processing Languages, 47 Information theory, 33, 42 Integrated circuits, 104 Intuitive experience, 6, 78 Intuitive knowledge, 6-7 Iterative deepening, 71
J J. Biit, 61-62 Jacquard loom, 19
K KAISSA, 86, 92, 126 Karpov, Anatoly, 87, 97, 127 Kasparov, Garry, 8, 87, 101-102 Killer heuristic, 67 King, Kenneth M., 60 Kling-Horwitz exits, 128 Kling-Horwitz position, 126, 129 Knowledge-based play, 5 Knox, Dillwyn, 25 Kolmogorov, Andrei, 143 Kopec, Danny, 98-99, 102 Korchnoi, Viktor, 87
Kotov, Alexander, 80 Kozdrowicki, Ed, 63, 86, 95 Kruskal, Martin, 45 Kuperstein, Michael, 150
L Larsen, Bent, 116 Levinson, Robert, 151 Levy, David, 1, 7-8, 55, 60, 86-96, 98, 102, 129 Linear polynomial, 58, 132-134 Linnean taxonomy, 136 Logical consistency test, 27 Look-ahead carry, 19 Los Alamos Scientific Laboratories, 45 Lovelace, Augusta Ada, 19-20
M MacHack, 7, 53, 55-59, 61, 86, 93-94 Maelzel, 16-17 Manchester University, 42 MANIAC, 45 Marshall Chess Club, 60 Marshall, Frank, 69 Marsland, Tony, 99 Martian chess, 121 Master games, number of, 46 Maximin, 31, 121 McCarthy, John, 49, 86 McCracken, Dan, 95, 98 McCulloch, Warren, 11, 139, 141 Mednis, Edwin, 128 Menabrea, L.F., 19 Menzies, Stewart, 28-29 Michie, Donald, 33-36, 39, 86-87, 92, 95-96, 102, 121, 137 Milner-Barry, Stuart, 26, 29, 36 Minimax, 31, 38-39, 109, 134, 161 Minsky, Marvin, 141 MIT, 36 Morgenstern, Oskar, 31, 39 Morph, 151 Morphy, Paul, 62 Mr. Turk, 63 Mueller, Erik, 156 Mutation, 147
N Napoleon plays Turk, 16 National Physical Laboratory, 36 Nature-nurture, 145 Nelson, Harry, 97-98 NETtalk, 144
Network, 11, 139 Newborn, Monroe, 60 Newell, Allen, 47, 53, 85-86, 104 Newman, M.H.A., 32, 36 Northwestern University, 59 Nowatzyk, Andreas, 112, 135 NSS, 47-51, 85-86 Nuchess, 108
O Obvious move processing, 67 Omni Magazine, 95, 102 Opening library, 68 Oxford University, 24, 33, 36
P Papert, Seymour, 86, 141 Parameter adjustment learning, 133 Parry, Jim, 67 Passion, 156 Pawn defects, 40 Perceptron, 140-141 Perfect endgame knowledge, 126 Perfect information, 40 Performance rating, 115 Personal chess computers, 97 Petrosian, Tigran, 74, 87 Pitts, Walter, 11, 139, 141 Plausible move generator, 47, 55-56, 65 Playing strength increase with search, 69, 71 Playing style, 87 Playing time-outs, 61 Poduska, John, 63 Poe, Edgar Allan, 17 Political detention, 108 Polytechnic University, 22 Portisch, Lajos, 68 Position balance, 48 Position evaluation, 38 Post Office Research Department, 36 Postal chess, 100 Prepared variation, 69 Princeton, 32 Printed circuit cards, 103 Provisional rating, 113 PURPLE cipher machine, 29
Q Quiescent positions, 31 Quinlan, Ross, 10, 138-139
R Random selection, 41 Recondite opening, 108 Recursive, 31, 47, 53, 161-162 Reinfeld, Fred, 106-107 Relaxation, 11, 142 Reshevsky, Sammy, 151 Retrograde enumeration, 121, 124 RIBBIT, 67, 69 Room 40, 23-24, 29 Rosenberg, Charles, 144 Rosenblatt, Frank, 11, 140-141 Rote learning, 132 Royal Society, 18 Royal Society Computing Laboratory, 42 Roycroft, John, 128-129 Rules of thumb, 79 Rumelhart, David, 143 Runyon, Damon, 136
S Samuel, Arthur L., 9-10, 113, 132-135, 146 Schaeffer, Jonathan, 71 Schlumberger, William, 17 Search-based play, 5 Seirawan, Yasser, 84 Sejnowski, Terrence, 143-144 Shallow strategic play, 88 Shannon, Claude E., 4-5, 32-33, 39-42, 103, 132 Shaw, John, 47, 53, 85-86 Short-term memory, 81 Signature table, 134 Simon, Herbert, 1, 47, 53, 81-83, 85-86 Singular extension, 113 Slate, David, 59-61, 65-67, 70, 91-93 Smyslov, Vassily, 74 Software Availability Bulletin, 59 Spassky, Boris, 126 Special Liaison Units, 28 Stacks, 161 Staunton, Howard, 56, 62 Steinitz, Wilhelm, 57 Stiller, Lewis, 130 Swiss-system, 62
T Tables, numeric, 17 Tal, Mikhail, 74, 87 Tancig, C., 82
Tancig, S., 82 Thinking speed, 78 Thompson, Ken, 68, 103-107, 115, 119, 121, 126, 128-129 T-shirt, 106 Threshold gate, 141 Time usage, 90-91 Torres y Quevedo, Leonardo, 3, 12, 20-22, 122, 159 Training examples, 134 Transistors, 103 Transposition table, 56, 69 Traveling salesman problem, 142 Travis, Edward, 29 Turing, Alan M., 4, 25, 27, 29-33, 35-36, 39, 42-44, 50, 103, 132, 161 Turk, 3, 15-16 Turochamp, 39 Tutte, W.T., 32 Type A strategy, 40-41, 45 Type B strategy, 41-42, 132
U U.S. State Department, 60 Ultra, 28
University of Michigan, 145 University of Waterloo, 67 USCF (United States Chess Federation), 53-55
V Vacillation, 89 Vacuum tubes, 103 Valvo, Mike, 100-101, 117 Van den Herik, H. Jaap, 83-84 VLSI, 109, 111, 115 Von Kempelen, Wolfgang, 3, 15-16 Von Neumann, John, 11, 31-32, 39, 140
W Watson, William, 97 Weariness factor, 72 Welchman, Gordon, 27, 29, 36 Whirlwind (MIT computer project), 36 Williams, Ronald, 143 Win at Chess, 106-107, 113 Winograd, Shmuel, 140 Wittgenstein, Ludwig, 11, 139 Wylie, Shaun, 39