






THE ACADEMY AWARD-WINNING MOVIE Sense and Sensibility presented a wonderful vision of life in early nineteenth-century England. In the absence of television, radio, movies, and videos, families sought entertainment in a manner far different from today's. The Dashwood girls--Elinor, Marianne, and Margaret--filled their days with visiting, reading, practicing the pianoforte, needleworking, and letter writing, not to mention gossiping and matchmaking. Long days were highlighted by a wonderfully relaxed midday family meal, during which conversation was paramount. Above all, Jane Austen portrays a concern for the thoughts and feelings of one's immediate acquaintances and pride in one's village.

An upper-middle-class family of the landed gentry--the Dashwoods--would have been interested in a proper education for their children. The textbooks of that era are a clue as to what was considered essential mathematics. Gender and class differences abound. Books were scarce. Paper was not mechanically produced until 1801, and power-driven printing machines did not appear until 1812. Only a few books were available for instructors and students.

The mathematics books for both "young ladies" and "young men" open with a discussion of the four fundamental operations--addition, subtraction, multiplication, and division. All books carefully discuss these concepts and the related vocabulary. Students were asked to learn the terms addend, minuend, subtrahend, difference, multiplicand, multiplier, product, divisor, dividend, and quotient. Each section included basic algorithms for calculating. In general, both the algorithms and vocabulary have endured and are similar to those found in contemporary American textbooks. Although explanations were brief, the basic material is the same.

After the opening sections on the basics, the differences are great. Commonly, the four operations are followed with a chapter on the "rule of three direct," or "golden rule," which is now called ratio and proportion and is solved by cross multiplying. After learning four operations and one method for finding an unknown quantity, students were thought to be prepared for adult mathematics. Books ended with tables--especially for weights, measures, and money--all of which were notoriously awkward in the English system.

We next examine the unique features of three books that were widely circulated between 1800 and 1810, or during the height of the Napoleonic era. An effort has been made to preserve the original grammar, spelling, capitalization, and punctuation.


Like the accomplished and distinguished writer Jane Austen, William Butler (1806) furnishes the reader with a sample of domestic detail in his book intended for "young ladies." See figure 1. Butler asserts that he is an experienced teacher of young ladies. His book presents mathematics in an unusual format. We find 619 problems, which are organized in alphabetical and numerical order. The topics range from astronomy, anchovies, and cork to parchment, the plague, and the steam engine. The literary content of the problems is worthy of an Austen. The problems display an integrated approach to teaching mathematics and cut across literature, history, science, and geography. They include quotations from Virgil, Milton, Pope, Shakespeare, and the Bible. Moreover, beginning with addition, the author apparently tried to increase the degree of difficulty of the mathematics as he progressed through the arrangement. Reading a problem or two helps one appreciate the scope and sequence of the curriculum.


No. 36 Pay a baker's bill of two pounds, a grocer's of three pounds, a milliner's of five pounds, a linen-draper's of sixteen pounds, and a cheesemonger's of seven pounds, and find the amount of the whole.

No. 38 Virgil, the celebrated Latin poet, was born near Mantua, in Italy, seventy years before the nativity of our Saviour; how many years have elapsed since that event to the present year 1805?


No. 92 Magna Charta. Runnymede... is reverenced by every son of liberty, as the spot where the liberties of England received a solemn confirmation... [and is] considered the bulwark of English LIBERTY. The celebrated charter in question was wrenched from John in 1215; How long has that happy event preceded 1805?

Ans. 590 years.


No. 133 Coaches. Coaches, as well as almost all other kinds of carriages which have since been made in imitation of them, were invented by the French, and the use of them is of modern date. Under Francis I, who was a contemporary with our Henry VIII, there were only two coaches; that of the queen, and that of Diana, natural daughter of Henry II. The kings of France, before they used these machines, traveled on horseback; the princesses were carried on litters, and ladies rode behind their squires. Till about the middle of the 17th century there were but few coaches in Paris; but prior to the late revolution in that capital, they were estimated at 15,000, exclusive of hackney-coaches (horse-drawn taxis), and those let out for hire.

The introduction of coaches into England is ascribed by Mr. Anderson, in his History of Commerce, to Fitz Allen, earl of Arundel, in the year 1580; and about the year 1605, they were in general use among the nobility and gentry of London.

In the beginning of the year 1619, the earl of Northumberland, who had been imprisoned since the Gunpowder-Plot, obtained his liberation. Hearing that Buckingham was drawn about with six horses in his coach (being the first that was so) the earl put on eight to his, and in that manner passed from the Tower through the city.

Hackney-coaches, which, according to Maitland, obtained this appellation from the village of Hackney, first began to ply the streets of London, or rather wait at inns, in 1625, and were only twenty in number. So rapid, however, has been their increase since that period, that London and Westminster now contains 1100.

Suppose each coach to earn 16 shillings a day on an average, which is deemed a very moderate computation, the sum of £880 sterling is expended daily in the metropolis, in coach-hire, exclusive of what is spent in glass coaches, or unnumbered ones. What is the weekly, monthly, and yearly expenditure in the use of these vehicles?

Ans. £6,160 per week; £24,640 per month, and £321,200 per year; reckoning 13 months 1 day to the year.

A common form for correcting the calendar was to add a thirteenth month from time to time. This method was used in 1806. Butler's answer may be calculated using 364 + 1 = 365 days.
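Under those period conventions--a 7-day week, a 28-day month, and a 13-month-plus-one-day year--Butler's figures can be checked with a short computation. The sketch below, in Python, assumes the article's count of 1,100 coaches at 16 shillings each:

```python
# A check of Butler's coach-hire totals, assuming a 7-day week,
# a 28-day month, and a 365-day year (13 months of 28 days, plus 1 day).
daily_pounds = 1100 * 16 // 20   # 1,100 coaches at 16s each; 20s = 1 pound
weekly = daily_pounds * 7
monthly = daily_pounds * 28
yearly = daily_pounds * (13 * 28 + 1)
print(daily_pounds, weekly, monthly, yearly)  # 880 6160 24640 321200
```

All three of Butler's printed answers agree with this reckoning.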

Multiplication of money, or compound multiplication

No. 219 Potatoes. Potatoes are the most common esculent (edible) root now in use among us, though little more than a century ago they were confined to the gardens of the curious, and presented as a rarity. They form the principal food of the common people in some parts of Ireland:

Leeks to the Welsh, to Dutchmen butter's dear; Of Irish swains, potatoes are the cheer.


Potatoes were originally brought to us from Santa Fe, New Mexico, North America, and as has been asserted, by Sir Francis Drake, in the year 1586. Others mention the introduction of them into our country about 1623; whilst others affirm that they were first cultivated in Ireland, about Younghall, in the county of Cork, by Sir Walter Raleigh, in 1610, and that they were not introduced into England till the year 1650. Peru, in South America, is the natural soil of potatoes, particularly the fertile province of Quito, whence they were transplanted to other parts of America. It is the root only of the potato plant that is eatable.

There are two varieties in general use; one with a white, and the other with a red root. And besides these, there is a new kind, first brought from America, which that "patriot of every clime," the late Mr. Howard, cultivated in 1765 at Cardington, near Bedford. They were also propagated in the adjacent counties. Many of these potatoes weigh four or five pounds each; and hogs and cattle are found to prefer them to the common sort. They are moreover deemed more nutritive than others; being more solid and sweet, and containing more farina or flour. As an esculent plant, they appear also worthy of cultivation; being, it is said, when well boiled, equal, and, when roasted, preferable to the common sort.

Immense quantities of potatoes are raised in Lancashire for exportation. Mr. Pennant says, that 30 or 40,000 bushels are annually exported to the Mediterranean Sea from the environs of Warrington, at the medium of 1 shilling 2 pence per bushel. A single acre of land sometimes produces 450 bushels. What are 179 bushels of potatoes worth at one shilling, 2 pence per bushel?

Ans. £10 8 shillings 10 pence.
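Butler's answer can be reproduced by working entirely in pence and then regrouping, assuming the standard conversions of 12 pence to the shilling and 20 shillings to the pound:

```python
# Price the potatoes in pence, then regroup into pounds-shillings-pence.
price_pence = 1 * 12 + 2            # 1 shilling 2 pence per bushel
total_pence = 179 * price_pence     # 179 bushels
shillings, pence = divmod(total_pence, 12)
pounds, shillings = divmod(shillings, 20)
print(pounds, shillings, pence)     # 10 8 10
```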


No. 243. Velocity of light.

Let there be light, said God, and forthwith light Ethereal, first of things, quintessence pure, Sprung from the deep ....

Mathematicians have demonstrated, that light moves with such amazing rapidity, as to pass from the sun to our planet in about the space of eight minutes. Now, admitting the distance, as usually computed to be 95,000,000 of English miles, at what rate per minute does it travel?

Ans. 11,875,000 miles.

At 438 pages, this book is longer than most textbooks of the Austen era. Butler includes elaborate notes, footnotes, and references for further study. See figure 2. He closes with twenty-seven pages of arithmetic tables, including the essential monetary conversions, abbreviated here for the contemporary reader. The one obvious omission in this highly successful book is illustrations. Not one drawing is included.

4 farthings = 1 penny               [4 qrs. = 1 d.]

12 pence = 1 shilling               [12 d. = 1 s.]

20 shillings = 1 pound sterling     [20 s. = £1]

21 shillings make 1 guinea


A young man's mathematics instruction was designed to give "secret satisfaction to the possessor, and contribute to render him an agreeable and useful Member of Society." Unlike the problems in Butler, those in Wallis's The Self-Instructor, or, Young Man's Best Companion (1811) are stated without elaboration:

What is the value of 21 gallons of brandy at 7 shillings 9 pence per gallon?

What is the value of 108 lbs. of indigo lahore at 7 shillings 8 pence per pound?

Reduce 246 Venetian ducats de Banco into sterling money.

Admit an army of 32,400 men were formed into a square battalion. Find the rank and file.
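The battalion problem amounts to extracting a square root: a square formation of 32,400 men has the same number in rank as in file. A one-line check in Python:

```python
from math import isqrt

# A square battalion of 32,400 men: rank and file are each the square root.
print(isqrt(32400))  # 180
```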

For the golden rule or rule of three, that is, ratio and proportion, the student is advised that the "chief difficulty is the placing of the numbers."

If 12 gallons of brandy cost £4, 10 shillings, what will 134 gallons cost?

The author suggests that £4, 10 shillings be changed to "the lowest mentioned."

The Self Instructor, or, Young Man's Best Companion contains chapters for "Joiners, Painters, Glaziers, Sawyers, and Bricklayers," as well as a chapter on "Planometry." See figures 3 and 4. The book discusses such unusual regular figures as the undecagon, which has eleven sides, and the quindecagon, with fifteen sides. It includes globes, cones, and pyramids, including the frusta, as well as algorithms for taking square and cube roots. It teaches a method for finding the length and mast of a ship. The young man is given information on the financial arrangements for bookkeeping, wills, and legal matters. The sections on longitude and latitude, which were important measures for a seafaring country, include the following: "New Mexico, including California, is bounded by 'unknown lands' on the north, Louisiana on the East, Old Mexico, and Pacific on West. The chief town is Santa Fe 36 degrees north latitude and 104 degrees west longitude." The book closes with the statement that algebra was first known in Europe in 1494 and that printing--of all types--had been carried on in Westminster Abbey from that time until now.

Decimals are included, and the book describes unusual methods of handling "vulgar [common] fractions" (Wallis 1811, p. 96).

To reduce fractions

To reduce a fraction, a prime common divisor, not necessarily the greatest common divisor, was written in the position of an exponent.

56[sup 2]/84 | 28[sup 2]/42 | 14[sup 7]/21 | 2/3
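Wallis's scheme amounts to striking out any prime common divisor, over and over, until none remains. A sketch of the idea in Python (the helper name `reduce_steps` is illustrative, not from the source):

```python
# Reduce a fraction by dividing out prime common divisors one at a time,
# recording each intermediate fraction, as in Wallis's worked example.
# Trial divisors that are composite never fire, because their prime
# factors are always removed first.
def reduce_steps(num, den):
    steps = [(num, den)]
    d = 2
    while d <= min(num, den):
        if num % d == 0 and den % d == 0:
            num, den = num // d, den // d
            steps.append((num, den))
        else:
            d += 1
    return steps

print(reduce_steps(56, 84))  # [(56, 84), (28, 42), (14, 21), (2, 3)]
```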

To add fractions

A product of the denominators, not the LCD, was used.

3/4 + 2/7 + 5/6 = 126/168 + 48/168 + 140/168 = 314/168
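The same sum can be replayed with Python's `fractions` module: the common denominator 168 is the product 4 x 7 x 6, not the least common denominator 84, so the result 314/168 still needs reducing.

```python
from fractions import Fraction

# Add 3/4 + 2/7 + 5/6 over the product of the denominators (168).
common = 4 * 7 * 6
numerators = [3 * (common // 4), 2 * (common // 7), 5 * (common // 6)]
print(numerators, sum(numerators))        # [126, 48, 140] 314
print(Fraction(sum(numerators), common))  # 157/84, the reduced value
```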

To divide fractions

The method was not to invert and multiply but to find the product of the numbers joined by each of the first pair of arrows.

(15/16) / (2/3) = (15 x 3)/(16 x 2) = 45/32 = 1 13/32
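The cross-product rule can be sketched as a small function (the name `cross_divide` is illustrative, not from the source): the arrows join 15 with 3 and 16 with 2.

```python
# Divide two fractions by cross products rather than invert-and-multiply:
# numerator of the first times denominator of the second, and vice versa.
def cross_divide(n1, d1, n2, d2):
    return n1 * d2, d1 * n2

print(cross_divide(15, 16, 2, 3))  # (45, 32), i.e., 1 13/32
```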


Those two books represent the mathematics typically learned by middle- and upper-middle-class teenagers. The reader might ask about younger readers and more advanced mathematics students. For the answer, we turn to the Opie Collection, a special collection in the Bodleian Library, Oxford University. When it was presented to the university, the Opie, as it is known, was dedicated by Prince Charles, and it will soon be available for North American viewers through the UMI microfiche collection. See www.umi.com/hp/Support/Research/Files/220.html.

The 20,000 items in the collection include fairy tales, nursery rhymes, games, comic books, and coloring books, as well as game boxes and other educational items. Early American children's books, especially those reprinted in London, are part of the Opie. They include Tommy Thumb's Song Book (1794), which is thought to be the earliest known surviving edition of what may have been the first English nursery rhyme book. Mother Goose had already been printed in Boston. Then, and now, rhymes with counting were considered to be a child's first, and possibly best, introduction to arithmetic.

For the youngest students

At 3 3/4 by 2 1/4 inches, the size is the first thing that one notices about A Compendium of Simple Arithmetic; in which the First Rules of That pleasing Science are made familiar to the Capacities Of Youth, a book for elementary-school-age children. These books were "little books for little people." Indeed, the Opie has books so small that they can scarcely be held between the thumb and index finger. They typically begin with writing and spelling the counting numbers. Wallis's Compendium (1800), the title page of which is shown in figure 5, then goes to great length to explain the advantages of the "cypher," or place value, and the "decadary" system.

Wallis writes--for young children--that "neither a Euclid nor an Archimedes with all their wonderful mechanical powers" was able to extricate their number system from a "labyrinth of confusion." As in other titles of this decade, addend, minuend, and subtrahend are explained, but products are composed of "factors." The checking of subtraction and division is called the "PROOF" in bold letters. Definitions appear, for example, "Simple division is the finding how often one simple number is contained in another." The calculation is written as

              Dividend   Quotient

Divisor 3)       12        (4

or, for a longer problem, as follows:

833)   3104679  (3727   88/833
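The longer example checks out against a modern quotient-and-remainder computation:

```python
# Wallis's long division: 3,104,679 divided by 833.
quotient, remainder = divmod(3104679, 833)
print(quotient, remainder)  # 3727 88
```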





Another definition is, "Reduction is the conversion of numbers from one name to another, but still retaining the same value." Although it was written for young children, this tiny book, like all titles in this article, contains tables for wine measure, as well as ale and beer measure. See figure 6.

For more advanced students

Three particular qualities of mathematics of this era should be noted:

1. For British students, advanced mathematics was synonymous with geometry, and most students studied an edition of the first six books of Euclid's Elements. The most popular edition of that time was the one by Robert Simson, of the University of Glasgow. The obvious advantage to the student using later editions by Simson is that each Euclidean proposition was followed by a proof. Moreover, each book of Euclid was accompanied by sample examination questions.

Another edition of Euclid was written by John Playfair, of the University of Edinburgh. It contains his Axiom 12, now known as Playfair's axiom, which states, "Two straight lines that intersect one another cannot both be parallel to the same straight line." This statement, and its implied deviation from earlier editions of Euclid, evolved into the largest controversy of nineteenth century British mathematics.

In his preface (1795, pp. iv-v), Playfair remarks that Dr. Simson had been "the most successful" modern editor and had left "very little room for the ingenuity of future editors to amend or improve the text of Euclid or its many translations." Playfair wrote that Simson's objective was "to restore the writings of Euclid to their original perfection, and to give them to modern Europe as nearly as possible in the state wherein they made their first appearance in ancient Greece." Playfair praised Simson by stating that he knew languages, was profoundly skilled in geometry, and was an "indefatigable" researcher. To "restore" Euclid was a perfect mission for Simson. Playfair, however, believed that despite Simson's endeavors to remove corruptions, something was "remaining to be done." Playfair wrote that "alterations might be made that would accommodate Euclid to a better state of the mathematical sciences," and thus the Elements would be "improved and extended," more than at any "former period."

2. Until the American Revolution, one book--a single copy--was typically shipped across the Atlantic and then carefully used by an instructor to lead advanced mathematics students through a course of study. The Revolution brought about a change. In 1803, for example, an edition of Simson was printed in Philadelphia. Copies of Simson's later editions are still available in several older libraries.

3. Although mathematics journals existed, scant exchange occurred between German mathematicians and French or English mathematicians. However, the Opie collection does contain a fine translation from the University of Paris of Selected Amusements in Philosophy and Mathematics proper for agreeable exercising of the Minds of Youth (Despiau 1801). The introductory material is similar to that in the English books previously described in this article, but it ends with a discussion of topics that are now associated with probability. It includes factorials, permutations, combinations, Pascal's triangle, and various types of "gaming" odds--all topics that were highly developed in France. Actuarial tables on expected length of life include corrections for the large number of deaths that occurred in the first year of life. See figures 7, 8, and 9.


The British Library has a copy of the "most important parts" of the arithmetic and algebra examinations required of candidates for an "ordinary" bachelor of arts degree from Cambridge in the early nineteenth century. A Cambridge or Oxford degree did not--and still does not--have "breadth" requirements. Unlike in American universities, one who "reads maths" studies no other subjects. The undergraduate degree is given at the end of three years. Mathematics majors must successfully write examinations that include only mathematics questions.

The arithmetic problems in the early 1800s required computational skills, conversion of measures and money, extraction of square and cube roots, and applications to business, especially interest and discount. Most of the algebra is commonly taught in high school today. However, some problems are unusual, whereas others are surprisingly familiar. Consider, for example, the following:

5. What will be the price of carpeting a room of 13 feet 4 inches long, and 12 feet 6 inches broad, at 4 shillings 6 pence a square yard?

Ans. £4. 3s. 4d., or 4 pounds sterling, 3 shillings 4 pence.
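Worked in exact fractions--and assuming 9 square feet per square yard, 12 pence per shilling, and 20 shillings per pound--the carpeting answer follows directly:

```python
from fractions import Fraction as F

# Area in square yards, price in pence, regrouped to pounds-shillings-pence.
length_ft = F(13) + F(4, 12)              # 13 ft 4 in
breadth_ft = F(12) + F(6, 12)             # 12 ft 6 in
area_sq_yd = length_ft * breadth_ft / 9
total_pence = area_sq_yd * (4 * 12 + 6)   # 4s 6d per square yard
shillings, pence = divmod(int(total_pence), 12)
pounds, shillings = divmod(shillings, 20)
print(pounds, shillings, pence)           # 4 3 4
```

The pence total comes out to exactly 1,000, so no rounding is involved.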

12. Extract the square root of x[sup 4] + 8x[sup 3] - 64x + 64.

Ans. x[sup 2] + 4x - 8.

13c. Solve the equation

1/(x + a) + 1/(x + 2a) + 1/(x + 3a) = 3/x.

Ans. (-11 +/- Square root of 13)a/6.
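Both roots can be verified numerically. Clearing denominators reduces the equation to 3x^2 + 11ax + 9a^2 = 0, whose roots are x = (-11 +/- sqrt(13))a/6; the sketch below takes a = 1 and substitutes each root back:

```python
from math import sqrt, isclose

# Substitute each root of 3x^2 + 11ax + 9a^2 = 0 (the cleared form)
# back into 1/(x+a) + 1/(x+2a) + 1/(x+3a) = 3/x, taking a = 1.
a = 1.0
for sign in (1, -1):
    x = (-11 + sign * sqrt(13)) * a / 6
    lhs = 1 / (x + a) + 1 / (x + 2 * a) + 1 / (x + 3 * a)
    assert isclose(lhs, 3 / x)
```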

15. Expand

1/Square root of (a - x)

to 4 terms by the binomial theorem.

Ans. 1/a[sup 1/2] + x/2a[sup 3/2] + 3x[sup 2]/8a[sup 5/2] + 5x[sup 3]/16a[sup 7/2] + &c.

The answer in Arithmetic and Algebra (Wallis 1835, p. 327) is incorrect. The answer should be

1/a[sup 1/2] + x/2a[sup 3/2] + 3x[sup 2]/8a[sup 5/2] + 5x[sup 3]/16a[sup 7/2] + &c.

See Anton (1992, p. 730).
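The four-term series can be compared numerically with 1/sqrt(a - x). The check below, with the illustrative values a = 4 and x = 0.01, agrees to roughly the size of the first omitted term:

```python
# Compare the four-term binomial series for (a - x)^(-1/2), with
# coefficients 1, 1/2, 3/8, 5/16, against the exact value.
a, x = 4.0, 0.01
series = (a ** -0.5 + x / (2 * a ** 1.5)
          + 3 * x ** 2 / (8 * a ** 2.5) + 5 * x ** 3 / (16 * a ** 3.5))
exact = (a - x) ** -0.5
assert abs(series - exact) < 1e-10
print(series, exact)
```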

16. Insert 6 arithmetic means between 1/2 and 2/3.

Ans. 1/2, 11/21, 23/42, 4/7, 25/42, 13/21, 9/14, 2/3.

Find the sum of the series.

Ans. 4 2/3.
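With exact fractions, the eight terms and their sum drop out directly; the common difference is (2/3 - 1/2)/7 = 1/42.

```python
from fractions import Fraction as F

# Six arithmetic means between 1/2 and 2/3: eight terms in all.
d = (F(2, 3) - F(1, 2)) / 7          # common difference, 1/42
terms = [F(1, 2) + k * d for k in range(8)]
print(*terms)      # 1/2 11/21 23/42 4/7 25/42 13/21 9/14 2/3
print(sum(terms))  # 14/3, i.e., 4 2/3
```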

17. Define a logarithm; and shew that log N[sup p] = p log N. Having given log[sub 10]2 = .30103 and log[sub 10]3 = .4771213, find log[sub 10]36 and log[sub 10].018.

Ans. 1.5563026 and 2.2552726.

The answer 2.2552726 represents the centuries-old notation and "tables" answer of characteristic + mantissa, or (-2) + (.2552726), and is equivalent to the contemporary calculator answer of (-1.7447274).
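Both table answers follow from the two given logarithms alone, since 36 = 2^2 x 3^2 and .018 = (2 x 3^2)/10^3. A short check:

```python
# Reproduce the examination answers from log 2 and log 3 only.
log2, log3 = 0.30103, 0.4771213
log36 = 2 * log2 + 2 * log3        # 36 = 2^2 * 3^2
log_018 = log2 + 2 * log3 - 3      # .018 = (2 * 3^2) / 10^3
print(f"{log36:.7f}")              # 1.5563026
print(f"{log_018:.7f}")            # -1.7447274
```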


These publications furnish a record of the skills thought to be essential at the turn of another century. These mathematical records illustrate the continued need to develop good materials and tests. The era that gave us the legendary names of Trafalgar, Waterloo, Nelson, and George III was preparing its young for the increasingly complex global society.

In Britain today, parents are just as concerned as Americans with the education of their children. Specific topics debated in Parliament and discussed in the media are uncannily similar to those in the United States. Testing for teacher competence in mathematics and English, meeting standards, reducing class size, overcoming the shortage of qualified teachers, finding after-school care, and censuring underachieving schools are discussed at least as much--or more--in the United Kingdom as in the United States. The BBC and the government broadcast professional commercials in which a famous person, for example, Paul McCartney, reminisces about a favorite teacher. The government rates schools, and the ratings appear in newspapers. Being scrutinized and meeting standards are accepted as part of the system.


Source: Mathematics Teacher, Nov2000, Vol. 93 Issue 8, p692, 3p 

Author(s): Davitt, Richard M.

In her article "The Changing Concept of Change: The Derivative from Fermat to Weierstrass," Grabiner (1983) notes the following:

Historically speaking, there were four steps in the development of today's concept of the derivative, which I list here in chronological order. The derivative was first used; it was then discovered; it was then explored and developed; and it was finally defined. That is, examples of what we now recognize as derivatives first were used on an ad hoc basis in solving particular problems; then the general concept lying behind these uses was identified (as part of the invention of calculus); then many properties of the derivative were explained and developed in applications to mathematics and to physics; and finally, a rigorous definition was given and the concept of derivative was embedded in a rigorous theory.

As Grabiner observes, the historical order of the development of the derivative is exactly the reverse of the usual order of textbook exposition, which tends to be formally deductive rather than intuitive and inductive. Grabiner's article contains a number of other well-articulated historical and pedagogical messages, and I strongly encourage every mathematics instructor to read it in its entirety. However, this article emphasizes only her use-discover-explore/develop-define (UDED) paradigm to describe the derivative's evolution. This model is extremely useful for constructing accounts of the evolution of numerous mathematical concepts and theories in addition to the derivative. In various courses that I teach, I often ask my students to use UDED to compile their own accounts of the evolution of mathematical entities. Occasionally, I have also required students to report their findings to the class, but the final, structured account is usually intended for the individual student's benefit alone.

Such assignments have many advantages. By encouraging my students to refer to such reputable histories of mathematics as those cited in the bibliography in constructing their accounts, I introduce them to the history of mathematics in a manner that is not overwhelming. This same exercise helps students understand that because most historical accounts are somewhat subjective, students need to justify their historical claims by citing reliable sources. For example, by using the UDED paradigm, students can learn to appreciate the basis that an author uses to assert that Isaac Newton and G. W. Leibniz invented calculus, that Girolamo Cardano was the first to solve the general cubic equation, that Carl F. Gauss, Janos Bolyai, and Nikolai Lobachevsky invented non-Euclidean (hyperbolic) geometry, and the like. Furthermore, as Grabiner observes, students learn that creating mathematics is often incremental, inductive, and exciting and that our modern versions of mathematical theories are polished diamonds that started off as rough pieces of carbon.

When I heard a colleague in the physics department describe the scientific method as "the development of knowledge from observation of specifics to conjecture to experiment to theory," it dawned on me that the UDED paradigm is essentially nothing more than using the scientific, or experimental, method to describe how mathematical theories and concepts evolve. Fuzzy foreshadowings, false starts, and dead ends have occurred in developing scientific models before such modern theories as those of the atom, light, heat, electricity, evolution, and the cosmos have crystallized and have been accepted as legitimate scientific theories. Students need to see this connection of shared modi operandi in the evolution of both mathematics and the natural sciences.

The accounts that teachers and students write using UDED can be detailed, brief, or anywhere in between. At times, the "big picture" is precisely what students should absorb; at other times, a mini-term paper might be appropriate. In assigning the UDED account as a student project, the instructor can easily set the parameters for the UDED project.

One of my favorite abridged applications of the UDED model is using it to construct a brief chronicle of the acceptance of the principle of mathematical induction as a valid method of proof in mathematics. In the sixth century B.C.E., the Pythagoreans certainly used the ideas underlying this principle when, proceeding geometrically, they conjectured and accepted as "true" such number-theoretic patterns as theorem S, which states that the sum of the first n odd integers is equal to the nth square number (Burton 1999, pp. 91-93). Francesco Maurolico gave the first formal inductive proof in the history of mathematics when he proved theorem S by induction; his proof (discovery) can be found in his work Arithmeticorum Libri Duo, published in 1575, the year of his death (Burton 1999, p. 426). In the next century, Blaise Pascal explored and developed the technique of mathematical induction in connection with his work on the arithmetic triangle and its applications (Burton 1999, pp. 418-28). Although John Wallis and Augustus De Morgan helped name this procedure induction, only in the latter part of the nineteenth century did Richard Dedekind--and then Gottlob Frege and Giuseppe Peano--define it mathematically. When formulating their sets of categorical properties for the natural numbers, each included the principle of mathematical induction or one of its logical equivalents as an axiom (Katz 1998, pp. 735-37).


The UDED model can also be used to describe the evolution of the complex numbers, a more commonplace high school mathematical topic than induction. Girolamo Cardano and other sixteenth-century Italian algebraists reluctantly began to use complex numbers when they saw that negative values appearing under the radical sign in the Cardano-Tartaglia formulas for solving specific cubic equations sometimes corresponded to recognizable real roots and when Cardano attempted to solve the problem of dividing 10 into two parts such that the product is 40. In Ars Magna, his famous algebra text of 1545, Cardano showed by "completing the square" that the two parts must be 5 + Square root of -15 and 5 - Square root of -15. Although he checked that these answers formally satisfied the conditions of the problem, he still regarded them as being "fictitious" and useless; he was only halfheartedly using complex numbers.

A generation later, Raphael Bombelli discovered the complex numbers in analyzing the "irreducible case" of the cubic equation when all three roots are real and nonzero and yet negative values always appear under the radical when a Cardano-Tartaglia type formula is used. When he published his treatise Algebra in 1572, he became the first mathematician bold enough to accept the existence of "imaginary," or complex, numbers and to present an algebra for working with such numbers. He assumed that they behaved like other numbers in calculation and proceeded to manipulate them formally, with Square root of -a x Square root of -a = -a for a > 0 being his key observation.

During the next three centuries, many mathematicians explored and developed various aspects of the complex, that is, imaginary, numbers. For example, in conjunction with their formative work in analytic geometry, calculus, and algebra, such mathematicians as Rene Descartes, Isaac Newton, G. W. Leibniz, Leonhard Euler, Jean d'Alembert, Carl F. Gauss, and Bernhard Riemann all employed complex numbers in describing their theories of equations, formulating the general logarithmic and exponential functions, and devising analytic tools for modeling and solving real-world problems. Caspar Wessel, Jean Argand, and Carl F. Gauss contributed a crucial development to accepting and understanding the nature of complex numbers when they began to represent them geometrically in the real plane, much as we do today.

Finally, William Rowan Hamilton established the theory of complex numbers on a firm mathematical footing when he defined them in terms of ordered pairs of real numbers in almost the same way that modern textbooks define them. This definition and his rules for performing arithmetical calculations with his ordered pairs can be found in his 1837 paper "The Theory of Conjugate Functions, or Algebraic Couples; with a Preliminary and Elementary Essay on Algebra as the Science of Pure Time." Additional details concerning this UDED account of the evolution of the complex numbers can be found in Burton (1999) and Katz (1998).


The UDED paradigm can also be used to construct brief accounts of the evolution of such entire branches of mathematics as Euclidean geometry. Most ancient peoples used formulas to calculate the areas of simple rectilinear figures and to approximate the circumference and areas of circles. For example, the early Egyptians, Babylonians, and Chinese used algorithms to compute the volumes of rectangular blocks, cylinders, and pyramids. Furthermore, the latter two civilizations discovered the general Pythagorean theorem and used it in geometrical and astronomical applications. These civilizations had no real notion of an axiomatic system on which they could base "proofs" of their geometric formulas and theorems. As most students do today, they accepted their geometrical results on the basis of diagrams and intuition and often did not even distinguish between exact and approximate answers.

From the sixth century B.C.E. to the beginning of the third century B.C.E., Thales, Pythagoras, Eudoxus, Plato, Aristotle, and other Greek mathematicians and philosophers shaped mathematics into a deductive, axiomatic science and discovered Euclidean geometry. Around 300 B.C.E., Euclid compiled their accumulated discoveries in geometry and number theory and presented them axiomatically in his famous book, the Elements.

Over the next two millennia, Euclidean geometry was explored and developed by mathematicians from virtually every society that learned of the Elements. Additional mathematical advances occurred, such as Archimedes' replacement of the Euclidean theorem "The areas of circles are to one another as the squares on their diameters" with a proof of the precise Babylonian formula "The area of any circle is equal to the area of a right triangle in which one of the legs is equal to the radius and the other to the circumference" (equivalent to the modern formula area = pi·r^2). However, the principal explorations and developments involved repeated attempts to prove that Euclid's fifth, or parallel, postulate followed as a theorem from his other four more self-evident postulates and his common notions. The celebrated attempts of Proclus, ibn al-Haytham, John Wallis, Girolamo Saccheri, Adrien-Marie Legendre, Johann Lambert, and untold others were doomed to failure because--as we now know from the work of Janos Bolyai, Carl F. Gauss, and Nikolai Lobachevsky in the early nineteenth century--Euclid was indeed on sound logical ground when he made his parallel postulate an axiom for his geometry. It is logically independent of his other four.

Finally, at the very end of the nineteenth century, David Hilbert completely and logically defined Euclidean geometry in his classic monograph Foundations of Geometry (1899). Hilbert began his treatment of Euclidean geometry by postulating three undefined terms (point, line, and plane) connected by three undefined relations--incidence (on), order (betweenness), and congruence. He then offered a set of twenty-one axioms on which a logically consistent and complete treatment of Euclidean geometry could be based. In axiomatic studies of Euclidean geometry today, authors often distill Hilbert's collection of twenty-one axioms down to a set of fifteen logically independent axioms by combining related ones and deleting those that are implied by the others.

The principal pedagogical message here is that anyone purporting to offer high school geometry students a complete, deductive study of Euclidean geometry will fail. NCTM's curricular standards and recommendations indicate that a school geometry course should emphasize discovery, applications, and a representative sample of truly accessible proofs of such theorems as the Pythagorean theorem. Additional details concerning this UDED account of the evolution of Euclidean geometry can be found in Burton (1999) and Katz (1998).


Topics in addition to those already noted to which the UDED paradigm can be applied without unduly forcing the issue include the evolution of the concept and theory of a function, limit, infinite series, the integral, the number zero, negative numbers, real numbers, the theory of equations, and numerical procedures. It can be applied to describing the evolution of such entire branches of mathematics as non-Euclidean geometry, analytical geometry, and algebra (both manipulative and structural); such subareas of modern algebra as group theory; and trigonometry.

I encourage classroom teachers of mathematics to use Grabiner's generic paradigm both as a tool for their own acquisition of authentic historical accounts of the evolution of mathematical topics and as a pedagogical stratagem for their students to do the same.


Source: Mathematics Teacher, Oct2000, Vol. 93 Issue 7, p604, 3p 

Author(s): Bradley, Sean

Everyone loves the Fibonacci sequence. It is easy to describe, yet it gives rise to a vast amount of substantial mathematics. Physical applications and connections with various branches of mathematics abound. What could be better, unless someone told us that the Fibonacci sequence is but one member of an infinite family of sequences that we could be discussing? The generalization that follows has great potential for student and teacher exploration, as well as discovery, wonder, and amusement.

The Fibonacci sequence is defined by the recurrence relation F_0 = 0, F_1 = 1, and F_{n+1} = F_n + F_{n-1} for all integral n ≥ 1. The Fibonacci numbers can be generalized in various ways. Horadam (1965) furnishes one example. He defines a collection of sequences that depend on the real numbers a and b, as well as arbitrary integers k and q, as follows: we let w_0 = a, w_1 = b, and w_{n+1} = k·w_n - q·w_{n-1}. The Fibonacci sequence has a = 0, b = 1, k = 1, and q = -1. For example, 8 = 1·5 - (-1)·3.

A subset of these sequences is interesting enough to deserve wider recognition among teachers and students of mathematics. We consider Horadam's sequences with w_0 = 0, w_1 = 1, and w_{n+1} = k·w_n + w_{n-1} for all n ≥ 1. That is, instead of adding two consecutive terms to find the next term, as in the Fibonacci sequence, we first multiply the current last term in the list by k, then add the result to the next-to-last term. When k = 1, the result is just the ordinary Fibonacci sequence. Table 1 gives the first few terms of several generalized Fibonacci sequences.
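The recurrence is easy to experiment with. Here is a minimal Python sketch (the function name is mine, not the article's) that reproduces the rows of table 1:

```python
def generalized_fibonacci(k, count):
    """First `count` terms of w_0 = 0, w_1 = 1, w_{n+1} = k*w_n + w_{n-1}."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(k * terms[-1] + terms[-2])
    return terms[:count]

# k = 1 recovers the ordinary Fibonacci sequence
print(generalized_fibonacci(1, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
# k = 2 gives the first row of table 1
print(generalized_fibonacci(2, 8))   # [0, 1, 2, 5, 12, 29, 70, 169]
```

Students can vary k and watch how quickly the terms grow as the multiplier increases.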

We begin by investigating a few properties of this infinite collection of sequences, bringing along only the quadratic formula.


For many, the attractions of the Fibonacci sequence are the many elegant identities that it satisfies and the curious properties that it possesses. This article offers eight of them as illustrations. For more, see Hoggatt (1969) or a variety of other sources. The first five properties follow:

(F1) The GCD of F_n and F_{n+1} is 1 for all integral n ≥ 0.

(F2) F_n divides F_{n·m} for all positive integers m, for all integral n > 0.

(F3) F_n^2 - F_{n-1}·F_{n+1} = (-1)^(n+1) for all integral n ≥ 1.

(F4) F_n^2 + F_{n+1}^2 = F_{2n+1} for all integral n ≥ 0.

(F5) F_1 + F_3 + F_5 + ... + F_{2n-1} = F_{2n}
     F_2 + F_4 + F_6 + ... + F_{2n} = F_{2n+1} - 1

The first four statements are still true if any of the generalized Fibonacci sequences w_n replaces the Fibonacci sequence F_n. Property (F5) needs only the minor modification

(F5a) w_1 + w_3 + ... + w_{2n-1} = w_{2n}/k
      w_2 + w_4 + ... + w_{2n} = (w_{2n+1} - 1)/k

These extensions of Fibonacci properties convince us that these sequences are special and are worthy of further investigation. We can prove them by modifying the standard proofs of statements (F1) to (F5) for the Fibonacci sequence. See, for example, Hoggatt (1969). As an example, we prove property (F3) for the generalized sequences. That is, we prove

(F3a) w_n^2 - w_{n-1}·w_{n+1} = (-1)^(n+1),

for all integral n ≥ 1, where w_{n+1} = k·w_n + w_{n-1}, using mathematical induction.

We first note that when n = 1, the identity holds, since w_1^2 - w_0·w_2 = 1 - 0 = (-1)^2. We assume that statement (F3a) is true for a particular value of n. Adding the quantity k·w_n·w_{n+1} to each side of statement (F3a) and simplifying give

w_n^2 + k·w_n·w_{n+1} - w_{n-1}·w_{n+1} = (-1)^(n+1) + k·w_n·w_{n+1}
w_n·(w_n + k·w_{n+1}) - w_{n+1}·(w_{n-1} + k·w_n) = (-1)^(n+1)
w_n·w_{n+2} - w_{n+1}^2 = (-1)^(n+1)
w_{n+1}^2 - w_n·w_{n+2} = (-1)^(n+2)

Thus, statement (F3a) holds for n + 1 as well. The proof is complete.
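Before moving on, a quick numerical check (the helper name is mine) confirms that properties (F1) and (F3a) carry over to the generalized sequences for several values of k:

```python
from math import gcd

def w_terms(k, count):
    """First `count` terms of w_0 = 0, w_1 = 1, w_{n+1} = k*w_n + w_{n-1}."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(k * terms[-1] + terms[-2])
    return terms

for k in range(1, 6):
    w = w_terms(k, 12)
    for n in range(1, 11):
        # (F1) analogue: consecutive terms are relatively prime
        assert gcd(w[n], w[n + 1]) == 1
        # (F3a): w_n^2 - w_{n-1}*w_{n+1} = (-1)^(n+1)
        assert w[n] ** 2 - w[n - 1] * w[n + 1] == (-1) ** (n + 1)
print("(F1) and (F3a) hold for k = 1..5, n = 1..10")
```

Such a check is no substitute for the induction proof, but it lets students test a conjectured identity quickly before attempting to prove it.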


The golden ratio is the number

phi = (1 + √5)/2 = 1.6180339887....

It arises in a variety of geometric contexts as a length or a ratio of lengths. See, for example, section 4 of Hoggatt (1969). We focus on a surprising property of the number phi, namely,

(F6) 1/phi = phi - 1,

and two connections between phi and the Fibonacci sequence,

(F7) F_{n+1}/F_n → phi as n → ∞.


(F8) phi^n = F_n·phi + F_{n-1}.

Property (F6) states that phi and its reciprocal differ only by 1, an integer, even though each is an irrational number with nonrepeating, nonterminating decimal expansion. An impressive way to demonstrate this property to students is to ask them to enter phi as

(1 + √5)/2

in their calculators, then use the reciprocal key. The decimal part does not change. Properties (F7) and (F8) hint at a complex intertwining of the golden ratio and the Fibonacci sequence. In particular, property (F7) tells us that the ratios of successive Fibonacci numbers,

1/1, 2/1, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21,...,

approach the golden ratio phi. Property (F8) states that every positive integral power of phi is a multiple of phi plus a constant, and both the multipliers and the constants come from the Fibonacci sequence. For example,

phi^2 = 1·phi + 1, phi^3 = 2·phi + 1, phi^4 = 3·phi + 2, ....
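Property (F8) is easy to check numerically; a quick sketch:

```python
# Property (F8): every positive power of phi equals F_n * phi + F_(n-1).
phi = (1 + 5 ** 0.5) / 2

fib = [0, 1]                          # F_0, F_1
for _ in range(18):
    fib.append(fib[-1] + fib[-2])

for n in range(1, 15):
    assert abs(phi ** n - (fib[n] * phi + fib[n - 1])) < 1e-8
print("phi^n = F_n * phi + F_(n-1) verified for n = 1 to 14")
```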

Let us see how properties (F6) to (F8) relate to the generalized Fibonacci sequences. We begin by considering how to verify property (F7). Since the terms of the Fibonacci sequence are defined by F_{n+1} = F_n + F_{n-1}, the ratios of successive terms in the Fibonacci sequence are given by the equation

(1) F_{n+1}/F_n = (F_n + F_{n-1})/F_n = 1 + F_{n-1}/F_n,

for integral n ≥ 1. As n gets larger and larger, the ratio on the left-hand side of equation (1) approaches a limit, which we call r; a calculus class can prove that this limit exists. That is,

(2) F_{n+1}/F_n → r

as n → ∞. Then the ratio on the far right-hand side of equation (1) must approach 1/r, since it is a ratio of Fibonacci numbers in the reverse order of those on the left-hand side of equation (1). In other words,

(3) F_{n-1}/F_n → 1/r

as n → ∞. Putting equations (2) and (3) together with equation (1), we get

(4) r = 1 + 1/r

as n → ∞. It follows that r^2 = r + 1, or r^2 - r - 1 = 0. By using the quadratic formula to solve this last equation, we arrive at the positive value of

r = (1 + √5)/2 = phi,

the golden ratio. Equation (4) actually proves property (F6), since now we know that phi = r.

What happens if we repeat this process with any of the generalized Fibonacci sequences? We first consider some examples. Using the entries in table 1 when k = 2, we see that the successive ratios of this generalized Fibonacci sequence are


2/1, 5/2, 12/5, 29/12, 70/29, 169/70, 408/169,...,

with decimal approximations

2, 2.5, 2.4, 2.4166..., 2.41379..., 2.41428..., 2.41420..., ....

When k = 3, the ratios are

3/1, 10/3, 33/10, 109/33, 360/109, 1189/360, 3927/1189,...,

with decimal approximations

3, 3.3333..., 3.3, 3.30303..., 3.30275..., 3.30277..., 3.30277..., ....

Each sequence of ratios seems to be approaching a definite number. What numbers are they?

We can proceed in a manner analogous to the way that we derived equation (4). In the case in which k = 2, we have

(5) w_{n+1}/w_n = (2·w_n + w_{n-1})/w_n = 2 + w_{n-1}/w_n.

If the ratios on the left-hand side are converging to r, then the ratios on the far right-hand side of this equation are approaching 1/r. As n → ∞, equation (5) becomes

r = 2 + 1/r.

Solving for r, we find that the positive value for r is

r = (2 + √(2^2 + 4))/2 = 1 + √2 = 2.4142135624....

When k = 3, equation (5) becomes

r = 3 + 1/r,

with positive solution

r = (3 + √(3^2 + 4))/2 = (3 + √13)/2 = 3.3027756377....

In general, for the sequence w_n defined by w_{n+1} = k·w_n + w_{n-1}, we find that the ratios of successive terms in a generalized Fibonacci sequence approach r_k, which is the positive solution of

(F7a) r_k = k + 1/r_k.

These numbers, which we call nearly golden ratios, are given by the formula

r_k = (k + √(k^2 + 4))/2.

Just like the golden ratio, phi, each of the numbers r_k differs from its reciprocal by an integer, because (F7a) can be rewritten as

1/r_k = r_k - k.

Letting k run through all nonnegative integers, we have a complete list of positive real numbers whose reciprocals have the same decimal part as the numbers themselves; proving this fact, starting from equation (F7a), is an interesting exercise for students. Table 2 gives some examples.
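A numerical sketch of the nearly golden ratios makes the shared decimal parts visible at a glance:

```python
# Nearly golden ratios r_k = (k + sqrt(k^2 + 4))/2; since 1/r_k = r_k - k,
# each r_k shares its decimal part with its reciprocal.
from math import sqrt

for k in range(1, 6):
    r_k = (k + sqrt(k * k + 4)) / 2
    assert abs(1 / r_k - (r_k - k)) < 1e-12
    print(f"{k}  {r_k:.10f}  {1 / r_k:.10f}")
```

Running this reproduces the entries of table 2.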

What about property (F8)? Its more general form is

(F8a) r_k^n = w_n·r_k + w_{n-1}.

I leave it to the reader to prove this identity by induction.
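Before attempting the proof, a numerical check of (F8a) for a few values of k and n can build confidence (the helper name is mine):

```python
from math import sqrt

def w_terms(k, count):
    """First `count` terms of w_0 = 0, w_1 = 1, w_{n+1} = k*w_n + w_{n-1}."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(k * terms[-1] + terms[-2])
    return terms

for k in (1, 2, 3):
    r_k = (k + sqrt(k * k + 4)) / 2
    w = w_terms(k, 12)
    for n in range(1, 11):
        # (F8a): r_k^n = w_n * r_k + w_{n-1}
        assert abs(r_k ** n - (w[n] * r_k + w[n - 1])) < 1e-6
print("(F8a) holds numerically for k = 1, 2, 3 and n = 1 to 10")
```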


This brief introduction is not nearly the whole story. We do not know the whole story about the Fibonacci sequence itself, so how could we give a complete account of these Fibonacci-like sequences? Properties (F1) through (F8) are merely a sampler of the well-known Fibonacci properties, identities, and connections that these newer Fibonacci-like sequences also satisfy. I hope that you and your students are motivated to set off on the trail of more.

TABLE 1 Generalized Fibonacci Sequences

k     The First Few Terms of w_n
2     0, 1, 2, 5, 12, 29, 70, 169, 408, 985, 2378, 5741, ...
3     0, 1, 3, 10, 33, 109, 360, 1189, 3927, 12970, ...
4     0, 1, 4, 17, 72, 305, 1292, 5473, 23184, 98209, ...
5     0, 1, 5, 26, 135, 701, 3640, 18901, 98145, ...

TABLE 2 The First Five Nearly Golden Ratios and Their Reciprocals

k     r_k                  1/r_k
1     1.6180339887...      0.6180339887...
2     2.4142135623...      0.4142135623...
3     3.3027756377...      0.3027756377...
4     4.2360679775...      0.2360679775...
5     5.1925824035...      0.1925824035...


Source: Mathematics Teacher, May2000, Vol. 93 Issue 5, p364, 7p, 6 diagrams, 29c

 Author(s): Natsoulas, Anthula

THROUGHOUT HISTORY, different cultures have produced designs to be used as ornamentation, as part of ceremonies, and as religious symbols. Many of these designs are mathematical in nature, and their bases are often the transformations of reflection and rotation in the plane. The images form groupings that appear to have an underlying unity. Thus, history and art merge to create a medium through which students can study the concrete operations of reflection and rotation in the plane, as well as the more abstract concept of symmetry groups. The resulting patterns give students a sense of the potential for creativity inherent in mathematics. Exploring group symmetries within the context of such designs furnishes enriching experiences, connects art and history to mathematics, enhances the understanding of transformations in the plane, and shows the common underlying structure of algebra and geometry. Students should have the opportunity to see connections within mathematics and between mathematics and the various arenas of human activity and should develop an understanding of the types of reasoning that form the basis of mathematical thought.

All cultures participate in the six mathematical activities of counting, locating, measuring, designing, playing, and explaining (Bishop 1988); but designing results in some of the richest and most diverse outcomes. The beautiful designs created by different cultures mirror the uniqueness of their histories. Peoples of the Eastern and Western worlds have used mathematical ideas to create patterns in woven fabrics; ornamentation for religious objects and places of worship; and adornment of the walls, floors, and ceilings of the homes of nobles. A significant amount of mathematics, including the principles of symmetrical relationships, is implicit in such designs.

This article focuses on two types of symmetries--rotation and reflection, their underlying structure as a mathematical group, and their presence in the designs of diverse cultures. Patterns created by applying these symmetry operations offer students a visual image of closure, identity, inverse, and associativity, which form the axiomatic basis of algebra. Through patterns, this article intuitively develops the concept of symmetry groups and gives formal definitions of rotation and reflection symmetry and symmetry groups.

The design examples in this article focus primarily on those of Cyprus and Ethiopia, two nations whose mathematical art is not well known. The mosaics of Cyprus, typical of those found throughout the Roman world, date to between the fourth and eighth centuries C.E. and contain many intricate geometric patterns. It is believed that at one time designs for mosaics were collected in pattern books.

The form of Christianity introduced in Ethiopia in the first half of the fourth century and the art forms that developed from it became an integral part of the lives of its people. The Ethiopians developed elaborately designed crosses that they used both as jewelry and in religious processions. In the town of Lalibela, an important center of medieval Ethiopia, several rock-hewn churches built during the thirteenth century include geometric patterns.


A symmetry is defined to be a motion of an object such that the appearance of the object is unchanged. A reflection symmetry is determined by a line, called the line of reflection, through which the original object is reflected. For each point of the original object, its distance to the line is the same as the distance of its corresponding image point. A rotation symmetry is determined by a rotation of the object around a fixed point called the rotocenter. The amount of rotation can be expressed as a fraction of a full turn or by the degrees of rotation in a counterclockwise direction.

Figure 1 includes a range of designs from Ethiopia and Cyprus that display different kinds of symmetry. The teacher can ask students to group those items that appear to have the same kinds of symmetry. A set of objects that have the same kinds of symmetry belongs to the same symmetry group. Thus, in figure 1, items (a) and (d) both belong to the same symmetry group, since rotations of 180 degrees or reflections around a vertical or horizontal line through the center return the design to its original appearance. Similarly, the interior part of the cross in item (e) and the circular portion of item (f) belong to the same symmetry group, since both exhibit 90 degree rotation symmetry. In like manner, the two designs enclosed within the circles in item (c) belong together, since both exhibit 60 degree rotation symmetry. The reader should explore the various reflection and rotation symmetries of item (b).

The Ethiopian cross, excluding its base, in figure 2 has both reflection and rotation symmetry. It contains four lines of reflection. If the figure is rotated through a one-quarter turn, a one-half turn, or a three-quarter turn--or equivalently, 90 degrees, 180 degrees, and 270 degrees, respectively--the appearance of the object remains unchanged. These symmetries are shown in figure 3 with a second Ethiopian cross, again excluding the base shown at the top of the figure. Since the figures are hand carved, the curved lines may not all line up precisely, but the artist clearly had such symmetries in mind when creating the figure.

The mosaic design shown in figure 4 (p. 366) is from Kourion in Cyprus; it contains the same symmetries as the Ethiopian cross. Students can test the rotation symmetries with a piece of tracing paper on which a coordinate axis is drawn or with two overhead transparencies. They can place the origin at the rotocenter on top of a copy of the design, trace an outline of one of the arms, and rotate the paper or transparency to show the symmetry.


The square demonstrates the same rotation and reflection symmetries as the Ethiopian cross and the Cypriot mosaic design. See figure 5. A description of the symmetry motions can be simplified. For the square, instead of thinking of the one-quarter turn, one-half turn, and three-quarter turn as different motions, the one-quarter turn can be considered the unit motion. Thus, the one-half turn is the one-quarter turn applied twice, and the three-quarter turn is the one-quarter turn applied three times. Students can verify this result by manipulating tracing paper or transparencies as previously described. In general, the smallest rotational symmetry of an object is represented by r and successive rotations by r^2, r^3, r^4, and so on. For the square, r is the one-quarter turn, and r^2, r^3, and r^4 represent turns of two-quarters (one-half), three-quarters, and four-quarters, respectively.

Although more than one line of reflection often exists, specifying only one line is sufficient. Reflection with respect to this line can be represented by m. The remaining reflections can then be created by combining rotation and reflection motions. In general, a sequence of r's and m's indicates that these symmetry motions are applied sequentially to an object, with the order in which they are applied being read from right to left. In this article, the symbol "diamond" is used to indicate the sequential application of motions. For the square, we can define the line of reflection to be the vertical one, as shown in figure 6. The sequence r diamond m indicates a reflection through this line followed by a rotation of one-quarter turn counterclockwise, which is equivalent to a reflection through the original diagonal AC. This sequence is shown in figure 7.
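These compositions can also be checked algebraically. In the sketch below (the matrix encoding is mine, not the article's), each symmetry of the square is a 2-by-2 integer matrix, with r the quarter-turn counterclockwise and m the reflection through the vertical line:

```python
# Symmetries of the square as 2x2 integer matrices (a sketch).

def compose(a, b):
    """Matrix product a*b: apply motion b first, then a (right to left)."""
    return tuple(tuple(sum(a[i][t] * b[t][j] for t in range(2))
                       for j in range(2)) for i in range(2))

r = ((0, -1), (1, 0))    # rotation by a quarter turn counterclockwise
m = ((-1, 0), (0, 1))    # reflection through the vertical axis

# r diamond m: reflect through the vertical line, then rotate a quarter turn
rm = compose(r, m)
print(rm)  # ((0, -1), (-1, 0)): sends (x, y) to (-y, -x),
           # a reflection through a diagonal of the square
```

The product matrix fixes the diagonal line y = -x, confirming that the sequence r diamond m acts as a reflection through a diagonal.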


A symmetry group is a special case of a mathematical group, but great diversity exists among the members of any one symmetry group. In spite of the differences, the implicit mathematical characteristics that determine group membership allow even the untrained eye to recognize the unity. Figure 8 shows examples from Ethiopia and Cyprus that are members of one symmetry group; all the designs contain exactly four rotation symmetries and exactly four reflection symmetries. For item (c), consider the inner cross. For items (e), (f), and (g), consider only the outlines and not color or internal design variations. Members of a symmetry group that contains only the four rotation symmetries of the square are shown in figure 9 (p. 368). For the mosaic design, consider only the pattern outline and not color variations.


A complete discussion of symmetry groups includes two additional operations that can be applied to a figure. The identity symmetry motion, denoted by "1," leaves the original figure unchanged. An inverse symmetry motion returns the object to the original figure. In the square, for the basic rotation unit r of one-quarter turn counterclockwise, the inverse rotation is denoted by r^-1 and is a three-quarter turn counterclockwise. Thus, r^-1 diamond r = 1; that is, applying a counterclockwise three-quarter turn after applying a counterclockwise one-quarter turn leaves the original figure unchanged.


The equilateral triangle contains symmetries analogous to the reflection and rotation symmetries of the square. Figure 10 shows examples of designs that contain only threefold rotation symmetry. The rotocenter is the point of intersection of the angle bisectors of the triangle; the unit of rotation, r, is a one-third turn, or 120 degrees. A rotation of two-thirds of a turn, or 240 degrees, is represented as r^2. Figure 11 is a sketch of the triangle showing these rotation symmetries. The equilateral triangle also contains three lines of reflection, as shown in figure 12. Students can convince themselves that these lines are lines of reflection by drawing the lines on an equilateral triangle, cutting out the triangle, and folding along the lines.

Many designs that contain threefold rotation symmetry also contain the reflection symmetries of the equilateral triangle. To illustrate both rotation and reflection symmetries combined, a figure that has reflection symmetry through its center is placed in each third of the triangle, as in figure 13. The mosaic design shown in figure 14 is from Kourion in Cyprus; it illustrates the rotation symmetry of the equilateral triangle. The original is not well preserved, but the intended threefold symmetry of the pattern is evident. All figures that contain both the rotation and reflection symmetries of the equilateral triangle belong to a single symmetry group; see figure 15 for examples.

As with the square, indicating one line of reflection and one rotation is sufficient for the equilateral triangle. This line of reflection, m, can be the perpendicular bisector drawn from vertex A in the original triangle ABC, as shown in figure 12. The rotation unit of 120 degrees is represented by r. If the lines of reflection m, m_1, and m_2 remain fixed and do not change position as the triangle is rotated, reflection in line m_1 can be expressed as "m diamond r," that is, a one-third turn followed by a reflection in m. Similarly, reflection in line m_2 can be expressed as "m diamond r^2," that is, two one-third turns followed by a reflection in m. Thus, all symmetries of the equilateral triangle can be expressed as a set of six motions in terms of r and m: {1, r, r^2, m, mr, mr^2}. The symbol "diamond" can be omitted when the meaning of the sequence of motions is clear. Students should convince themselves that this set of six motions expresses all symmetries contained within the equilateral triangle. Again, a model with tracing paper can make the experience more concrete.


Students in advanced classes can explore consecutive applications of the symmetry motions in more depth and in abstract form. These applications can be related to a mathematical group, which is a collection of elements and an operation applied to the elements that satisfies the following characteristics: (1) the set of elements is closed with respect to the defined operation; (2) an identity element exists; (3) for each element in the set, an inverse element exists; and (4) the operation is associative. Taking as the set of elements the symmetry motions of the equilateral triangle and the operation diamond as the application of the motions read from right to left, table 1 (p. 370) shows the outcomes of applying diamond to the set {1, r, r^2, m, mr, mr^2} with itself. The convention for reading the order of operations is row by column.

The outcomes in table 1 can be simplified to the symmetries shown in table 2 (p. 370). Students can verify that the equilateral triangle with the set of six symmetries and the operation of diamond satisfy the properties of a mathematical group. The outcomes from combining the six symmetries can be written in terms of the original set of symmetries. A study of the table verifies the properties of closure, identity, and inverse. Associativity can be explored by considering a number of examples, such as (m diamond r^2) diamond r = m diamond (r^2 diamond r) = m. Students can conclude that the property of associativity appears to hold, even though it has not been proved.
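The same verification can be automated. In the sketch below (the permutation encoding is mine, not the article's), each motion of the equilateral triangle is a permutation of the vertices 0, 1, 2, where p[i] is the image of vertex i:

```python
from itertools import product

def diamond(a, b):
    """Apply motion b first, then motion a (read right to left)."""
    return tuple(a[b[i]] for i in range(3))

one = (0, 1, 2)                  # identity motion
r = (1, 2, 0)                    # one-third turn
m = (0, 2, 1)                    # reflection fixing vertex 0
r2 = diamond(r, r)               # two-thirds turn
motions = [one, r, r2, m, diamond(m, r), diamond(m, r2)]

# Closure: composing any two of the six motions gives one of the six
assert all(diamond(a, b) in motions for a, b in product(motions, motions))
# Inverses: each motion has a motion that undoes it
assert all(any(diamond(a, b) == one for b in motions) for a in motions)
# The associativity example from the text: (m diamond r^2) diamond r = m
assert diamond(diamond(m, r2), r) == diamond(m, diamond(r2, r)) == m
print("the six motions satisfy the group properties")
```

Looping over all triples with three nested `product` calls would likewise confirm associativity exhaustively, turning the "appears to hold" of the text into a finite check.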


Symmetry groups can also help students see mathematics as a human activity that overcomes the sterility that is sometimes associated with it. The mathematical developments shown in this article offer an opportunity to develop an interdisciplinary unit among the mathematics, social studies, and art teachers. For the mosaics of Cyprus, the historical link could be studying the Roman world during classical and early medieval times. A link with art could include studying how mosaic designs are created on paper and transferred to tiles or onto pavement. Students could create their own mosaic designs on graph paper and then render them onto unit squares of wood or cardboard with small colored tiles set into mastic, using grout to fill in any remaining spaces. For the crosses of Ethiopia, the historical link could be studying the adaptation of Christianity by an African culture.

Students can also create their own designs to illustrate the different types of group symmetry. By working collaboratively with a defined set of symmetries, each group of students can create a design to illustrate the given set of symmetries. Graph paper, straightedges, and compasses are all that are needed, although computer software can serve as a modern tool. The differences among the resulting designs illustrate the common underlying mathematical concepts and the potential for diversity in their interpretation.


A mathematical group is often difficult for students to understand. Symmetry groups furnish a visual image for this abstract concept and a cultural environment in which it can be embedded. The designs that are members of any one symmetry group are both the same and different. The similarities exist because of the universality of the underlying mathematical principles; the differences exist because of the differences in the cultures that produce them. The mosaic designs of Cyprus and the religious art of Ethiopia are radically different with respect to the media that were used to create them and the uses to which they were put. But the significant mathematics that is at the base of their creation is the same and should not be taken lightly. As Stevens (1996, 168) quotes Herman Weyl,

[o]ne can hardly overestimate the depth of geometric imagination and inventiveness reflected in these patterns. Their construction is far from being mathematically trivial. The art of ornament contains in implicit form the oldest piece of higher mathematics known to us.

The visual images that lead to an informal definition of the concept of a symmetry group can lay the foundation for more formal definitions and higher levels of abstraction. For all students, the examples shown can provide a concrete visual image and intuitive notion of the mathematical unity that underlies a mathematical group.

TABLE 1 Application of Consecutive Motions of Symmetries of the Equilateral Triangle

diamond   1       r        r^2      m        mr       mr^2
1         1       r        r^2      m        mr       mr^2
r         r       r^2      r^3      rm       rmr      rmr^2
r^2       r^2     r^3      r^4      r^2m     r^2mr    r^2mr^2
m         m       mr       mr^2     mm       mmr      mmr^2
mr        mr      mr^2     mr^3     mrm      mrmr     mrmr^2
mr^2      mr^2    mr^3     mr^4     mr^2m    mr^2mr   mr^2mr^2

TABLE 2 Application of Consecutive Motions of Symmetries of the Equilateral Triangle Simplified

diamond   1       r        r^2      m        mr       mr^2
1         1       r        r^2      m        mr       mr^2
r         r       r^2      1        mr^2     m        mr
r^2       r^2     1        r        mr       mr^2     m
m         m       mr       mr^2     1        r        r^2
mr        mr      mr^2     m        r^2      1        r
mr^2      mr^2    m        mr       r        r^2      1
Fig. 1 Symmetrical designs from Cyprus and Ethiopia


Source: Mathematics Teacher, Dec99, Vol. 92 Issue 9, p786, 7p, 1 chart, 12 diagrams Author(s): Ryden, Robert

High school mathematics teachers are always looking for applications that are real and yet accessible to high school students. Astronomy has been little used in that respect, even though high school students can understand many of the problems of classical astronomy. Examples of such problems include the following: How did classical astronomers calculate the diameters and masses of Earth, the Moon, the Sun, and the planets? How did they calculate the distances to the Sun and Moon? How did they calculate the distances to the planets and their orbital periods? Many students are surprised to learn that most of these questions were first answered, often quite accurately, using mathematics that they can understand.

The NCTM's Standards stress the importance of connections among various branches of mathematics and between mathematics and other disciplines; the astronomy problems that follow combine algebra, geometry, trigonometry, data analysis, and a bit of physics. My geometry and algebra students have seen most of these problems and could understand them. They have also been able to experience making distance measurements themselves by using the method of parallax, which is explained in this article.


By the third century B.C.E., many scientists were convinced that Earth was spherical. One clue was that during an eclipse of the Moon, the edge of Earth's shadow always appeared to be an arc of a circle. Because of the belief that Earth was spherical, much discussion occurred about how to measure its circumference.

Eratosthenes, who was director of the great library at Alexandria, Egypt, found the first successful method. He had learned that at noon on the day of the summer solstice, in Syene, in southern Egypt, the bottom of a well was illuminated by the Sun; therefore, the Sun was directly overhead there. In Alexandria, in northern Egypt, the Sun was not directly overhead on that day, so any vertical pole cast a shadow. By measuring a pole's shadow and using the ratio of the shadow's length to the pole's height, as shown in figure 1, Eratosthenes was able to calculate Theta, the Sun's angle away from the vertical. Figure 2 shows how he used that information: Reasoning that the Sun's rays striking Alexandria were essentially parallel to those striking Syene, he realized that his angle Theta was the same as the difference in latitude between the two cities. Knowing the distance, D, between them, he was able to calculate the full circumference of Earth. His measure for Theta was 7 Degrees 12', which is one-fiftieth of a complete circle.

Because caravans could cover the distance between the cities in fifty days, traveling at the rate of one hundred stadia a day, he assumed that the distance between the cities was five thousand stadia and that the circumference of Earth was therefore 50 x 5000, or 250 000, stadia. The actual length of a stadium in modern units is not known, but it is believed to have been about one-tenth of a mile, which makes Eratosthenes' value for the circumference agree remarkably well with the value accepted today.
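Eratosthenes' arithmetic can be rerun in a few lines; this sketch uses only the figures quoted above (the one-tenth-mile stadium is the approximate value mentioned in the text).

```python
import math

# Eratosthenes' estimate, using the figures quoted in the text.
theta = 7 + 12 / 60       # the Sun's angle from vertical at Alexandria: 7 degrees 12'
distance = 50 * 100       # stadia between the cities: fifty days at one hundred stadia a day

# theta degrees corresponds to the distance between the cities, so a full
# 360 degrees corresponds to (360/theta) times that distance.
circumference = 360 / theta * distance
print(circumference, "stadia")       # about 250 000 stadia

# At roughly one-tenth of a mile per stadium:
print(circumference / 10, "miles")   # about 25 000 miles
```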


Also in the third century B.C.E., Aristarchus of Samos measured the ratio of the Sun's distance from Earth to the Moon's distance from Earth by using a method illustrated in figure 3 (Abell 1964). He reasoned that at the first and third quarters of the Moon, the angles EM[sub 1]S and EM[sub 3]S must be right angles. All he needed was angle M[sub 1] EM[sub 3], and either a scale drawing or trigonometry would give him the distance ratio that he wanted. He assumed that the Moon's orbit is circular, that its orbital velocity is uniform, that the Sun is sufficiently near that angle M[sub 1]EM[sub 3] is measurably different from 180 degrees, and that he could observe the instants of first and third quarter sufficiently accurately. All his assumptions were incorrect, but his method makes sense in principle. He determined, inaccurately, that first quarter to third quarter took about one day longer than third quarter to first quarter. With this information and the length of the month, he determined that M[sub 1]ES was about 87 degrees and that the distance from Earth to the Sun was therefore about twenty times larger than the distance from Earth to the Moon.
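Aristarchus's ratio follows from one line of trigonometry. The 89.85-degree figure in the second calculation is a modern value added here for comparison, not one from the article.

```python
import math

# Aristarchus's geometry: at first quarter the angle at the Moon (EM1S) is a
# right angle, so with angle M1ES = 87 degrees, EM/ES = cos(87 degrees).
ratio = 1 / math.cos(math.radians(87))
print(ratio)   # about 19: the Sun "about twenty times" farther than the Moon

# The true angle is about 89.85 degrees, which shows how sensitive the method
# is to the measured angle; the modern distance ratio is near 400.
print(1 / math.cos(math.radians(89.85)))
```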


In the sixteenth century, Copernicus, who had proposed the heliocentric theory of the Solar System, calculated the orbital periods of the planets and their distances from the Sun. He was able to give distances only in terms of the distance from Earth to the Sun. This Earth-to-Sun distance is called the astronomical unit (AU). For example, he found that the distance from Mars to the Sun was 1.5 AU--he could not give this distance in miles or other terrestrial units because he did not know the size of the AU in those units. The following paragraphs give Copernicus's methods for periods and distances, but the problem of the size of the AU was not solved until long after his time.

The orbital period of a planet, the time required for it to complete an orbit relative to the "fixed" stars, is called the sidereal period. We could determine the sidereal period easily if we could observe from a fixed point far outside the Solar System. Since we must instead observe from a moving platform, Earth, we must infer the sidereal period from the synodic period, which is the interval of time between one alignment of Sun, Earth, and a planet and the next equivalent alignment. Figures 4 and 5 (Abell 1964) illustrate how Copernicus determined sidereal periods from synodic periods. The procedure for inferior planets, that is, those closer to the Sun than Earth, differs slightly from that for superior planets, that is, those that are farther away.

Figure 4 shows Earth with Venus, an example of an inferior planet. At position 1, Earth (E[sub 1]), Venus (V[sub 1]), and the Sun are collinear. This orientation is easy to observe from Earth. After one sidereal period, Venus has made one orbit and returned to position V[sub 2] = V[sub 1]; but in that time Earth has moved to E[sub 2], so we cannot directly observe that Venus has completed an orbit. Venus catches up with Earth at position 3. One synodic period has elapsed since position 1 because the two planets are again collinear. From E[sub 1] to E[sub 3], Earth has made N orbits, and N (Earth) years have therefore elapsed, which is the synodic period of Venus. In general, N will not be an integer. In the same amount of time, Venus has made N + 1 orbits. The sidereal period, S, of Venus is the time for one orbit; that is,

S = time/(number of orbits)

= (synodic period)/(number of orbits between alignments)

= (N Earth years)/(N + 1 orbits)

= N/(N + 1) Earth years per orbit.

Figure 5 shows Earth with Mars, an example of a superior planet. Both planets begin at position 1, where they are collinear with the Sun. Earth completes an orbit and returns to position E[sub 2] = E[sub 1], then catches Mars at position 3, where the planets and the Sun are again collinear; and one synodic period has elapsed. From position 1 to position 3, Earth has made N orbits; therefore, N (Earth) years have elapsed. This time, N will probably be greater than 1. In the same amount of time, Mars has made only N - 1 orbits. As with Venus, the sidereal period, S, is the time for one orbit; that is,

S = (synodic period)/(number of orbits between alignments)

= (N Earth years)/(N - 1 orbits)

= N/(N - 1) Earth years per orbit.

For example, Jupiter's synodic period is 1.094 Earth years; S = 1.094/(1.094 - 1) approximately equal to 11.6 years.
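Both formulas can be checked numerically. The sketch below assumes a synodic period for Venus of about 1.6 Earth years (a standard figure, not given in the article); the Jupiter value is the one from the text.

```python
def sidereal_period(synodic_years, inferior):
    """Sidereal period in Earth years, given the synodic period N in Earth years.

    While Earth makes N orbits, an inferior planet makes N + 1 orbits
    and a superior planet makes N - 1, so S = N/(N + 1) or N/(N - 1).
    """
    n = synodic_years
    return n / (n + 1) if inferior else n / (n - 1)

print(sidereal_period(1.599, inferior=True))    # Venus: about 0.615 years, as in Table 2
print(sidereal_period(1.094, inferior=False))   # Jupiter: about 11.6 years
```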

Copernicus found orbital radii of inferior planets by using the idea illustrated in figure 6 (Abell 1964). When the planet is at greatest elongation, which is the maximum angular separation in the sky of a planet and the Sun, then angle EPS must be a right angle because the line of sight, EP, is tangent to the planet's orbit. If angle PES is measured, PS can be found by scale drawing or by trigonometry. As previously mentioned, PS will be expressed in terms of ES, the astronomical unit.
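As a numerical check of this construction, the sketch below assumes a greatest elongation for Venus of roughly 46 degrees, an approximate figure not given in the article.

```python
import math

# Inferior-planet radius from greatest elongation: angle EPS is a right angle,
# so PS = ES * sin(PES). With ES = 1 AU, the result is directly in AU.
elongation = math.radians(46)   # roughly Venus's greatest elongation (assumed figure)
ps = math.sin(elongation)
print(ps)   # about 0.72 AU, matching Copernicus's value in Table 1
```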

The orbital radius of a superior planet is a little more complicated to determine. Figure 7 (Abell 1964) illustrates Copernicus's reasoning. Position 1 is called opposition because when the planet is viewed from Earth, the planet is exactly opposite the Sun in the sky. Position 2, where the planet and the Sun are 90 degrees apart in the sky, that is, angle P[sub 2]E[sub 2]S = 90 Degrees, is called quadrature. Copernicus timed the interval between opposition and quadrature; because he knew the sidereal periods of Earth and the planet, he could determine the angles P[sub 1]SP[sub 2] and E[sub 1]SE[sub 2] as fractions of complete orbits. Angle P[sub 2]SE[sub 2] followed by subtraction; and then PS could be determined, again in terms of ES, the astronomical unit. For example, the time from opposition to quadrature for Mars is 104 days. Therefore,

E[sub 1]SE[sub 2] = (104 days/365 days) x 360 Degrees

approximately equal to 103 Degrees.

Since the sidereal period of Mars is 687 days,

P[sub 1]SP[sub 2] = (104 days/687 days) x 360 Degrees

approximately equal to 55 Degrees.

By subtraction, angle P[sub 2]SE[sub 2] approximately equal to 48 Degrees; and by trigonometry, PS approximately equal to 1.5 ES approximately equal to 1.5 AU.
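The whole Mars computation can be reproduced with the numbers in the text:

```python
import math

# Copernicus's calculation for Mars, opposition to quadrature in 104 days.
e_sweep = 104 / 365 * 360   # angle E1SE2, Earth's sweep: about 103 degrees
p_sweep = 104 / 687 * 360   # angle P1SP2, Mars's sweep: about 55 degrees
sep = e_sweep - p_sweep     # angle P2SE2 by subtraction: about 48 degrees

# At quadrature, angle P2E2S = 90 degrees, so triangle SE2P2 is right-angled
# at E2 and cos(P2SE2) = ES/PS; hence PS = ES / cos(sep), with ES = 1 AU.
ps = 1 / math.cos(math.radians(sep))
print(ps)   # about 1.5 AU
```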

Table 1 shows the values that Copernicus obtained for the planets known at that time and compares them with modern values.

Copernicus still assumed, as other astronomers had before him, that planetary orbits were circles or combinations of circles. Johannes Kepler, a student of Tycho Brahe, discovered otherwise. At the end of the sixteenth century, Brahe made detailed star and planet observations covering a period of about twenty years. After Brahe's death Kepler spent years analyzing Brahe's data, concentrating on the data for Mars, and in 1609 he published his findings--that the planets move around the Sun in ellipses. That discovery, in spite of the fact that the eccentricity of Mars's orbit is only about one-tenth, is a tribute to his powers of analysis, as well as to the accuracy and thoroughness of Brahe's observations.

To determine that orbits were ellipses, Kepler had to calculate the distance from Mars to the Sun at many different places in its orbit. Figure 8 shows his method (Abell 1964). From any position E[sub 1] of Earth, the angle SE[sub 1]M is measured. The sidereal period of Mars is 687 days, after which Mars has returned to M and Earth, having made almost two complete revolutions, is at E[sub 2]. From E[sub 2], angle SE[sub 2]M is measured. At 687 days Earth is (2)(365.25) - 687 = 43.5 days short of two full revolutions, from which information angle E[sub 1]SE[sub 2] can be calculated. SE[sub 1] and SE[sub 2] are known (1 AU--but a problem arises with this assumption, as described in the following paragraph). From this information can be found E[sub 1]E[sub 2], which allows the solution of triangle E[sub 1]E[sub 2]M, which leads to triangle SE[sub 1]M or SE[sub 2]M and the distance SM. Kepler found SM at many points along the orbit of Mars by choosing from Brahe's records the elongations of Mars--angles SE[sub 1]M or SE[sub 2]M--on each of many pairs of dates separated from each other by intervals of 687 days.

A question that I have been unable to answer was how Kepler dealt with the fact that SE is not really constant because Earth's orbit is also an ellipse. I assume that he must have found a way around the problem, but without more information I can only speculate on how he did it.

Kepler published three findings, which have become known as Kepler's laws of planetary motion. They are as follows:

1. The planets move around the Sun in ellipses, with the Sun at one focus.

2. A line connecting a planet with the Sun will sweep out equal areas in equal times. This phenomenon occurs because a planet moves faster when it is closer to the Sun. In figure 9, the time interval from E[sub 3] to E[sub 4] equals the time interval from E[sub 1] to E[sub 2], and area SE[sub 1]E[sub 2] equals area SE[sub 3]E[sub 4].

3. The squares of the planets' periods of revolution are proportional to the cubes of their distances from the Sun. So P[sup 2] = Ka[sup 3], where a is the length of the semimajor axis of the elliptical orbit. When P is measured in years and a in astronomical units, K = 1.

Table 2 illustrates Kepler's third law for the planets known in his time. Incidentally, these data can be used for a wonderful problem in data analysis. During a unit on nonlinear data analysis, I gave my advanced-algebra students the data in the first three columns, and they were able to determine that P = f(a) is a power function with exponent 3/2.
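The data-analysis exercise can be sketched as follows; instead of a full regression, this shortcut takes the exponent log P / log a planet by planet (Earth is omitted because log 1 = 0). The numbers are those of Table 2.

```python
import math

# If P = a^k, then k = log P / log a for each planet.
a = [0.387, 0.723, 1.524, 5.203, 9.539]   # semimajor axis, AU
p = [0.241, 0.615, 1.881, 11.86, 29.46]   # sidereal period, years

exponents = [math.log(pp) / math.log(aa) for aa, pp in zip(a, p)]
print(exponents)   # every value is close to 1.5, the exponent 3/2
```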


As Duncan (1981) says, "Kepler's laws summed up neatly how the planets of the solar system behaved without indicating why they did so." Newton, who comes into the story at this point, built on Kepler's work to develop his law of universal gravitation, which has allowed us to weigh the Sun, Moon, and planets. Start with the formula

F = mv[sup 2]/r,

which is for the centripetal, or inward, force F needed to cause a mass m to move with velocity v around a circle of radius r. A planet of mass m[sub p] revolving around the Sun has velocity

v = 2 Pi r/P,

where P is its period of revolution. Substituting for v in the force formula gives

F = m[sub p] v[sup 2]/r

= m[sub p] 4 Pi[sup 2] r[sup 2]/(r P[sup 2])

= m[sub p] 4 Pi[sup 2] r/P[sup 2].

Kepler's third law says that P[sup 2] = kr[sup 3], where k is a proportionality constant. Substituting further then gives

F = m[sub p] 4 Pi[sup 2] r/(k r[sup 3])

= (4 Pi[sup 2]/k) x m[sub p]/r[sup 2],

that is,

F proportional to m[sub p]/r[sup 2].

The sun exerts that force, F, on the planet. At a given distance r, it is proportional to the mass of the planet.

By Newton's third law of motion, the planet exerts the same force on the Sun. Since the Sun's force on the planet depends on the mass of the planet, it seems reasonable to suppose that the planet's force on the Sun depends on the mass of the Sun, which means that the mutual force depends on both masses. The mutual force cannot depend on the sum of the masses, since doubling a mass doubles the force but doubling one term of a sum does not double the sum; that is, a + 2b is not twice a + b. Newton assumed that the force depended on the product of the masses, an assumption that agrees with the result that doubling either factor in a product doubles the product, that is, (2a) x b = a x (2b) = 2(ab). So for the Sun and a planet, the mutual force of attraction, F, is

F proportional to m[sub s] m[sub p]/r[sup 2],

that is,

F = G m[sub s] m[sub p]/r[sup 2],

where G is a constant that must be determined by experiment and r is the distance between the centers of the two objects. Newton spent many years investigating this phenomenon; proving that r can indeed be taken as the distance between the centers of the objects required his newly invented calculus.

The preceding result is Newton's law of universal gravitation. By universal, Newton meant that it applies equally to all objects, both terrestrial and celestial. To test his law, Newton compared the falling of an object at the surface of Earth (the famous apple?) to the falling of the Moon. Figure 10 (Feynman 1995) shows what is meant by a "falling" Moon. In one second, the Moon travels from A to B in its orbit. If Earth did not attract the Moon, it would travel along the tangent instead. Thus the distance s is the distance it has "fallen." In right triangle ABC,

s/x = x/(2r - s) approximately equal to x/2r,

since s << x << r; or

s approximately equal to x[sup 2]/2r.

The quantity x is the distance that the Moon travels in one second. Since the Moon's average distance from the center of Earth is about 385 000 km,

x = (1 second/1 month) x 2 Pi r approximately equal to 4.24 x 10[sup -7] x 2 Pi x 3.85 x 10[sup 8] m approximately equal to 1026 m.

Substituting this result into the previous formula gives

s approximately equal to x[sup 2]/2r

approximately equal to (1026 m)[sup 2]/(2 x 3.85 x 10[sup 8] m)

approximately equal to 0.0014 m.

So 0.0014 m is the distance that the Moon falls in one second. At the surface of Earth, which is 6400 km from its center, an object falls about 5 m in one second. If Earth's gravitational pull varies inversely as the square of the distance from its center, Newton reasoned, then the distance that the Moon falls in one second should be (6400/385000)[sup 2], or approximately 0.00028 times the distance that an object on Earth falls in one second. Our figures agree with Newton's reasoning, since (0.00028)(5) approximately equal to 0.0014.
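Newton's check can be reproduced numerically. The sketch below assumes a sidereal month of about 27.3 days, which is where the 4.24 x 10[sup -7] factor in the text comes from.

```python
import math

# Newton's "falling Moon" test, with the figures from the text.
r = 3.85e8                   # Moon's distance from Earth's center, in metres
month = 27.3 * 24 * 3600     # seconds in a sidereal month (assumed 27.3 days)

x = 2 * math.pi * r / month  # distance the Moon travels in one second
s = x ** 2 / (2 * r)         # distance it "falls" in one second
print(x, s)                  # about 1026 m and about 0.0014 m

# Inverse-square check against an object at Earth's surface, which falls
# about 5 m in one second at 6400 km from Earth's center:
print((6400 / 385000) ** 2 * 5)   # also about 0.0014 m
```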


The traditional way to determine the distance to inaccessible objects is by triangulation. Triangulation to very distant objects is usually done using the concept of parallax. Parallax describes the phenomenon that occurs when you hold your finger in front of your face and alternately close your left eye and your right eye. Your finger appears to shift its position with respect to the background. Figure 11 illustrates how this idea can be used to measure the distance to object O. First stand at a position A so that object O is aligned with some other object, much farther away than O. Then move to the side to a new position B. Object O and the more distant object are no longer aligned but rather subtend some angle q at your eye. This angle can be measured. If the more distant object is sufficiently far away, the lines to it from A and B are nearly parallel and angle p approximately equal to angle q. Angle p is called the parallax angle. As long as angle p is small, the baseline AB can be taken as an arc of a circle with center at O and radius x. Since AB and the parallax angle can be measured, distance x can be calculated using the arc-length formula from geometry, giving

AB = (p/360) x 2 Pi x arrow right x = (180 x AB)/(Pi p).

Since distances to astronomical objects are so enormous, angle p is always very small, sometimes only a fraction of a second of arc; and so approximating a segment with an arc does not make any measurable difference. The smallness of angle p also explains why measuring the base angles at A and B, as would be done in solving a triangle that was less "long and skinny," is impractical.
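The arc-length formula is easy to apply. In this sketch the baseline and angle are invented classroom-scale numbers, not measurements from the article.

```python
import math

# Distance from a parallax measurement: AB = (p/360) * 2*pi*x,
# so x = 180*AB / (pi*p), with p in degrees.
def parallax_distance(baseline_m, p_degrees):
    return 180 * baseline_m / (math.pi * p_degrees)

# A hypothetical measurement: a 10 m baseline and a 2 degree parallax angle.
print(parallax_distance(10, 2))   # about 286 m
```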

Students cannot collect their own data for most of the problems in this article, but they can get hands-on experience using parallax to measure distances, as my geometry classes have done. In addition to a tape measure, the only equipment needed is a device that measures small angles with some accuracy. Working in groups of four, my students made their own parallax-measuring devices using a piece of Styrofoam about 60 cm by 15 cm. See figure 12. About 50 cm from one end, they placed a row of pins spaced so that adjacent pins would subtend angles of 0.5 degree when viewed from that end. Of course, they had to use the previously mentioned arc-length formula to calculate how far apart to place the pins. After making their measuring devices, they practiced measuring small angles and distances in the classroom, where I could be certain that they knew what they needed to do and what quantities they needed to measure, that is, angle q and baseline AB. I then sent the groups outside after giving each group a description of some specific object on campus, such as a water tower or telephone pole; a specific place to stand to measure its distance; and a specific object to use as the distant background object. I also gave each group a photograph with these objects marked, to help them orient themselves.

To find the length of the astronomical unit, astronomers could in theory measure the parallax of the Sun from two points on Earth's surface. Obtaining this measurement is nearly impossible in practice, though, because of the Sun's angular size, brightness, and distance. However, astronomers can triangulate the distance to some planet, say, Mars, to obtain its distance from Earth in miles or kilometers. Its distance is already known in AU, and from this result, the size of an AU can be calculated. This result then gives the scale of the entire Solar System.

The story told here of necessity is incomplete, but I hope that it is tantalizing enough with all its connections to history and science to encourage interested teachers and students to explore the topic further.

The author thanks Craig Merow for his assistance in preparing this manuscript for publication.

TABLE 1 Orbital Radii of Planets, in AU

Legend for Chart:

A - Planet

B - Copernicus's Value

C - Modern Value

A                B         C

Mercury        0.36     0.387

Venus          0.72     0.723

Earth          1.00     1.00

Mars           1.5      1.52

Jupiter        5        5.20

Saturn         9        9.54

TABLE 2 Illustration of Kepler's Third Law

Legend for Chart:

A - Planet

B - Semimajor Axis a (AU)

C - Sidereal Period P (yrs.)

D - A[sup 3]

E - p[sup 2]

A              B        C         D       E

Mercury     0.387    0.241     0.058   0.058

Venus       0.723    0.615     0.378   0.378

Earth           1        1         1       1

Mars        1.524    1.881      3.54    3.54

Jupiter     5.203    11.86       141     141

Saturn      9.539    29.46       868     868


Source: Physics Today, Aug99, Vol. 52 Issue 8, p26, 6p, 2 diagrams, 1bw 

Author(s): Griffiths, Robert B.; Omnes, Roland

The traditional Copenhagen orthodoxy saddles quantum theory with embarrassments like Schrodinger's cat and the claim that properties don't exist until you measure them. The consistent-histories approach seeks a sensible remedy.

Students of quantum theory always find it a very difficult subject. To begin with, it involves unfamiliar mathematics: partial differential equations, functional analysis, and probability theory. But the main difficulty, both for students and their teachers, is relating the mathematical structure of the theory to physical reality. What is it in the laboratory that corresponds to a wavefunction, or to an angular momentum operator? Or, to use the picturesque term introduced by John Bell,(n1) what are the "beables" (pronounced BE-uh-bulls) of quantum theory--that is to say, the physical referents of the mathematical terms?

In most textbooks, the mathematical structures of quantum theory are connected to physical reality through the concept of measurement. Quantum theory allows us to predict the results of measurements--for example, the probability that this counter rather than that one will detect a scattered particle. That the concept of measurement played an important role in the early development of quantum theory is evident from Niels Bohr's account of his discussions with Albert Einstein at the 1927 and 1930 Solvay conferences.(n2) And it soon became part of the official "Copenhagen" interpretation of the theory.

But what may well have been necessary for the understanding of quantum theory at the outset has not turned out to provide a satisfactory permanent foundation for the subject. Later generations of physicists who have tried to make a measurement concept a fundamental axiom for the theory have discovered that this raises more problems than it solves. The basic difficulty is that any real apparatus in the laboratory is composed of particles that are presumably subject to the same quantum laws as the phenomenon being measured. So, what is special about the measuring process? Is not the entire universe quantum mechanical?

When quantum theory is applied to astrophysics and cosmology, the whole idea of using measurements to interpret its predictions seems ludicrous. Thus, many physicists nowadays regard what has come to be called "the measurement problem" as one of the most intractable difficulties standing in the way of understanding quantum mechanics.

Two measurement problems

There are actually two measurement problems that conventional textbook quantum theory cannot deal with. The first is the appearance, as a result of the measurement process, of macroscopic quantum superposition states such as Erwin Schrodinger's hapless cat. The second problem is to show that the results of a measurement are suitably correlated with the properties the measured system had before the measurement took place--in other words, that the measurement has actually measured something.

The macroscopic-superposition problem is so difficult that it has provoked serious proposals to modify quantum theory, despite the fact that all experiments carried out to date have confirmed the theory's validity. Such proposals have either added new, "hidden" variables to supplement the usual Hilbert space of quantum wavefunctions, or they have modified the Schrodinger equation so as to make macroscopic superposition states disappear. (For a discussion of two such proposals, see the two-part article by Sheldon Goldstein in PHYSICS TODAY, March 1998, page 42, and April 1998, page 38.) But even such radical changes do not resolve the second measurement problem.

Both problems can, however, be resolved without adding hidden variables to the Hilbert space and without modifying the Schrodinger equation. In a series of papers starting in 1984, an approach to quantum interpretation known as consistent histories, or decoherent histories, has been introduced by us and by Murray Gell-Mann and James Hartle.(n3) The central idea is that the rules that govern how quantum beables relate to each other, and how they can be combined to form sensible descriptions of the world, are rather different from what one finds in classical physics.

In the consistent-histories approach, the concept of measurement is not the basis for interpreting quantum theory. Instead, measurements can be analyzed, together with other quantum phenomena, in terms of physical processes. And there is no need to invoke mysterious long-range influences and similar ghostly effects that are sometimes claimed to be present in the quantum world.(n4)

Quantum histories

The two measurement problems, and the consistent-histories approach to solving them, can be understood by referring to the simple gedanken experiment shown in figure 1. A photon (or neutron, or some other particle; it makes no difference) enters a beam splitter in the a channel and emerges in the c and d channels in the coherent superposition:

(1) |a> arrow right |s> = (|c> + |d>)/square root of 2.

Here |a>, |c>, and |d> are wavepackets in the input and output channels, and |s> is what results from |a> by unitary time evolution (that is, by solving the appropriate Schrodinger equation) as the photon passes through the beam splitter.

The photon will later be detected by one of two detectors, C and D. To describe this process in quantum terms, we assume that |C> is the initial quantum state of C, and that the process of its detecting a photon in a wavepacket |c> is described by

(2) |c>|C> arrow right |C[sup *]>,

where |C[sup *]> is the triggered state of the detector after it has detected the photon. Once again, the arrow indicates the unitary time evolution produced by solving Schrodinger's equation. It is helpful to think of |C> and |C[sup *]> as physically quite distinct: Imagine that a macroscopically large pointer, initially horizontal in |C>, is moved to a vertical position in the state |C[sup *]> when the photon has been detected.

By putting together the processes (1), (2), and the counterpart of (2) that describes the detection of a photon in the d channel by detector D, one finds that the unitary time development of the entire system shown in figure 1 is of the form

(3) |a>|C>|D> arrow right |S> = (|C[sup *]>|D> + |C>|D[sup *]>)/square root of 2.

Ascribing some physical significance to the peculiar macroscopic-quantum-superposition state |S> in (3) poses the first measurement problem in our gedanken experiment. The difficulty is that |S> consists of a linear superposition of two wavefunctions representing situations that are visibly, macroscopically, quite distinct: The pointer on C is vertical and that on D is horizontal for |C[sup *]>|D>, whereas for |C>|D[sup *]> the D pointer is vertical and the C pointer is horizontal. In Schrodinger's famously paradoxical example, the two distinct situations were a live and a dead cat. A great deal of effort has gone into trying to interpret |S> as meaning that either one detector or the other has been triggered, but the results have not been very satisfactory.(n5)

The first measurement problem is an almost inevitable consequence of supposing that, in quantum theory, a solution of Schrodinger's equation represents a deterministic time evolution of a physical system, in the same way as does a solution of Hamilton's equations in classical mechanics. That was undoubtedly Schrodinger's point of view when he introduced his equation. The probabilistic interpretation now universally accepted among quantum physicists was introduced shortly thereafter by Max Born. Since then, chance and determinism have maintained a somewhat uncomfortable coexistence within quantum theory, with many scientists continuing to share Einstein's view that resorting to probabilities is a sign that something is incomplete.

A stochastic theory

By contrast, the consistent-histories viewpoint is that quantum mechanics is fundamentally a stochastic or probabilistic theory, as far as time development is concerned, and that it is not necessary to introduce some deterministic underpinning of this randomness by means of hidden variables. The basic task of quantum theory is to use the time-dependent Schrodinger equation, not to generate deterministic orbits, but instead to assign probabilities to quantum histories--sequences of quantum events at a succession of times--in much the same way that classical stochastic theories assign probabilities to sequences of coin tosses or to Brownian motion. This perspective does not exclude deterministic histories, but those are thought of as arising in special cases in which the probability of a particular sequence of events is equal to 1.

For the gedanken experiment in figure 1, the consistent-histories solution to the first measurement problem consists of noting that a perfectly good description of what is happening is provided by assuming that the initial state is followed at a later time by one of two mutually exclusive possibilities: |C[sup *]>|D> or |C>|D[sup *]>. They are related to each other in much the same way as heads and tails in a coin toss. That is to say, the system is described by one (and, in a particular experimental run, only one) of the two quantum histories:

(4) |a>|C>|D> arrow right |C[sup *]>|D> or |a>|C>|D> arrow right |C>|D[sup *]>,

where the arrow no longer denotes unitary time development. Quantum theory assigns to each history a probability of 1/2. (Of course, to check this prediction, one would have to repeat the experiment using several photons in succession, each time resetting the detectors.)

The troublesome macroscopic quantum superposition state |S> of (3) appears nowhere in (4). Indeed, as we discuss below, the rules of consistent-histories quantum theory mean that |S> cannot occur in the same quantum description as the final detector states employed in (4). Therefore, the first measurement problem has been solved (or, at least it has disappeared) if one uses the stochastic histories in (4) in place of the deterministic history in (3).

The fundamental beables of consistent histories quantum theory--that is, the items to which the theory can ascribe physical reality, or at least a reliable logical meaning--are consistent quantum histories: sequences of successive quantum events that satisfy a consistency condition about which more is said below. A quantum event can be any wavefunction--that is to say, any nonzero element of the quantum Hilbert space. The two histories in (4), as well as the single history in (3), are examples of consistent quantum histories. They are thus acceptable quantum descriptions of what goes on in the system shown in figure 1.

At this point, the reader may be skeptical of the claim that the first measurement problem has been solved. We have simply replaced (3), with its troublesome macroscopic quantum superposition state, by the more benign pair of histories in (4). But as long as (3) is an acceptable history--as is certainly the case from the consistent-histories perspective--how can we claim that (4) is the correct quantum description rather than (3)? Or is it possible that both (3) and (4) apply simultaneously to the same system? Before attempting an answer, let us take a slight detour to introduce the concept of quantum incompatibility, which plays a central role in the consistent-histories approach to quantum theory.

Quantum incompatibility

The simplest quantum system is the spin degree of freedom of a spin-1/2 particle, described by a two-dimensional Hilbert space. Every nonzero (spinor) wavefunction in this space corresponds to a component of spin angular momentum in a particular direction taking the value 1/2 in units of h. Thus the quantum beables of this system, in the consistent-histories approach as well as in standard quantum mechanics, are of the form S[sub w] = 1/2, where w is a unit vector pointing in some direction in three-dimensional space, and S[sub w] is the component of spin angular momentum in that direction. (Actually, S[sub w] = 1/2 corresponds to a whole collection of wavefunctions obtained from each other through multiplication by a complex number, and thus to a one-dimensional subspace of the Hilbert space.)

The nonclassical nature of quantum theory begins to appear when one asks about the relationship of these beables, or quantum states, for two different directions w. If the directions are opposite, for example +z and -z, the states S[sub z] = 1/2 and S[sub -z] = 1/2 are two mutually exclusive possibilities, one of which is the negation of the other. Thus they are related in the same way as the results of tossing a coin: if heads (S[sub z] = 1/2) is false, tails (S[sub z] = -1/2) is true, and vice versa. This means, in particular, that the proposition "S[sub z] = 1/2 and S[sub z] = -1/2" can never be true. It is always false.

That this is a reasonable way of understanding the relationship between S[sub z] = 1/2 and S[sub z] = -1/2 is confirmed by the fact that if a spin-1/2 particle is sent through a Stern-Gerlach apparatus with its magnetic field gradient in the z direction, the result will be either S[sub z] = 1/2 or -1/2, as shown by the position at which the particle emerges. Precisely the same applies to any other component of spin angular momentum. Thus, for example, S[sub x] = 1/2 is the negation of S[sub x] = -1/2. (As an amusing aside, we note that when Otto Stern proposed in 1921 to demonstrate the quantization of angular-momentum orientation, Born assured him that he would see nothing, because such spatial quantization was only a mathematical fiction.(n6))
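The relationships just described are easy to check with a few lines of linear algebra. In this sketch of ours (units with h-bar = 1, so that the S[sub w] = +1/2 state is the eigenvector of w . sigma with eigenvalue +1), opposite directions give orthogonal spinors, exactly as mutually exclusive alternatives should:

```python
import numpy as np

# Pauli matrices (hbar = 1, so S_w = w . sigma / 2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def spin_up_state(w):
    """Spinor with S_w = +1/2: the eigenvector of w . sigma with eigenvalue +1."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    op = w[0]*sx + w[1]*sy + w[2]*sz
    vals, vecs = np.linalg.eigh(op)      # eigenvalues in ascending order
    return vecs[:, np.argmax(vals)]

z_up   = spin_up_state([0, 0, 1])    # S_z = +1/2
z_down = spin_up_state([0, 0, -1])   # S_z = -1/2, i.e., S_{-z} = +1/2
x_up   = spin_up_state([1, 0, 0])    # S_x = +1/2

# Opposite directions give orthogonal states: mutually exclusive alternatives.
print(round(abs(np.vdot(z_up, z_down)), 12))      # -> 0.0

# A non-opposite direction is neither identical to nor orthogonal to z_up.
print(round(abs(np.vdot(z_up, x_up))**2, 12))     # -> 0.5
```

The last line anticipates the question below: the S[sub x] = 1/2 state is neither the same as nor orthogonal to either S[sub z] state, so it cannot be treated as simply true or false given one of them.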

But what is the relationship of beables that correspond to components of spin angular momentum for directions in space that are not opposite to each other? How, for example, is S[sub x] = 1/2 related to S[sub z] = 1/2? In consistent-histories quantum theory, "S[sub x] = 1/2 and S[sub z] = 1/2" is considered a meaningless expression, because it cannot be associated with any genuine quantum beable, that is, with any element of the quantum Hilbert space. Note that every nonzero element in that space corresponds to S[sub w] = 1/2 for some direction w, so there is nothing left over that could describe a situation in which two components of the spin angular momentum both have the value 1/2.

Putting it another way, there seems to be no sensible way to identify the assertion "S[sub x] = 1/2 and S[sub z] = 1/2," with S[sub w] = 1/2 for some particular direction w. (For a more detailed discussion, see section 4A of reference 7.) That agrees, by the way, with what all students learn in introductory quantum mechanics: There is no possible way to measure S[sub x] and S[sub z] simultaneously for a spin-1/2 particle. From the consistent-histories perspective, this impossibility is no surprise: What is meaningless does not exist, and what does not exist cannot be measured.

Meaningless or simply false?

It is very important to distinguish a meaningless statement from a statement that is always false. "S[sub z] = 1/2 and S[sub z] = -1/2" is always false, because S[sub z] = 1/2 and S[sub z] = -1/2 are mutually exclusive alternatives. The negation of a statement that is always false is a statement that is always true. By contrast, the negation of a meaningless statement is equally meaningless. The negation of the meaningless assertion "S[sub x] = 1/2 and S[sub z] = 1/2," following the ordinary rules of logic, is "S[sub x] = -1/2 or S[sub z] = -1/2." In consistent-histories quantum theory, this latter assertion is just as meaningless as the former. How, after all, would one go about testing it by means of an experiment?

This spin-1/2 example is the simplest illustration of quantum incompatibility: Two quantum beables A and B, each of which can be imagined to be part of some correct description of a quantum system, have the property that they cannot both be present simultaneously in a meaningful quantum description. That is, phrases like "A and B" or "A or B," or any other attempt to combine or compare A and B, cannot refer to a real physical state of affairs. Many instances of quantum incompatibility come about because of the mathematical structure of Hilbert space and the way in which quantum physicists understand the negation of propositions. Others are consequences of violations of consistency conditions for histories. In either case, the concept of quantum incompatibility plays a central role in consistent histories. Failure to appreciate this has, unfortunately, led to some misunderstanding of consistent-histories ideas.

Now let us return to the discussion of the histories in (3) and (4). The two histories in (4) are mutually exclusive; if one occurs, the other cannot. Think of them as analogous to S[sub z] = 1/2 and S[sub z] = -1/2 for a spin-1/2 particle. On the other hand, each of the histories in (4) is incompatible, in the quantum sense, with the history in (3), which one can think of as analogous to S[sub x] = 1/2. Indeed, the relationship between the state |S> in (3) and the states |C>|D[sup *]> and |C[sup *]>|D> in (4) is formally the same as that between the state S[sub x] = 1/2 and the states S[sub z] = 1/2 and S[sub z] = -1/2. Consequently, the question of whether (3) occurs rather than, or at the same time as, the histories in (4) makes no sense.

It may be helpful to push the spin analogy one step further. Imagine a classical spinning object subjected to random torques of a sort that leave L[sub x], the x component of angular momentum, unchanged while randomly altering the other two components, L[sub y] and L[sub z]. In such a case, a classical history that describes only L[sub x] will be deterministic; it will have a probability of 1. L[sub z], on the other hand, can be described by a collection of several mutually exclusive histories, each having a nonzero probability.

Of course, classical histories of this kind can always be combined into a single history, whereas the deterministic quantum history in (3), corresponding to the L[sub x] history in this analogy, cannot be combined with the stochastic histories in (4), the analogs of the L[sub z] histories. Nevertheless, the analogy has some value in that it suggests that (3) and (4) might be regarded intuitively as describing alternative aspects of the same physical situation. Although all classical analogies for quantum systems break down eventually, this one is less misleading than trying to think of (3) and the set of histories in (4) as mutually exclusive possibilities. It helps prevent us from undertaking a vain search for some "law of nature" that would tell us that (4) rather than (3) is the correct quantum description.

The second measurement problem

Particle physicists are always designing and building their experiments under the assumption that a measurement carried out in the real world can accurately reflect the state of affairs that existed just before the measurement. From a string of sparks or bubbles, for example, they infer the prior passage of an ionizing particle through the chamber. Extrapolating the tracks of several ionizing particles backward, they locate the point where the collision that produced the particles took place. But according to many textbook accounts of the quantum measuring process, retrodictions that use experimental results to infer what the particle was doing before this kind of measurement was made are not possible. Should we conclude, then, that experimenters don't take enough courses in quantum theory?

The consistent-histories analysis shows that the experimenters do, in fact, know what they are doing, and that such retrodictions are perfectly compatible with quantum theory. It also provides general rules for carrying out retrodictions safely, without producing contradictions or paradoxes. The consistent-histories approach even offers some insight into why the textbooks have often regarded retrodiction as dangerous.

The basic idea can be illustrated once again by reference to figure 1. Suppose the photon has been detected by detector C. In which channel was it just prior to detection: channel c or d? The very nature of the question tells us that (3) is of no help; we must resort to the histories in (4). But even they are inadequate, because they tell us nothing about what the photon is doing at intermediate times. To address that question, we must consider the following refinements of the histories in (4):

(5) |a>|C>|D> arrow right |c>|C>|D> arrow right |C[sup *]>|D>, |a>|C>|D> arrow right |d>|C>|D> arrow right |C>|D[sup *]>,

in which intermediate events have been added to describe the photon after it passes through the beam splitter, but before it is detected. The consistent-histories rules assign a probability of 1/2 to each of these histories. That means it is impossible, given the initial state, to predict whether the photon will leave the beam splitter through channel c or d. But if the final detector state is |C[sup *]>|D>, meaning that C has detected the photon, then the first history in (5), not the second, is the one that actually occurred. So, at the intermediate time, the photon was in state |c> rather than |d>. That is to say, it was in the c channel.

Why has this rather obvious way of solving the second measurement problem been overlooked for so long? Probably because a quantum physicist who grew up with the standard textbooks will describe the situation in figure 1 by means of a pair of histories

(6) |a>|C>|D> arrow right |s>|C>|D> arrow right |C[sup *]>|D>, |a>|C>|D> arrow right |s>|C>|D> arrow right |C>|D[sup *]>,

in which, at the intermediate time, the photon is in the superposition state |s> defined in (1). He will wait until the measurement takes place and then "collapse" the wavefunction for reasons that he may not understand very well. But at least they make more sense to him than does the macroscopic quantum superposition state |S> of (3).

From the standpoint of consistent histories, such a physicist is, in effect, employing the histories in (6), which are perfectly good quantum beables, as part of a stochastic quantum description. However, if the photon is in the superposition state |s> at the intermediate time, quantum incompatibility implies that it makes no sense to ask whether it is in the c channel or the d channel. That question can be asked only in the context of the histories in (5).

The existence of a quantum description employing the set of histories in (6), in which the question of the relationship between the measurement result and the location of the photon before the measurement is meaningless, does not invalidate the conclusion reached by means of the histories in (5), which provide a definite answer to that question. It is a quite general feature of quantum reasoning that various questions of physical interest can be addressed only by constructing an appropriate quantum description. That is quite unlike classical physics, where a single description, such as specifying a precise point in the phase space of a mechanical system, suffices to answer all meaningful questions.

Consistency conditions

The beables in consistent-histories quantum theory are a collection of mutually exclusive histories to which probabilities are assigned by the dynamical laws of quantum mechanics (Schrodinger's equation). If the histories involve just two times, as in (4), these probabilities are given by the usual Born rule--namely, the absolute square of the inner product of the time-evolved initial state and the final state in question. Histories involving three or more times, as in (5), require a generalization of the Born rule and additional consistency conditions to assure that the probabilities make physical sense.

Not all collections of mutually exclusive histories satisfy the mathematical conditions of consistency. The consistent-histories approach ascribes physical meaning only to histories that satisfy the consistency conditions. Other cases are regarded as meaningless; that is to say, they are rather like trying to simultaneously ascribe values for S[sub x] and S[sub z] to a spin-1/2 particle. (See the box above for additional remarks on consistency conditions.)

Consistency conditions are needed for a consistent discussion of the quantum double-slit experiment,(n8) in which a wavepacket approaches the slits at time t[sub 1], passes through one or the other slit just before t[sub 2], and arrives at t[sub 3] at some point in the interference zone, where waves from the two slits interfere with each other. It turns out that histories in which the particle passes through a particular slit and then arrives at a particular point in the interference zone do not satisfy the consistency conditions, and thus do not constitute acceptable quantum beables. That will come as no surprise to generations of students who have been taught that asking which slit the particle passes through is not a sensible question. In this respect, the consistency conditions support the physicist's usual intuition at the same time as they provide a precise mathematical formulation applicable in other situations where intuitive arguments are not sufficient for precise analysis.

On the other hand, if there are detectors just behind the two slits, one's physical intuition says that it should be sensible to say which slit the particle passes through. Such intuition is used all the time in designing experiments in which collimators are placed in front of detectors. In that case, the relevant histories, which are the analogs of (5), turn out to be consistent. Furthermore, even if there are no detectors behind the slits, there are consistent histories in which the particle passes through a particular slit and then arrives in a spread-out wavepacket in the interference zone, rather than at a particular point. (See the box for more details in an analogous situation involving a Mach-Zehnder interferometer.)

The physical consequences of consistency conditions are still being explored, and there is not yet complete agreement even on their mathematical form. However, the different formulations one finds in references 9, 10, and 11 do not seem to make any significant difference in most physical applications.

Classical limit

Because classical mechanics provides an excellent description of the motion of macroscopic objects in the everyday world, one would expect that quantum theory, in an appropriate limit, would yield the laws of classical physics to very good approximation. This conclusion is supported by Paul Ehrenfest's argument, which one finds in elementary textbooks, to the effect that average values of certain quantum observables satisfy equations similar to those of classical mechanics. But that is not a satisfactory solution to the problem of the classical limit, for two reasons: One wants to know how individual systems behave, not just the ensemble to which such an average applies. Furthermore, such an average, in the usual textbook understanding of quantum theory, refers to the results of measurements, and is not valid when measurements are not made.

In the consistent-histories approach, the classical limit can be studied by using appropriate subspaces of the quantum Hilbert space as a "coarse graining," analogous to dividing up phase space into nonoverlapping cells in classical statistical mechanics. This coarse graining can then be used to construct quantum histories. It is necessary to show that the resulting family of histories is consistent, so that the probabilities assigned by quantum dynamics make good quantum mechanical sense. Finally, one needs to show that the resulting quantum dynamics is well approximated by appropriate classical equations.

Demonstrating all this in complete detail is a difficult problem. But so is the analogous problem of finding the behavior of a large number of particles governed by classical mechanics. Indeed, the problem of showing that a system of classical particles will exhibit thermodynamic irreversibility, a typical macroscopic phenomenon, has not yet been settled to everyone's satisfaction, despite a continuing effort that goes back to Ludwig Boltzmann's work a century ago. (See the articles by Joel Lebowitz in PHYSICS TODAY, September 1993, page 32, and by George Zaslavsky in this issue, page 39.)

Nonetheless, calculations carried out by one of us,(n11, n12) and by Gell-Mann and Hartle,(n10) indicate that, given a suitable consistent family, classical physics does indeed emerge from quantum theory. Of course the classical equations are only approximate. They must be supplemented by including a certain amount of random noise, as one would expect from the fact that quantum dynamics is a stochastic process. In many circumstances, this quantum noise will not have much influence, but it can be amplified in systems that exhibit (classical) chaotic behavior. Even so, because the classical dynamics of such systems is noisy for all practical purposes, even if it is deterministic in principle, they are not likely to exhibit distinctive quantum effects.

The consistency of a family of histories for a macroscopic system is often ensured by quantum decoherence, an effect closely related to thermodynamic irreversibility. (See the article by Wojciech Zurek in PHYSICS TODAY, October 1991, page 36.) Demonstrating that quantum systems actually exhibit irreversible behavior in the thermodynamic sense, on the other hand, is not trivial. There are conceptual and computational difficulties similar to those that arise when one considers a classical system of many particles. Nonetheless, there seems at present to be no difficulty, in principle, that prevents us from understanding macroscopic phenomena in quantum terms, including what happens in a real measurement apparatus. Thus, by interpreting quantum mechanics in a manner in which measurement plays no fundamental role, we can use quantum theory to understand how an actual measuring apparatus functions.

We are grateful to Todd Brun, Sheldon Goldstein, James Hartle, and Wojciech Zurek for comments on the manuscript. One of us (Griffiths) acknowledges financial support from the National Science Foundation through grant PHY 9602084.

Consistency Conditions: An Application

The consistency conditions as formulated in reference 9 are obtained by associating with each of the histories in a particular family a "weight" operator on the Hilbert space, and then requiring that the weight operators for mutually exclusive histories be orthogonal to each other--the operator inner product being generated by the trace. This somewhat abstract prescription is best understood by working through simple examples, such as the one in section 6C of reference 8. Here, we give an application of the consistency conditions to a situation of some physical interest.

Consider the Mach-Zehnder interferometer illustrated in figure 2. A wavepacket of light passing through the first beam splitter B[sub 1] is reflected by a pair of mirrors, C and D, onto a second beam splitter B[sub 2] preceding the output channels e and f. The effect of B[sub 1] on the wavepacket |a> of a photon in the initial a channel at time t[sub 1] is to produce, at a slightly later time t[sub 2], the same kind of superposition |s> of wavepackets |c> and |d> in the c and d arms of the interferometer as we had in equation (1). The effect of the second beam splitter is given by

(7) |c> arrow right (|e> + |f>)/square root of 2, |d> arrow right (-|e> + |f>)/square root of 2,

where |e> and |f> are wavepackets in the output channels at t[sub 3]. The optical paths have been so arranged that the two |e> components in (7) appear with opposite phases.

Therefore, when we combine (1) and (7), we see that the photon entering at a must emerge in channel f, corresponding to the three-time history

(8) |a> arrow right |s> arrow right |f>,

which satisfies the consistency conditions simply because it is a solution of Schrodinger's equation.

On the other hand, the pair of mutually exclusive histories

(9) |a> arrow right |c> arrow right |f> and |a> arrow right |d> arrow right |f>,

in which the particle passes through either the c or d arm at the intermediate time t[sub 2] and then emerges in the f channel, are not consistent, because the corresponding weight operators are not orthogonal. The reader may check this by the methods of reference 9, but it will require some work.

Consequently, it makes no sense to say that the particle passes through the c or the d arm and then emerges in the f channel. However, the two histories

(10) |a> arrow right |c> arrow right (|e> + |f>)/square root of 2 and |a> arrow right |d> arrow right (-|e> + |f>)/square root of 2

are consistent, because here the weight operators are orthogonal. Again we leave the proof as an exercise. Thus it makes perfectly good sense to say that the photon passes through the c arm and emerges in a certain coherent superposition of states in the two output channels, or through the d arm to emerge in a different superposition.
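Both orthogonality checks left as exercises can be sketched numerically. In this sketch of ours, the weight operator of a three-time history is taken in the chain form K = P[sub 3] U[sub 2] P[sub 2] U[sub 1] P[sub 1] (our reading of the prescription in reference 9), with the trace inner product comparing weight operators; the unitaries U[sub 1] and U[sub 2] represent the two beam splitters:

```python
import numpy as np

# Basis states |a>, |c>, |d>, |e>, |f> as 5-dimensional vectors.
basis = {name: i for i, name in enumerate("acdef")}

def ket(name):
    v = np.zeros(5, dtype=complex)
    v[basis[name]] = 1.0
    return v

def proj(v):
    return np.outer(v, v.conj())

a, c, d, e, f = (ket(n) for n in "acdef")
r2 = np.sqrt(2)

# Step unitaries; only their action on the states that actually occur matters,
# the remaining columns just complete each matrix to a unitary.
U1 = np.column_stack([(c + d)/r2, (c - d)/r2, a, e, f])    # B1: |a> -> |s>
U2 = np.column_stack([a, (e + f)/r2, (-e + f)/r2, c, d])   # B2: equation (7)

def weight(P1, P2, P3):
    """Weight (chain) operator of a three-time history."""
    return P3 @ U2 @ P2 @ U1 @ P1

def overlap(Ka, Kb):
    """Trace inner product of two weight operators."""
    return np.trace(Ka.conj().T @ Kb)

# Histories (9): through c or d, then out in f.
# Weight operators are NOT orthogonal, so the family is inconsistent.
K9c = weight(proj(a), proj(c), proj(f))
K9d = weight(proj(a), proj(d), proj(f))
print(round(abs(overlap(K9c, K9d)), 12))    # -> 0.25

# Histories (10): through c or d, then the matching superposition.
# Weight operators are orthogonal, so the family is consistent.
K10c = weight(proj(a), proj(c), proj((e + f)/r2))
K10d = weight(proj(a), proj(d), proj((-e + f)/r2))
print(round(abs(overlap(K10c, K10d)), 12))  # -> 0.0
```

As a cross-check, U[sub 2]U[sub 1]|a> comes out equal to |f>, reproducing the deterministic history (8).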

This Mach-Zehnder example is analogous to the canonical double-slit experiment, if one regards passing through the c or d arm as analogous to passing through the upper or lower slit, and emerging in e or f as analogous to the particle arriving at a point of minimum or maximum intensity in the double-slit interference zone.


Source: Evolutionary Computation, Summer99, Vol. 7 Issue 2, p125, 25p, 2 diagrams, 14 graphs Author(s): Lanzi, Pier Luca


The XCS classifier system represents a major advance in learning classifier systems research because (1) it has a sound and accurate generalization mechanism, and (2) its learning mechanism is based on Q-learning, a recognized learning technique. In taking XCS beyond its very first environments and parameter settings, we show that, in certain difficult sequential ("animat") environments, performance is poor. We suggest that this occurs because in the chosen environments, some conditions for proper functioning of the generalization mechanism do not hold, resulting in overly general classifiers that cause reduced performance. We hypothesize that one such condition is a lack of sufficiently wide exploration of the environment during learning. We show that if XCS is forced to explore its environment more completely, performance improves dramatically. We propose a technique, based on Sutton's Dyna concept, through which wider exploration would occur naturally. Separately, we demonstrate that the compactness of the representation evolved by XCS is limited by the number of instances of each generalization actually present in the environment. The paper shows that XCS's generalization mechanism is effective, but that the conditions under which it works must be clearly understood.


Keywords: Learning classifier systems, XCS, generalization, genetic operators.

1 Introduction

Autonomous agents are not, in general, able to deal with the complexity of real environments. The ability of an agent to generalize over the different situations it experiences is essential in order to learn tasks in real environments. In fact, an agent which generalizes properly is able to synthesize, in a compact way, the knowledge it acquires so as to manipulate the concepts it learns.

Generalization is a very important feature of XCS, the classifier system introduced by Wilson (1995). XCS has been shown to evolve near-minimal populations of classifiers that are accurate and maximally general (Kovacs, 1997; Wilson, 1997a). Recently, Kovacs (1997) proposed an optimality hypothesis for XCS and presented experimental evidence of his hypothesis with respect to the Boolean multiplexer, a known testbed for studying generalization in learning classifier systems (Wilson, 1987; Wilson, 1995).

In taking XCS beyond its very first environments and parameter settings, Lanzi (1997) reported experimental results for problems involving artificial animals, or animats (Wilson, 1987), showing that in difficult sequential problems XCS's performance may degrade dramatically. The author observed that in these kinds of tasks the generalization mechanism of XCS can be too slow to delete overly general classifiers before they proliferate in the population. To avoid this problem, Lanzi (1997) introduced a new operator, called specify, which helps XCS delete overly general classifiers by replacing them with more specific offspring. An alternative solution was suggested by Wilson, in which the random exploration strategy employed in his first experiments with XCS was replaced with biased exploration (Wilson, 1997b).

Until recently (Kovacs, 1996; Lanzi, 1997b), the analysis of the generalization capabilities of XCS has been presented without considering the relation between XCS's performance and the environment structure. As a result it is not clear why one environment is easy to solve, while a similar one can be much more difficult.

The aim of this paper is to suggest an answer to this question, enabling a better understanding of the generalization mechanism of XCS, while giving a unified view of the observations in Lanzi (1997) and Wilson (1997b). First, we extend the results presented by Lanzi by comparing the performance of XCS when it uses specify and when it employs the biased exploration strategy. The comparison is done in two new environments, Maze5 and Maze6, and then in Woods14, the ribbon problem introduced by Cliff and Ross (1994). The results we present demonstrate that specify can adapt to all three test environments, while XCS with biased exploration may fail to converge to optimal solutions as the complexity of the environment increases. Although these results are interesting, they simply report experimental evidence and do not explain XCS's behavior, which is our major goal. In order to explain XCS's behavior, we analyze the assumptions which underlie generalization in XCS and Wilson's generalization hypothesis (Wilson, 1995). We study XCS's generalization mechanism in depth and formulate a specific hypothesis. We verify our hypothesis by introducing a meta-exploration strategy, teletransportation, which we use as a validation tool.

We end the paper by discussing another important aspect of generalization within XCS: the capability of XCS to evolve a maximally compact representation of the learned task. We show that, in particularly difficult environments, where few generalizations are admissible, XCS evolves generalizations right up to the limit of the instances actually offered by the environment.

The remainder of this paper is organized as follows: Section 2 gives a brief overview of the current version of XCS, and Section 3 presents the design of the experiments we employed in this paper. XCS with specify, referred to as XCSS, and XCS with biased exploration are compared in Section 4 using Maze5 and Maze6. In Section 5, the same comparison is done in the Woods14 environment. The results described in the previous sections are discussed in Section 6 where we formulate a hypothesis in order to explain why XCS may fail to converge to an optimal solution and discuss the implications introduced by our hypothesis. We verify our hypothesis in Section 7 by introducing teletransportation. We suggest how the ideas underlying teletransportation might be implemented in real-world applications in Section 8. Section 9 addresses the conditions under which XCS evolves a compact representation of a learned task, and Section 10 summarizes the results.

2 Description of XCS

We now overview XCS according to its most recent version (Wilson, 1997a). We refer the interested reader to Wilson (1995) for the original XCS description or to Kovacs's report (Kovacs, 1996) for a more detailed discussion for implementors.

Classifiers in XCS have three main parameters: (1) the prediction p, which estimates the payoff that the system expects if the classifier is used; (2) the prediction error Epsilon, which estimates the error of the prediction p; and (3) the fitness F, which evaluates the accuracy of the payoff prediction given by p and thus is a function of the prediction error Epsilon.

At each time step, the system input is used to build the match set [M] containing the classifiers in the population whose condition part matches the sensory configuration. If the match set is empty, a new classifier that matches the input is created through covering. For each possible action a[sub i] in the match set, the system prediction P(a[sub i]) is computed as the fitness-weighted average of the predictions of the classifiers in [M] that advocate action a[sub i]. P(a[sub i]) gives an evaluation of the expected payoff if action a[sub i] is performed. Action selection can be deterministic (the action with the highest system prediction is chosen) or probabilistic (the action is chosen with a certain probability among the actions with a non-null prediction).
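The match-set and prediction-array computations can be sketched as follows; the classifier fields follow the text (p, Epsilon, F), while the concrete class layout and parameter values are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str   # string over {0, 1, #}; '#' matches either bit
    action: int
    p: float         # payoff prediction
    epsilon: float   # prediction error
    F: float         # fitness

def matches(condition, state):
    return all(ch in ('#', s) for ch, s in zip(condition, state))

def match_set(population, state):
    """[M]: classifiers whose condition matches the sensory configuration."""
    return [cl for cl in population if matches(cl.condition, state)]

def prediction_array(M):
    """P(a_i): fitness-weighted average of the predictions advocating a_i."""
    P = {}
    for a in sorted({cl.action for cl in M}):
        advocates = [cl for cl in M if cl.action == a]
        total_fitness = sum(cl.F for cl in advocates)
        P[a] = sum(cl.F * cl.p for cl in advocates) / total_fitness
    return P

pop = [Classifier("01#1", 0, 100.0, 0.01, 0.9),
       Classifier("0##1", 0, 300.0, 0.20, 0.1),
       Classifier("01#1", 1, 500.0, 0.05, 0.8)]
M = match_set(pop, "0101")
print(prediction_array(M))   # -> {0: 120.0, 1: 500.0}
```

Note how the low-fitness (presumably overly general) classifier contributes little to P(0): the weighted average is pulled toward the accurate classifier's prediction.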

The classifiers in [M] that propose the selected action form the current action set [A]. The selected action is then performed in the environment, and a scalar reward r is returned to the system together with a new input configuration.

The reward r is used to update the parameters of the classifiers in the action set corresponding to the previous time step, [A][sub -1]. Classifier parameters are updated as follows. First, the Q-learning-like payoff P is computed as the sum of the reward received at the previous time step and the maximum system prediction, discounted by a factor Gamma (0 </= Gamma < 1). P is used to update the prediction p by the Widrow-Hoff delta rule (Widrow and Hoff, 1960) with learning rate Beta (0 </= Beta </= 1): p[sub j] arrow left p[sub j] + Beta(P - p[sub j]). Likewise, the prediction error is adjusted with the formula: Epsilon[sub j] arrow left Epsilon[sub j] + Beta(|P - p[sub j]| - Epsilon[sub j]). The fitness update is slightly more complex. Initially, the prediction error is used to evaluate the classification accuracy Kappa of each classifier as Kappa = exp((ln Alpha)(Epsilon - Epsilon[sub 0])/Epsilon[sub 0]) if Epsilon > Epsilon[sub 0], or Kappa = 1 otherwise. Subsequently, the relative accuracy Kappa' of the classifier is computed from Kappa as Kappa' = Kappa/Sigma[sub [A][sub -1]] Kappa. Finally, the fitness parameter is adjusted by the rule F arrow left F + Beta(Kappa' - F).
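The update step can be sketched as below. The parameter values and the exact ordering of the p and Epsilon updates are illustrative choices, and the accuracy formula is the exponential form given in the text:

```python
import math

# Illustrative parameter values: BETA, GAMMA, ALPHA, EPS0 play the roles of
# Beta, Gamma, Alpha, and Epsilon_0 in the text.
BETA, GAMMA = 0.2, 0.71
ALPHA, EPS0 = 0.1, 10.0

class Classifier:
    def __init__(self, p, epsilon, F):
        self.p, self.epsilon, self.F = p, epsilon, F

def update_action_set(A_prev, reward, max_prediction):
    """Update p, epsilon, and F of the classifiers in [A]_{-1}."""
    P = reward + GAMMA * max_prediction          # Q-learning-like payoff
    kappa = []
    for cl in A_prev:
        err = abs(P - cl.p)                      # error w.r.t. old prediction
        cl.p += BETA * (P - cl.p)                # Widrow-Hoff delta rule
        cl.epsilon += BETA * (err - cl.epsilon)  # prediction-error estimate
        # accuracy: kappa = exp((ln alpha)(eps - eps0)/eps0) if eps > eps0
        if cl.epsilon > EPS0:
            kappa.append(math.exp(math.log(ALPHA) * (cl.epsilon - EPS0) / EPS0))
        else:
            kappa.append(1.0)
    total = sum(kappa)
    for cl, k in zip(A_prev, kappa):
        cl.F += BETA * (k / total - cl.F)        # fitness tracks rel. accuracy

A = [Classifier(p=95.0, epsilon=4.0, F=0.5),    # accurate classifier
     Classifier(p=60.0, epsilon=25.0, F=0.5)]   # inaccurate classifier
update_action_set(A, reward=29.0, max_prediction=100.0)   # payoff P = 100
```

After one update, the accurate classifier's fitness rises toward its relative accuracy (near 1), while the inaccurate classifier's fitness falls, which is exactly the pressure that drives accuracy-based selection in XCS.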

The genetic algorithm in XCS is applied to the action set. It selects two classifiers with probability proportional to their fitnesses and copies them. It performs crossover on the copies using probability Chi while using probability Mu to mutate each allele.

Macroclassifiers. Introduced by Wilson (1995), an important innovation with XCS is the definition of macroclassifiers. These are classifiers that represent a set of classifiers with the same condition and the same action by means of a new parameter called numerosity. Whenever a new classifier has to be inserted in the population, it is compared to existing ones to check whether there already exists a classifier with the same condition-action pair. If such a classifier exists then the new classifier is not inserted in the population. Instead, the numerosity parameter of the existing (macro) classifier is incremented. If there is no classifier in the population with the same condition-action pair then the new classifier is inserted in the population.
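The insertion logic for macroclassifiers can be sketched as follows (the field names are illustrative):

```python
class Macroclassifier:
    def __init__(self, condition, action):
        self.condition, self.action = condition, action
        self.numerosity = 1   # how many identical classifiers this one stands for

def insert(population, new):
    """Insert a classifier, merging it with an existing macroclassifier if
    the same condition-action pair already occurs in the population."""
    for cl in population:
        if cl.condition == new.condition and cl.action == new.action:
            cl.numerosity += 1     # same rule already present: just count it
            return
    population.append(new)         # genuinely new rule

pop = []
insert(pop, Macroclassifier("01#1", 0))
insert(pop, Macroclassifier("01#1", 0))   # duplicate: numerosity grows
insert(pop, Macroclassifier("0##1", 1))
print(len(pop), pop[0].numerosity)        # -> 2 2
```

The population length here is the macroclassifier count, the statistic Wilson uses to measure the level of generalization of the evolved solution.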

Macroclassifiers are, essentially, a programming technique that speeds up learning by reducing the number of classifiers XCS has to process. Wilson shows that use of macroclassifiers substantially reduces the population for normal mutation rates, especially if the environment offers significant generalizations. In addition, he shows that the number of macroclassifiers is a useful statistic for measuring the level of generalization of the solution by the system.

Subsumption Deletion and Specify. Since XCS was introduced, two genetic operators have been proposed as extensions to the original system: subsumption deletion (Wilson, 1997a) and specify (Lanzi, 1997b).

Subsumption deletion was introduced to improve the generalization capability of XCS. It acts when classifiers created by the genetic algorithm are inserted in the population. Offspring classifiers created by the GA are replaced with clones of their parents if: (1) they are specializations of their parents, i.e., they are subsumed by them; (2) their parents are accurate; and (3) the parents' parameters have been updated sufficiently. If all these conditions are satisfied, the offspring classifiers are discarded and copies of their parents are inserted in the population; otherwise, the offspring are inserted in the population.
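The subsumption test itself is a simple positional check. A sketch, with the accuracy threshold `eps0` and the experience threshold `theta_sub` as assumed parameters:

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str      # ternary string over {0, 1, #}
    action: int
    error: float        # prediction error
    experience: int     # number of parameter updates so far


def subsumes(general_cond, specific_cond):
    """True if every input matched by specific_cond is also matched by
    general_cond (each position is '#' or the same symbol)."""
    return all(g == '#' or g == s
               for g, s in zip(general_cond, specific_cond))


def parent_replaces_offspring(parent, child, eps0=10.0, theta_sub=20):
    """Subsumption-deletion test (thresholds are illustrative assumptions):
    the parent must subsume the child, be accurate, and be experienced."""
    return (parent.action == child.action
            and subsumes(parent.condition, child.condition)
            and parent.error < eps0
            and parent.experience > theta_sub)
```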

The idea of subsumption deletion is that, since the goal of XCS is to evolve an accurate, maximally general representation, it is useless to specialize classifiers that are already accurate. Accordingly, with subsumption deletion, accurate classifiers can produce only more general offspring.

Specify was introduced to assist the generalization mechanism of XCS in eliminating overly general classifiers. Specify acts when a significant number of overly general classifiers are in the action set. This condition is detected by comparing the average prediction error of the classifiers in the action set, ε_[A], with the average prediction error of the classifiers in the population, ε_[P]. If ε_[A] is twice ε_[P] and the classifiers in [A] have been updated, on average, at least N_sp times, then a classifier is randomly selected from [A] with probability proportional to its prediction error. The selected classifier is used to generate one offspring classifier in which each # symbol is replaced, with probability P_sp, by the corresponding digit of the system input. The resulting classifier is then inserted in the population and another one is deleted if necessary.
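A rough sketch of Specify under these rules. The attribute names, and passing the population-average error in as an argument, are assumptions for the example:

```python
import random
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str      # ternary string over {0, 1, #}
    action: int
    error: float
    experience: int


def specify(action_set, population, current_input, eps_pop,
            n_sp=20, p_sp=0.5):
    """Sketch of the Specify operator (thresholds are assumptions).

    If the action set looks overly general (average error at least
    twice the population average, classifiers experienced enough),
    select a classifier with probability proportional to its error and
    create a more specific offspring by replacing each '#' with the
    matching input digit with probability p_sp."""
    eps_as = sum(cl.error for cl in action_set) / len(action_set)
    avg_exp = sum(cl.experience for cl in action_set) / len(action_set)
    if eps_as < 2 * eps_pop or avg_exp < n_sp:
        return None
    parent = random.choices(action_set,
                            weights=[cl.error for cl in action_set])[0]
    child_cond = ''.join(
        bit if (c == '#' and random.random() < p_sp) else c
        for c, bit in zip(parent.condition, current_input))
    child = Classifier(condition=child_cond, action=parent.action,
                       error=parent.error, experience=0)
    population.append(child)    # a deletion step would follow if needed
    return child
```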

3 Design of Experiments

The experiments presented in this paper were conducted in the woods series of environments. These are grid worlds in which each cell can contain a tree (a T symbol), food (an F symbol), or can be empty. An animat placed in the environment must learn to reach food cells. The animat senses the environment through eight sensors, one for each adjacent cell, and can move into any of the adjacent cells. If the destination cell contains a tree, the move does not take place. If the destination cell is blank, the move takes place. Finally, if the cell contains food, the animat moves, eats the food, and receives a constant reward. Each sensor is represented by two bits: 10 indicates the presence of a tree T; 11 indicates food F; 00 represents an empty cell. Classifier conditions are 16 bits long (2 bits x 8 cells), while the eight actions are represented with three bits.
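The sensor encoding could be sketched like this; the clockwise ordering of the eight cells and the toroidal wrap-around are assumptions for illustration:

```python
# Two bits per cell: tree, food, empty ('.' marks an empty cell).
CELL_CODE = {'T': '10', 'F': '11', '.': '00'}


def sense(grid, row, col):
    """16-bit sensor string for the 8 cells around (row, col).

    Cells are scanned clockwise starting from the cell above; the grid
    wraps toroidally. Both conventions are illustrative assumptions."""
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    bits = []
    for dr, dc in offsets:
        r = (row + dr) % len(grid)
        c = (col + dc) % len(grid[0])
        bits.append(CELL_CODE[grid[r][c]])
    return ''.join(bits)
```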

Each experiment consists of a number of problems that the animat must solve. For each problem, the animat is randomly placed in a blank cell of the environment; then it moves under the control of the system until it enters a food cell, eats the food, and receives a constant reward. The food immediately re-grows and a new problem begins. We employed the following exploration/exploitation strategy (Wilson, 1995; Wilson, 1996): before a new problem begins, the animat decides with probability 0.5 whether it will solve the problem in exploration or exploitation.

We employed two different exploration strategies: random exploration and biased exploration. In random exploration, the system selects an action randomly among those in the match set. In biased exploration, the system decides with probability P_s whether to select an action randomly or to choose the action predicting the highest payoff (a typical value for P_s is 0.5). In exploitation, the animat always selects the action predicting the highest payoff and the GA does not act. In order to evaluate the final solutions evolved, exploration is turned off during the last 1000 problems of each experiment and the system works in exploitation only. The performance of XCS is computed as the average number of steps to food in the last 50 exploitation problems. Every statistic presented in this paper is averaged over ten experiments.
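The action-selection choice described above might be sketched as follows; the `predictions` mapping from actions to system predictions is an assumed interface:

```python
import random


def select_action(predictions, explore, p_s=0.5):
    """Select an action from the match-set predictions.

    In exploitation (explore=False) the best-predicting action is always
    chosen. In biased exploration a random action is taken with
    probability p_s, otherwise the best one; p_s = 1.0 recovers pure
    random exploration. The interface is an illustrative sketch."""
    if explore and random.random() < p_s:
        return random.choice(list(predictions))
    return max(predictions, key=predictions.get)
```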

4 XCS in Maze5 and Maze6

The first results reported in the literature for XCS by Wilson (1995) are limited to two regular and aperiodic environments, Woods1 and Woods2, in which the optimal solution requires only a few steps to reach a food position. The optimal solution can be described by a small number of very general classifiers; roughly speaking, we say that these environments permit many generalizations. These initial experiments were extended by Lanzi (1997) to a more challenging environment, Maze4, in which the optimal solution requires longer sequences of actions to reach the goal and the environment permits only a few generalizations. The author observed that in difficult sequential problems the system's performance can fail dramatically. It was argued that this happens because, in particularly difficult situations characterized by long action sequences and only a few admissible generalizations, the generalization mechanism of XCS can be too slow to eliminate overly general classifiers before they proliferate in the population, causing a significant decrease in system performance; briefly, we say that overly general classifiers corrupt the population (Lanzi, 1997b). The specify operator was thus introduced in order to help XCS recover from overly general classifiers.

Wilson (1997) suggested that another important factor underlying what was observed in Lanzi (1997) is the amount of random exploration the agent performs. Accordingly, he proposed a different solution in which the amount of random exploration is reduced by replacing the random exploration employed in the first work on XCS with biased exploration. Wilson (1997) also suggested that the behavior discussed in Lanzi (1997) may occur when no classifier in the action set is very accurate. When this occurs, the classifier fitness calculation, which estimates classifier accuracy relative to the action set, will give them all substantial fitnesses, producing inappropriate results. Specify detects such conditions because it is activated by the error parameter and not by accuracy; it is thus able to recover from this type of situation by eliminating the source of inaccuracy in the action set.

We now extend previous results presented in the literature by comparing the two solutions in two new Markovian (i.e., all states are distinguishable) environments: Maze5 and Maze6 (Figure 1 (a) and Figure 1 (b)). We compare four algorithms in each environment: (i) XCS according to the original definition, that is, without subsumption deletion; (ii) XCS without don't care symbols (#s are not introduced in the initial population, during covering, or during mutation); (iii) XCS with specify, referred to here as XCSS; (iv) XCS with biased exploration.

Notice that the performances of algorithms (i) and (ii) serve as two important references. The former indicates what the original system can do when the generalization mechanism is in operation, while the performance of algorithm (ii) defines the potential capabilities of XCS without generalization operating. Before proceeding, we wish to point out that the results presented are not intended to indicate which strategy is best for solving the proposed problems. Our aim is to analyze more general phenomena which can be easily studied in simple environments but can be difficult to examine in more complex environments, where other settings may not work.

4.1 The Maze5 Environment

We apply the four algorithms to Maze5 using a population of 1600 classifiers.(n1) Results for the four algorithms are shown in Figure 2. Curves are averaged over ten runs. As Figure 2 shows, XCS evolves a solution for Maze5 that is not optimal (algorithm (i)). Conversely, when generalization does not act, i.e., no #s are used, the system easily reaches the optimum (algorithm (ii)).

When a mechanism to help XCS recover from overly general classifiers is added to XCS, we observe an improvement: both algorithms (iii) and (iv) converge to high performance. Specifically, XCS with biased exploration (algorithm (iv)) slowly converges to a near optimal policy; however, XCSS (algorithm (iii)) rapidly converges to a fully optimal solution that is also stable. The analysis of single runs shows that sometimes XCS with biased exploration fails to converge to a stable solution, while XCSS always reaches the optimum in a stable way. This phenomenon is more evident in the experiments with XCS (algorithm (i)) where in the majority of the cases the system does not reach a stable solution.

Lanzi (1997) observed that XCSS is stable with respect to the population size. To verify this, we applied XCS with biased exploration and XCSS to Maze5 using only 800 classifiers. The results shown in Figure 3 indicate that, even with a small population, XCSS still converges to a near optimal solution and remains stable. On the contrary, the performance of XCS with biased exploration significantly decreases. The analysis of single runs reveals an increase in the number of experiments in which XCS with biased exploration cannot reach a stable solution, leading to a reduction in overall performance.

4.2 The Maze6 Environment

Maze6 is based on Maze5 but includes a set of obstacles covering a small number of free cells. The two environments are topologically similar; however, the following experiments show that Maze6 is much more difficult for XCS to solve.

In this second experiment, we applied the same four versions of XCS to Maze6. The results shown in Figure 4 confirm the results for Maze5: XCS does not converge to an optimal solution when generalization is required, while when no # symbols are employed the system easily reaches optimal performance. Furthermore, there is almost no difference between the performance of XCS with random exploration (i) and XCS with biased exploration (iv). Again, XCS with specify converges to a stable optimum (see Figure 2).

In comparing the performance of XCS in these two environments it is worth noting that, although the two environments are very similar, the performance of XCS in Maze6 is at least five times worse than in Maze5.

These results suggest that when the environment becomes more complex, biased exploration may not guarantee the convergence to a stable solution. Conversely, XCSS evolves a stable near optimal solution for Maze6 even if the population size is reduced to 800 classifiers (see Figure 5).

4.3 The Specify Operator and Biased Exploration

The results presented in this section support the findings previously presented in Lanzi (1997). Specify successfully helps the system recover from situations in which overly general classifiers may corrupt the population before the generalization mechanism of XCS eliminates them. Although biased exploration is adequate in simple environments, such as Maze5, it may become infeasible in more complex environments.

In our opinion, this happens because biased exploration is a global solution to the behavior we discussed, while specify is a local solution. Lanzi (1997) observed that XCS acts in environmental niches and suggested that these should be considered a fundamental element for operators in XCS. Specify follows this principle and directly corrects potentially dangerous situations in the niches where they are detected. Biased exploration on the other hand acts on the whole population and must take into account the structure of the entire environment.

5 XCS in Woods14

Cliff and Ross (1994) presented experimental results for ZCS (Wilson, 1994), the system from which XCS was derived. They showed that ZCS's failure to learn an optimal policy depends on the length of the sequence of actions required to reach food: the longer the sequence, the more difficult the environment.

Our experiments in Maze5 and Maze6 might seem to confirm the results presented for ZCS. XCS in fact performs better in Maze5, which requires an average of 4.6 steps to reach food, than in Maze6, where the animat takes an average of 5.05 steps to reach food. However, the minor difference between the average number of steps in the two environments seems too small to justify the significant difference in system performance.

We now extend the results of the previous section by analyzing the performance of XCS in an environment requiring a long sequence of actions to reach the goal state. For this purpose, we apply three different versions of XCS to the Woods14 environment. Woods14 (Figure 6) is a simple environment consisting of a linear path of 18 blank cells to a food cell, with an expected optimal path to food of nine steps.

Initially, we applied XCS with biased exploration and XCS without generalization to Woods14 with a population of 2000 classifiers. General parameters are set as in the previous experiments, except for the discount factor γ, which is set to 0.9. The performance of XCS with biased exploration in Woods14 is shown in Figure 7. The performance of XCS when the generalization mechanism does not act is shown in Figure 8. Curves are averaged over ten runs.(n2)

These results show that, even when biased exploration is introduced, XCS does not converge to an optimum in the Woods14 environment. However, when # symbols are not used, XCS easily reaches the optimum. The former result may indicate that the problems encountered with XCS depend on the length of the expected optimal path to food. The latter result, shown in Figure 8, also suggests that XCS can solve problems which involve long sequences of actions. This result is extremely important: it shows that XCS is a better model of a classifier system than ZCS, because it is able to build long chains of actions, a task in which ZCS fails (Cliff and Ross, 1994).

In the second experiment, we apply XCSS to Woods14 with 2000 classifiers.(n3) Figure 9 reports the performance of XCSS in Woods14; the curve, averaged over ten runs, shows that XCSS can evolve an optimal solution for Woods14.

Although these results are interesting, they do not explain the causes which underlie the observed behavior. We need to study the generalization mechanism of XCS and Wilson's generalization hypothesis in order to understand XCS's behavior. This is the subject of the next section where we discuss the generalization capabilities of XCS and formulate a hypothesis to explain our results.

6 Generalization with XCS in Animat Problems

6.1 The Generalization Mechanism of XCS

The experimental results discussed in the previous two sections demonstrate that some grid worlds are more difficult for XCS to navigate than others. In some environments, such as Woods2 (see Wilson (1997a)), XCS easily produces optimal solutions; in others, such as Maze5, Maze6, and Woods14, XCS may require special exploration policies and/or special operators.

Here we analyze the generalization mechanism of XCS in order to understand which factors may influence the performance of the system. We start by reconsidering Wilson's generalization hypothesis, which explains the fundamental principles of generalization in XCS as follows:

"Consider two classifiers C1 and C2 having the same action, where C2's condition is a generalization of C1's. That is, C2's condition can be generated by C1's by changing one or more of C1's specified (1 or 0) alleles to don't cares (#). Suppose C1 and C2 have the same epsilon, and are thus equally accurate.

Every time C1 and C2 occur in the same action set, their fitness values will be updated by the same amount. However, because C2 is a generalization of C1 it will tend to occur in more match sets than C1, and thus probably (depending on the action-selection regime) in more action sets. Because the GA occurs in action sets, C2 will have more reproductive opportunities and thus its number of exemplars will tend to grow with respect to C1's [...]. Consequently, when C1 and C2 next meet in the same action set, a larger fraction of the constant fitness update would be "steered" toward exemplars of C2, resulting via the GA in yet more exemplars of C2 relative to C1. Eventually, it was hypothesized, C2 would displace C1 from the population." (Wilson, 1995)
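The "more match sets" part of the hypothesis can be made concrete with a small count: each additional # doubles the number of binary inputs a condition matches. A brute-force illustration (not part of XCS itself):

```python
from itertools import product


def matches(condition, state):
    """A ternary condition matches a binary state if each position
    is '#' or equal to the corresponding state bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))


def match_count(condition, n_bits):
    """Number of binary states of length n_bits the condition matches."""
    return sum(matches(condition, ''.join(bits))
               for bits in product('01', repeat=n_bits))
```

With equal accuracy, the condition matching more states appears in more action sets and so gains more reproductive opportunities, exactly as the hypothesis describes.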

Wilson's hypothesis explains how XCS develops a tendency to evolve maximally general classifiers. But what happens when an overly general classifier appears in the population?

Overly general classifiers, due to the presence of some don't care symbols, match different niches with different payoff levels and thus become inaccurate. Since the GA in XCS bases fitness upon classifier accuracy, overly general classifiers tend to reproduce less and will eventually be deleted.

In Section 6.2, we will analyze the generalization mechanism in detail, to show why it may sometimes work incorrectly.

6.2 Are Overgeneral Classifiers Inaccurate?

The generalization mechanism of XCS is sound, so it is not clear why it may fail in certain environments. Lanzi (1997) observes that generalization in XCS is achieved through evolution; therefore, there may be cases in which the generalization mechanism is too slow to delete overly general classifiers, and these have enough time to proliferate in the population.

We believe that Wilson's generalization hypothesis is correct; accordingly, we argue that XCS fails in learning a certain task when some terms of the hypothesis do not hold. First, we observe that:

For overly general classifiers to be "deleted", i.e., reproduce less and then be deleted, they must be observed by the system to be inaccurate. However, this happens only if overly general classifiers are applied in distinct environmental niches.

We argue that in XCS it is not always true that an overly general classifier will become inaccurate; in fact, due to the parameter update, a classifier becomes inaccurate only when it is applied to situations which have different payoff levels. However, this only happens when the classifier is applied in different situations, i.e., environmental niches. There are applications in which, due to the structure of the environment and to the exploration policy, the animat does not visit all the niches with the same frequency, but rather it stays in a certain area of the environment for a while and then moves to another one. In such situations, Wilson's generalization hypothesis may fail because overly general classifiers which should be inaccurate may be evaluated as accurate.

Consider for example an overly general classifier that matches two niches belonging to two different areas of the environment. As long as the system stays in the area of the first niche, the classifier's parameters are updated according to the payoff level of that niche. As long as the animat does not visit the second niche, the classifier appears accurate even though it is globally overly general.(n4) The overly general classifier is thus selected for reproduction, and the system allocates resources, i.e., copies, to it. When the animat moves to the area of the environment belonging to the second niche, the classifier starts becoming inaccurate because the payoff level it predicts is no longer correct. At this point, two things may happen. First, if the classifier did not reproduce sufficiently in the first niche, the (macro)classifier is deleted because it has become inaccurate: the animat thus "forgets" what it learned in the previous area. Second, if the overly general classifier reproduced sufficiently in the initial niche, the (macro)classifier survives long enough to adjust its parameters and become accurate with respect to the current niche. The overly general classifier therefore continues to reproduce and mutate in the new niche, and can produce even more overly general offspring. This behavior can be summarized as follows:

XCS usually learns a global policy. However, if the environment is not, or cannot be, visited frequently and uniformly, XCS tends to learn a local policy that can produce overly general classifiers, which by definition cause performance errors.

Note that the phenomenon we discuss does not concern the general problem of having incomplete information about the environment caused by a partial exploration. The environments we use are small enough that, after the first two hundred problems, the system has tried almost all the possible environmental niches. Instead, our statement deals with the capability of XCS in evolving a stable solution. Thus our hypothesis states that:

XCS fails to learn an optimal policy in environments where the system is not very likely to explore all the environmental niches frequently.

This hypothesis concerns the capability of the agent to explore all of the environment in a uniform way; therefore it is related to the environment structure and to the exploration strategy employed. Since the exploration strategies previously employed within XCS in animat problems select actions randomly, our hypothesis is directly related to the average random walk to food. The smaller it is, the more likely the animat will be able to visit all positions in the environment frequently. The larger the average random walk, the more likely the animat is to visit certain areas of the environment more frequently. Our hypothesis, therefore, can explain why in certain environments XCS with biased exploration performs better than XCS with random exploration. When using biased exploration, the animat performs a random action only with a certain probability, otherwise it employs the best action. Accordingly, the animat is not likely to spend much time in a certain area of the environment but, following the best policy it learned, it moves to another area. When the environmental niches are more separated, such as in Maze6 and Woods14, the animat is unable to visit all the niches as frequently as would be necessary in order to evolve an optimal policy.

6.3 Discussion

We proposed a hypothesis in order to characterize the situations in which XCS may not converge to an optimal policy. The hypothesis we formulated concerns the concept of environmental niche and suggests that XCS can fail to converge to a global optimum if the environmental niches are not explored frequently. We thus observe that the system should not explore one area of the environment for a long time; instead, it should frequently change environmental niche. Otherwise, XCS may start to learn locally, evolving classifiers which are correct with respect to a specific area but are inaccurate in some other area.

Notice that our hypothesis is not a matter of the environment or of XCS alone but depends upon the interaction between them. An environment in which the animat is likely to visit all the possible areas will be easily solved by XCS with the usual random exploration strategy.

We want to point out that, although the approach we followed to study the behavior of XCS concerns a specific kind of environment, i.e., grid worlds, the conclusions we draw appear to be general and can therefore be extended to other environments.

7 Verification of the Hypothesis

According to the hypothesis presented in the previous section, XCS can fail to converge to the optimum in those environments where the system is not likely to explore all the environmental niches frequently. If our hypothesis is correct, the phenomena we have discussed should not appear when XCS employs an exploration strategy guaranteeing frequent exploration of all the environmental niches.

In this section we validate our hypothesis empirically. We introduce a meta-exploration strategy, teletransportation, that we use as a theoretical tool to verify our argument. The strategy can be applied to any exploration strategy previously employed with XCS. Accordingly, we refer to it as a meta-exploration strategy rather than an exploration strategy.

Teletransportation works as follows: when in exploration, the animat is placed randomly in a blank cell of the environment; then it moves following one of the exploration strategies proposed in the literature, random or biased. If the animat reaches a food cell within a maximum number M_es of steps, the exploration ends; otherwise, if the animat does not find food within M_es steps, it is moved, i.e., teletransported, to another blank cell and the exploration phase is restarted. For small M_es values, teletransportation guarantees that the animat visits all the possible niches with the same frequency; for large M_es, the strategy becomes equivalent to the underlying exploration strategy employed without teletransportation, e.g., random or biased.
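One exploration problem under teletransportation can be sketched as follows; `step_fn` and `random_cell_fn` stand in for the environment interface and are illustrative assumptions:

```python
def teletransport_episode(start_cell, step_fn, random_cell_fn,
                          m_es=20, max_total=1000):
    """Follow step_fn (returns (next_cell, found_food)) until food is
    found, restarting from random_cell_fn() whenever m_es steps pass
    without food. Returns (final_cell, number_of_teleports).

    max_total is a safety bound for this sketch, not part of the
    strategy described in the text."""
    pos, steps, teleports = start_cell, 0, 0
    for _ in range(max_total):
        pos, found = step_fn(pos)
        steps += 1
        if found:
            return pos, teleports
        if steps >= m_es:                 # food not found in time:
            pos = random_cell_fn()        # teletransport the animat
            steps, teleports = 0, teleports + 1
    return pos, teleports
```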

We apply XCS with teletransportation (XCST) to the environments previously discussed (Maze5, Maze6, and Woods14) using the same parameter settings employed in the original experiments. Figure 10 compares the performance of XCST and XCS with biased exploration in Maze5, when a population of 1600 classifiers is employed and M_es is set to 20 steps. The results show that in Maze5 XCST converges to the optimum. As Figure 11 shows, XCST's performance is stable near the optimum even when only 800 classifiers are employed in the population. We obtain similar results when XCST is applied to Maze6 (see Figure 12). The comparison of the performance of XCST and XCS shows that XCST converges to an optimal solution while XCS with biased exploration, with the same parameter settings, cannot reach the optimum.

Figure 13 compares a typical performance of XCS with biased exploration with a typical performance of XCST when both systems are applied to Woods14. The immediate impression is that XCST's performance is not very stable and is only near optimal. However, to fully understand Figure 13, we have to analyze how XCST learns. When in exploration, XCST continuously moves in the environment in order to visit all the niches frequently. Accordingly, the animat does not learn the optimal policy in the usual way, by "trajectories", i.e., starting in a position and exploring until a goal state is reached.

XCST's policy instead emerges from a set of experiences of a limited number of steps that the animat has collected while learning in the environment. The system immediately learns an optimal policy for the positions near the food cells, then extends this policy to the other areas of the environment during subsequent explorations. We can think of the artificial animal, the animat, as a natural animal that first secures a good path to food and then extends its knowledge to other areas of the environment. In Maze6, the policy is extended very rapidly because the positions of the environment are near the food position. In Woods14, the analysis of single runs shows that XCST almost immediately learns an optimal policy for the first eight positions; then the policy also converges for the subsequent eight positions. In the end, the performance is near optimal because for the last two positions of Woods14, the most difficult ones, the optimal policy is not completely determined.

The experiments with XCST in Woods14 highlight a limitation of teletransportation as an exploration strategy: since the environment is explored uniformly, the positions for which evolving an optimal solution requires more experience converge slowly toward optimal performance.

8 Exploration, Generalization, Models and Animats

Teletransportation is the heuristic we used to validate our hypothesis concerning generalization in XCS. From this perspective, teletransportation should be considered a theoretical tool used in our experiments to support our hypothesis. Unfortunately, teletransportation cannot be applied to general problems, such as physical autonomous agents, because it would require a trainer that, every M_es steps, picks up the agent and takes it to another area of the environment. We can, however, develop from the teletransportation idea a technique, feasible for general problems, through which a wider exploration of the environment can be guaranteed.

8.1 Related Work

As we pointed out previously, XCS usually learns a global policy, but it may tend to evolve local policies in environments where the agent is not able to visit all the areas with the same frequency. This problem is not novel in the area of reinforcement learning. Many reinforcement learning algorithms, in order to converge to the optimum, require that the environment be visited uniformly. For example, when neural networks are employed, all areas of the environment have to be explored with the same frequency; otherwise, the neural network may overfit locally.

Solutions to this kind of problem for reinforcement algorithms have already been proposed. Sutton (1990) introduced the Dyna architecture which integrates the learning algorithm with a model of the environment that is built up by experience. The model is then employed to simulate exploration in other areas of the environment or for planning. Another solution is the one proposed by Lin (1993) where the idea of experience replay is introduced: past experienced trajectories to goal states are memorized and subsequently used to recall past experiences in order to avoid local overfitting.

8.2 Dyna Architecture for XCS

Teletransportation may be implemented for real problems by integrating XCS with a model of the environment built during exploration. The model can subsequently be employed, as in Sutton (1990), to simulate exploration in other areas of the environment while the agent explores one specific environmental niche. The model may also be used for planning. The simplest way to develop a model of the environment in a discrete state/action space, like grid worlds, is to memorize past experience as quadruples of the form (s, a, s', r), where: s is the current sensory input; a is the action the agent selected when it perceived s; s' is the sensory input returned after the agent, perceiving s, performed a; and r is the immediate reward the agent received for performing a in s. This type of model, similar to Riolo's (1991) work on latent learning, is easily integrated into XCS. The overall system, which we call Dyna-XCS, works as follows.

When in exploration, the animat is placed randomly in a blank cell of the environment and then moves under the control of XCS using one of the usual exploration strategies, i.e., random or biased. If the animat reaches a food cell within M_es steps, the exploration ends. Otherwise, if the animat does not find food "in time" (within M_es steps), the system stops exploring the environment and starts using the model of the environment to simulate an exploration experiment. Accordingly, the current sensor configuration is memorized, and a new exploration starts in the model. Exploration within the model is very similar to the exploration the agent performs in the environment. First, the initial position is determined randomly among the states which appear in the first position of the experienced quadruples. Then exploration continues on the model until S_es steps have been performed, or the animat has reached a food cell in the model. At this point, the animat ends the simulated exploration in the model and restarts the exploration in the environment at the position where it had stopped.
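One simulated exploration on the quadruple model might look like this; the dictionary lookup of stored transitions, the `policy` callable, and the stopping conditions are illustrative assumptions:

```python
import random


def simulated_exploration(model, policy, s_es=20):
    """Run one simulated exploration on a list of experienced
    (s, a, s', r) quadruples, starting from a random experienced state
    and following stored transitions for at most s_es steps or until a
    rewarded (food) transition. Returns the simulated (state, reward)
    trace. Interface details are assumptions, not the paper's code."""
    transitions = {(s, a): (s2, r) for s, a, s2, r in model}
    state = random.choice([s for s, _, _, _ in model])
    trace = []
    for _ in range(s_es):
        a = policy(state)
        if (state, a) not in transitions:   # untried pair: stop simulating
            break
        state, r = transitions[(state, a)]
        trace.append((state, r))
        if r > 0:                            # reached food in the model
            break
    return trace
```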

8.3 Discussion

The Dyna-XCS system we implemented is still under experimentation. However, initial results show that there is almost no difference in performance between XCST and Dyna-XCS in the three environments employed in this paper. We expected this result. As observed previously, these environments are quite small; therefore, after the system has solved a few hundred problems, it has tried almost every condition/action pair and has an almost complete description of the environment. Accordingly, exploration with the model becomes almost identical to exploration in the environment.

These initial results highlight the major problem with the implementation based on experience quadruples: the memory required to store the model grows dramatically as exploration in the environment proceeds, because a complete description of the environment is eventually produced. Therefore, this solution is only feasible in small environments. More complex environments require algorithms that produce a compact representation of the model.

A possible solution is a hybrid architecture in which XCS is used for learning, while a different type of algorithm, for instance a neural network, is used for building the environment model. However, this type of solution would introduce elements unrelated to the philosophy underlying XCS. A more elegant solution can be suggested.

XCS is a learning algorithm which may itself be used for learning the environment model. We propose a solution in which the XCS system that has to learn to reach food in the environment is coupled with a second XCS system that is employed to learn the environment model. The second system should have classifiers whose: (1) conditions represent a state/action pair (s, a); (2) actions represent the prediction of the next sensory state (s') and the immediate reward (r) XCS expects to gain when getting to s'. This version of XCS learns a predictive model of the environment, an extension already proposed by Wilson (1995) in the original XCS paper. Recently, Stolzmann (1997) introduced an Anticipatory Classifier System that is designed to learn an environmental model.

9 Evolving a Compact Representation

Previously, we discussed how the generalization mechanism of XCS and the structure of the environment influence system performance. Another important aspect of generalization in XCS concerns the capability of XCS to evolve a compact representation of the learned task.

9.1 Generalization and Task Representation in XCS

Results reported in the literature show that XCS can evolve near minimal populations of accurate and maximally general classifiers (Wilson, 1997a). Recently, Kovacs (1997) proposed an optimality hypothesis which states that XCS tends to evolve the minimal population with respect to the Boolean multiplexer function.

With respect to animat problems, we now discuss how XCS develops a tendency to evolve near minimal populations. We show how, in certain environments, XCS may fail to evolve a compact representation and may produce redundant representations of certain tasks.

Consider again Woods14 (Figure 6). Every position in Woods14 is uniquely determined by the position of the two adjacent free cells. Therefore, in each classifier condition only two bits are sufficient to characterize a specific environmental niche. Since classifier conditions in Woods14 are 16 bits long, for each niche there are 2^14 possible classifiers belonging to that niche only. According to Wilson's hypothesis, general classifiers should reproduce more than specific ones, since general classifiers appear in more match sets. Unfortunately, the last statement is not always true.

For example, consider the two conditions 1010001010001010 and ####0#####0#####. Although the second condition has many more don't care symbols, both conditions match only the third free position in Woods14. We can say that the latter condition is formally more general than the former because it has more # symbols. However, the latter condition is not concretely more general than the former, because it matches the same number of niches. Note that in XCS the pressure toward more general classifiers is effective only if the generality of the classifiers is concretely exploited in the environment (i.e., general classifiers match more niches). Accordingly, in environments that offer few chances of building concrete generalizations, like Woods14, the pressure toward concretely more general classifiers is lost because Wilson's generalization hypothesis does not apply.(n5) In such situations XCS rapidly evolves a set of classifiers that exploit the maximum generalization offered by the environmental states. Then, by recombination and mutation of these classifiers, the system can start producing classifiers that are formally more general (contain more # symbols) but that, in practice, do not match more niches (they are not concretely more general). As a consequence, the representation of the task can become redundant.
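The distinction between formally and concretely general conditions can be checked mechanically. The sketch below is our illustration (the helper names are ours, and the second state is made up for the example); it shows that the extra # symbols in the second condition from the text do not enlarge the set of matched states.

```python
def matches(condition, state):
    """A ternary condition matches a binary state if every
    non-# position agrees with the corresponding state bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def matched_states(condition, states):
    return [s for s in states if matches(condition, s)]

specific = '1010001010001010'   # condition from the text: no don't-cares
general  = '####0#####0#####'   # formally more general: many don't-cares

# one real niche plus a hypothetical second state, for illustration only
states = ['1010001010001010', '0101110101110101']
```

Both conditions match exactly the same single state, so the second is formally, but not concretely, more general.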

As an example, we apply XCST to Maze6 with a population of 1600 classifiers. Figure 14 reports the number of macroclassifiers in the population, and the curve is averaged over ten runs. Notice that the number of macroclassifiers grows immediately and then reaches an equilibrium value which depends on the genetic pressure. The analysis of final populations shows that only a few of the macroclassifiers represent more than one microclassifier.

9.2 The Role of Subsumption Deletion

We now address how the representation of the task that the agent is learning can be compacted in environments, such as Maze6, where a pressure toward more general classifiers cannot be developed.

Subsumption deletion was introduced by Wilson (1997) to improve generalization in XCS. However, early experiments with Maze5, Maze6, and Woods14 (not reported here) show that its introduction may decrease system performance and, in some cases, prevent the system from converging to a stable policy.

Next, we analyze an important aspect of subsumption deletion in order to explain why it may compromise XCS's performance and how the observed behavior is related to specify and teletransportation.

Subsumption deletion acts when new classifiers created by the GA must be inserted in the population; it replaces offspring classifiers with clones of their parents if: (1) the offspring classifiers are more specific than their parents, and (2) the parameters of their parents have been updated sufficiently. As has been observed (Lanzi, 1997a), XCS with subsumption deletion evolves formal generalizations: # symbols are inserted in the classifiers not because they are necessary in order to match more niches; rather, the system converges to a population in which classifiers contain as many # symbols as possible without becoming inaccurate. XCS with subsumption deletion thus tends to produce classifiers which apply to many more conditions than those the agent experienced. Such classifiers are also likely to be inaccurate if the environment is extended, for example, if a new area of the environment is discovered.
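The two conditions for replacing an offspring with a clone of its parent can be sketched as follows. The field names and the experience threshold THETA_SUB are assumptions for illustration, not the parameter names of the original system.

```python
from dataclasses import dataclass

THETA_SUB = 20   # assumed threshold: how often a parent must have been updated

@dataclass
class Classifier:
    condition: str      # ternary string over {0, 1, #}
    experience: int     # number of parameter updates received

def could_subsume(parent, child):
    """True when the parent is at least as general as the child
    (condition 1) and its parameters have been updated sufficiently
    (condition 2): wherever the parent is specific, the child agrees."""
    as_general = all(p == '#' or p == c
                     for p, c in zip(parent.condition, child.condition))
    return as_general and parent.experience >= THETA_SUB
```

When the test succeeds, the offspring is discarded and a clone of the parent enters the population instead.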

In such cases the specify operator can be useful in recovering from classifiers that are overly general in the new area of the environment. In fact, all the classifiers that are overly general in the new area will become inaccurate and specify will be activated.

This aspect of subsumption deletion is strictly related to the accuracy parameter and therefore to teletransportation. In fact, subsumption deletion relies upon the accuracy parameter and thus may corrupt the population when overly general classifiers are evaluated as accurate. We developed a series of experiments in which XCS with biased exploration was applied to the environments previously presented. The results (not reported here) show that the performance of the system decreases significantly when subsumption deletion is used.

After we introduced teletransportation, we repeated the set of experiments to test whether the decrease of performance when subsumption was used depended on the presence of overly general classifiers that were evaluated as accurate.

As Figure 15 shows, XCST's performance is still optimal when subsumption deletion is employed. The comparison of the population size in macroclassifiers for the two systems in Figure 16 shows that subsumption deletion compacts the representation, producing a smaller population. These results suggest that the decrease in XCS's performance under subsumption deletion is related to the problem of local learning introduced in the previous sections.

9.3 Discussion

It is worth noting that the aspects of subsumption deletion we have just discussed are closely related to the phenomena discussed in the first part of the paper. Discovering a new area of the environment in which the evolved classifiers are overly general is similar to the case in which an animat learns locally because it cannot explore all the areas of the environment frequently. Therefore, we might observe that specify and teletransportation are complementary. However, the two problems are quite distinct: we may have an environment in which every area can be visited frequently but where, suddenly, a door opens and the animat faces an unexplored area. Most important, teletransportation acts through evolution, whereas specify is a heuristic and thus acts much faster than genetic evolution.

10 Summary

We presented a study of the generalization mechanism of XCS to explain some of the previously reported results. We analyzed Wilson's generalization hypothesis which explains how generalization in XCS works. Then, we stated a hypothesis which suggests that XCS may not converge to the optimum when, due to the structure of the environment and to the exploration strategy, the system is not able to visit all the areas of the environment frequently. We verified our hypothesis by introducing a meta-exploration strategy, teletransportation, which was used as a theoretical tool during the validation phase. Subsequently, we suggested how the ideas underlying teletransportation might be implemented in a real application integrating XCS with a model of the environment in a Dyna architecture. A possible implementation of such an architecture was then discussed.

Finally, we analyzed the conditions under which XCS may fail to evolve a compact representation of the learned task. We showed this is likely to happen in environments where there is no direct relation between the number of don't care symbols a classifier condition has and the number of environmental configurations the condition matches. In such cases, there is no pressure for further generalization beyond what may be permitted by available states. The system rapidly reaches the maximum generalization that the environmental states permit. Then it can start evolving a redundant representation of the learned task. Accordingly, we showed how subsumption deletion can be effective in evolving a more compact solution.


I wish to thank all the people who helped me during the course of this work. Stewart W. Wilson, who was always available for discussing the generalization issue and for reviewing early versions of this paper. Marco Dorigo, for the many discussions on learning classifier systems when I was in Brussels. Marco Colombetti, who supports my work and is always available for discussions. Trevor Collins, for his invaluable effort in trying to improve my English. Finally, Gabriella, for her never-ending patience.

This research was partially funded by a grant to Marco Colombetti for the year 1996 from the Fondo di Ricerca d'Ateneo (University Research Fund) of the Politecnico di Milano.

(n1) General parameters for the four algorithms are set as follows: β=0.2, γ=0.71, θ=25, ε[sub 0]=0.01, α=0.1, χ=0.8, μ=0.01, δ=0.1, φ=0.5, P[sub #]=0.3, P[sub I]=10.0, ε[sub I]=0.0, F[sub I]=10.0.

Specific parameters are set as follows: for algorithm (ii), P[sub #]=0.0 and mutation does not insert any # symbol; in (iii) specify parameters are set as N[sub Sp]=20 and P[sub Sp]=0.5; finally, biased exploration chooses an action randomly with a probability P[sub s]=0.3. We refer the interested reader to Wilson (1995) for a detailed description of each of the parameters.

(n2) Due to the significant difference of the results it is not possible to use the same scale on both plots.

(n3) XCS's parameters are set as in the previous experiment, except for the specify probability parameter, P[sub Sp], which is set to 0.8.

(n4) This happens because the classifier parameters are always updated according to the payoff level of the first niche, but never according to the payoff level of the second niche.

(n5) This phenomenon was already noticed by Wilson (1995) where, discussing the generalization produced by XCS in Woods2, Wilson observed that XCS produced classifiers which matched the same niches but contained different numbers of # symbols.


Source: Mathematical Intelligencer, Summer99, Vol. 21 Issue 3, p6, 6p, 7 diagrams, 1 graph, 2bw 

Author(s): Bertram, Edward; Horak, Peter

Many mathematicians are now generally aware of the significance of graph theory as it is applied to other areas of science and even to societal problems. These areas include organic chemistry, solid state physics and statistical mechanics, electrical engineering (communications networks and coding theory), computer science (algorithms and computation), optimization theory, and operations research. The wide scope of these and other applications has been well documented (e.g., [4, 11]).

However, not everyone realizes that the powerful combinatorial methods found in graph theory have also been used to prove significant and well-known results in a variety of areas of pure mathematics. Perhaps the best known of these methods are related to a part of graph theory called matching theory. For example, results from this area can be used to prove Dilworth's chain decomposition theorem for finite partially ordered sets. A well-known application of matching in group theory shows that there is a common set of left and right coset representatives of a subgroup in a finite group. Also, the existence of matchings in certain infinite bipartite graphs played an important role in Laczkovich's affirmative answer to Tarski's 1925 problem of whether a circle is piecewise congruent to a square. Other applications of graph theory to pure mathematics may be found scattered throughout the literature.

Recently, a collection of examples [10] showing the application of a variety of combinatorial ideas to other areas has appeared. There, for example, matching theory is applied to give a very simple constructive proof of the existence of Haar measure on compact topological groups, but the other combinatorial applications do not focus on graph theory. The graph-theoretic applications presented here do not overlap with those in [10], and no attempt has been made at a survey. Rather, we present five examples, from set theory, number theory, algebra, and analysis, whose statements are well known or are easily understood by mathematicians who are not experts in the area.

Additional criteria for choosing these five examples were that the statement can be formulated using few definitions and that the proof can be explained in a relatively short space, without too much technical detail. The proof should exhibit the strength and elegance of graph-theoretic methods, although, in some cases, one must consult the literature in order to complete the proof.


For the convenience of our readers, we recall the necessary definitions from graph theory.

An (undirected) graph G = (V, E) is a pair in which V is a set, the vertices of G, and E is a set of 2-element subsets of V, the edges of G. An edge e ∈ E is denoted by e = xy, x and y being the end vertices of e; e is then said to be incident with x (and with y). The degree of a vertex v, deg(v), is the number of edges incident with v. In a directed graph, or simply digraph, G = (V, E), the (directed) edges are ordered pairs of vertices of V and are denoted by e = (x, y).

A trail of length n in a graph G (digraph G) is a sequence of vertices x_0, x_1, x_2, ..., x_n (x_i ∈ V) such that, for i = 0, 1, ..., n − 1, x_i x_{i+1} is an edge of G ((x_i, x_{i+1}) is an oriented edge of G). If x_0 = x_n, the trail is said to be closed. When all the vertices in the sequence are distinct, the trail is called a path. A closed trail, all of whose vertices are distinct except for x_0 = x_n, is called a cycle.

A graph G is connected if any two vertices of G are joined by a path in G. Otherwise, G is said to be disconnected. The components of G are the maximal connected subgraphs of G. A tree is a connected graph without cycles. A graph G = (V, E) is said to be bipartite if V can be partitioned into two nonempty subsets A and B such that each edge of G has one end vertex in A and one end vertex in B. Then, G is also denoted by G = (A, B; E).

If (H, ·) is a group and S a set of generators of H, not necessarily minimal, the Cayley graph G(H, S) of (H, ·) with respect to S has the elements of H as vertices, and xy is an edge if and only if either x = y·a or y = x·a for some a ∈ S.

If G is any graph and e = xy an edge of G, then by a contraction along e, we mean the graph G' which arises from G by identifying the vertices x and y (see Fig. 1).

We say that a graph G[sub 1] is contractible onto a graph G[sub 2] if there is a sequence of contractions along edges which transforms G[sub 1] to G[sub 2].

The automorphism group of a graph G is the group of all permutations p of the vertices of G with the property that p(x)p(y) is an edge of G iff xy is an edge of G. A group H of permutations acting on a set V is called semiregular if for each x ∈ V, the stabilizer H_x := {h ∈ H | x^h = x} consists of the identity only, where x^h denotes the image of x under h. If H is transitive and semiregular, then it is regular.

Cantor-Schröder-Bernstein Theorem

Our first example is a graph-theoretical proof of the classical result of Schröder and Bernstein. Actually, the theorem was stated by Cantor, who did not give a proof. The theorem was proved independently by Schröder (1896) and Bernstein (1905). The idea behind the proof presented here can be found in [8].

Theorem (Cantor-Schröder-Bernstein): Let A and B be sets. If there is an injective mapping f: A → B and an injective mapping g: B → A, then there is a bijection from A onto B; that is, A and B have the same cardinality.

Proof. Without loss of generality, we may assume that A and B are disjoint. Define a bipartite graph G = (A, B; E), where xy ∈ E if and only if either f(x) = y or g(y) = x, for x ∈ A, y ∈ B. By our hypothesis, 1 ≤ deg v ≤ 2 for each vertex v of G. Therefore, each component of G is either a one-way infinite path (i.e., a path of the form x_0, x_1, ..., x_n, ...), or a two-way infinite path (of the form ..., x_{-n}, x_{-n+1}, ..., x_{-1}, x_0, x_1, ..., x_n, ...), or a cycle of even length with more than two vertices, or a single edge. Note that a finite path of length ≥ 2 cannot be a component of G. Hence, each component contains a set of edges such that each vertex of the component is incident with precisely one of these edges. Consequently, in each component the subset of vertices from A has the same cardinality as the subset of vertices from B.
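For finite sets, the matching in the proof can be built explicitly. The sketch below is our illustration, not part of the original article: it traces each component backwards through g and f; an element whose backward chain stops in A, or which lies on a cycle, is matched by f, and the rest are matched by the inverse of g.

```python
def csb_bijection(A, B, f, g):
    """Return a bijection A -> B (as a dict) given injections
    f: A -> B and g: B -> A, following the component structure
    of the bipartite graph used in the proof."""
    g_inv = {g(b): b for b in B}     # well defined since g is injective
    f_inv = {f(a): a for a in A}     # well defined since f is injective
    h = {}
    for a in A:
        x, stopper, seen = a, 'A', set()
        while x in g_inv:            # walk backwards: x <- g(b), b <- f(a'), ...
            b = g_inv[x]
            if b not in f_inv:       # chain stops at a degree-1 vertex of B
                stopper = 'B'
                break
            x = f_inv[b]
            if x == a or x in seen:  # component is a cycle
                break
            seen.add(x)
        h[a] = f(a) if stopper == 'A' else g_inv[a]
    return h
```

On finite sets every component is a cycle or a single edge, so the construction simply recovers a bijection; the chain cases matter for the infinite components described in the proof.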

Fermat's (Little) Theorem

There are many proofs of Fermat's Little Theorem, even short algebraic or number-theoretic ones. The first known proof of the theorem was given by Euler in his letter of 6 March 1742 to Goldbach. The idea of the graph-theoretic proof presented below can be found in [5], where this method, together with some number-theoretic results, was used to prove Euler's generalization to nonprime modulus.

Theorem (Fermat): Let p be a prime and a a natural number not divisible by p. Then, a^p − a is divisible by p.

Proof. Consider the graph G = (V, E), where V is the set of all sequences (a_1, a_2, ..., a_p) of natural numbers between 1 and a (inclusive) with a_i ≠ a_j for some i ≠ j. Clearly, V has a^p − a elements. For any u ∈ V, u = (u_1, ..., u_{p−1}, u_p), let us say that uv ∈ E just in case v = (u_p, u_1, ..., u_{p−1}). Clearly, each vertex of G has degree 2, so each component of G is a cycle. The length of each cycle divides p; since p is prime and no sequence in V is constant, each cycle has length exactly p. But then the number of components must be (a^p − a)/p, so p | a^p − a.
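The cycle structure in this proof can be verified computationally for small cases. The sketch below is our illustration (the function name is ours): it groups the non-constant sequences into orbits under cyclic rotation, the cycles of the graph G, and lets one check that each orbit has exactly p elements.

```python
from itertools import product

def rotation_orbits(a, p):
    """Partition the a^p - a non-constant sequences of length p over
    {1, ..., a} into orbits under cyclic rotation (the cycles of G)."""
    seen, orbits = set(), []
    for seq in product(range(1, a + 1), repeat=p):
        if len(set(seq)) == 1 or seq in seen:
            continue                      # skip constant or already-grouped
        orbit = {seq[i:] + seq[:i] for i in range(p)}
        seen |= orbit
        orbits.append(orbit)
    return orbits
```

For a = 2 and p = 5, for instance, there are (2^5 − 2)/5 = 6 orbits of size 5 each, in agreement with the counting argument.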

Nielsen-Schreier Theorem

Let H be a group and S a set of generators of H. A product of generators and their inverses which equals the identity 1 is called a trivial relation among the generators in S if 1 can be obtained from that product by repeatedly replacing xx^{-1} or x^{-1}x by 1; otherwise, such a product is called a nontrivial relation. A group H is free if H has a set of generators such that all relations among the generators are trivial. In [1] Babai proved the Nielsen-Schreier Theorem on subgroups of free groups, as well as other results in diverse areas, from his "Contraction Lemma." The particular case of this lemma when G is a tree, and its use in proving the Nielsen-Schreier Theorem, was also observed by Serre [12, Chap. 1, Sec. 3]. The proof of the Contraction Lemma is somewhat technical, although it uses only the ideas from group theory and graph theory we have already recalled, and is omitted here.

Contraction Lemma. Let H be a semiregular subgroup of the automorphism group of a connected graph G. Then, G is contractible onto some Cayley graph of H.

If H is a group and h ∈ H, consider the permutation h_R of H obtained by multiplying all the elements of H on the right by h. The collection H_R = {h_R : h ∈ H} is a regular group of permutations (under composition) and is called the (right) regular permutation representation of H.

It is known [1] that G is a Cayley graph of the group H if and only if G is connected and H_R is a subgroup of the automorphism group of G.

Corollary. If J is a subgroup of a group H, then any G(H, S) is contractible onto G(J, T) for some set T of generators of J.

Proof. H_R, the regular representation of H, acts naturally as a regular permutation group on G(H, S), which is connected. Thus, the subgroup of H_R corresponding to the elements of J is a semiregular subgroup of the automorphism group of G(H, S). Now apply the Contraction Lemma.

Theorem (Nielsen-Schreier): Any subgroup of a free group is free.

Proof. We first show that, in any group H and for any set S of generators of H, the Cayley graph G(H, S) contains a cycle of length > 2 if and only if there is a nontrivial relation among the generators in S. To show this, suppose x_0, x_1, ..., x_n = x_0 is a cycle of G(H, S). Then, there are a_i ∈ S, 1 ≤ i ≤ n, such that x_{i−1} a_i^{ε_i} = x_i, where ε_i ∈ {1, −1}. Hence, x_n = x_{n−1} a_n^{ε_n} = x_{n−2} a_{n−1}^{ε_{n−1}} a_n^{ε_n} = ... = x_0 a_1^{ε_1} a_2^{ε_2} ... a_n^{ε_n}, i.e., the identity 1 = a_1^{ε_1} a_2^{ε_2} ... a_n^{ε_n}. If this were a trivial relation, then there would exist an integer i, 1 ≤ i < n, such that a_i = a_{i+1} and ε_i = −ε_{i+1}. However, this implies that x_{i−1} = x_{i+1}, a contradiction since the vertices of a cycle are distinct. Similarly, if a_1^{ε_1} ... a_n^{ε_n} = 1 is a nontrivial relation, then x_0, x_1, ..., x_{n−1}, x_n, where x_i = x_{i−1} a_i^{ε_i}, 1 ≤ i ≤ n, and x_0 = x_n, is a closed trail in G(H, S), which must contain a cycle.

Suppose now that H is a free group, S a minimal set of generators of H, and J a subgroup of H. Since there is no nontrivial relation on the elements of S, G(H, S) does not contain a cycle. Also, from the corollary above, G(H, S) is contractible onto G(J, T) for some set T of generators of J. Because any contraction of a cycle-free graph is again cycle-free, G(J, T) must be cycle-free, and, thus, there is no nontrivial relation on the elements of T. Hence, J must be a free group, freely generated by T.

In [7] the interested reader may further pursue the substantial use of elementary graph theory in giving simplified proofs of important theorems in combinatorial group theory.

Existence of a Nonmeasurable Set

The following proof of the existence of a subset of the real numbers R which is non-measurable in the Lebesgue sense is due to Thomas [15]. He wrote his paper while an undergraduate student. We realize that many readers may still prefer Vitali's proof. However, it is quite unexpected that this theorem can be reduced to the theorem below, an easily proved result in measure theory, by using only discrete mathematics.

A simple, well-known result from graph theory says that a graph (V, E) is bipartite if and only if all its cycles are of even length. For a proof of the nontrivial direction, it suffices to consider connected graphs. Choose any x ∈ V and define V_1 = {y ∈ V : every path connecting x and y has odd length} and V_2 = {y ∈ V : every path connecting x and y has even length}. Since there are no odd cycles in (V, E), any two paths connecting x and y have the same parity, so V_1 and V_2 yield a partition of V. From the definitions of V_1 and V_2, it follows that no edge has both end vertices in the same V_i, so the graph is bipartite. The converse implication is obvious.
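The even-cycle criterion is in effect a 2-coloring argument, which can be sketched as a breadth-first search for finite graphs (our illustration; the infinite graph used below needs the axiom of choice instead, as the later remark explains):

```python
from collections import deque

def bipartition(vertices, edges):
    """Return a bipartition (A, B) of a finite graph, or None if some
    cycle has odd length (detected as an edge whose two end vertices
    received the same color during the search)."""
    adj = {v: [] for v in vertices}
    for x, y in edges:
        adj[x].append(y)
        adj[y].append(x)
    color = {}
    for start in vertices:                 # handle each component separately
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in color:
                    color[w] = 1 - color[v]
                    queue.append(w)
                elif color[w] == color[v]:
                    return None            # odd cycle found
    A = {v for v in vertices if color[v] == 0}
    return A, set(vertices) - A
```

A 4-cycle splits into two color classes of size two, while a triangle is rejected, mirroring the parity argument above.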

Consider now the graph T = (R, E), where xy ∈ E if and only if |x − y| = 3^k for some integer k. To show that T is bipartite, suppose that x_0, x_1, x_2, ..., x_{n−1}, x_n = x_0 is a cycle of T of length n. Then, by the definition of T, x_n = x_{n−1} ± 3^{k_n} = x_{n−2} ± 3^{k_{n−1}} ± 3^{k_n} = ... = x_0 ± 3^{k_1} ± 3^{k_2} ± ... ± 3^{k_n}, and thus

±3^{k_1} ± 3^{k_2} ± ... ± 3^{k_n} = 0,

where the k_i, 1 ≤ i ≤ n, are integers. Multiplying both sides by 3^N, where N is an integer such that N + k_i > 0 for 1 ≤ i ≤ n, yields

±3^{N + k_1} ± 3^{N + k_2} ± ... ± 3^{N + k_n} = 0,

which implies that n is even, since otherwise the left side of the above equation is a sum of an odd number of odd terms and hence odd, a contradiction. Thus, T is bipartite.

Hence, there are sets A and B with A ∩ B = ∅ and A ∪ B = R such that each edge of T is incident with one vertex in A and one vertex in B. If both A and B were measurable, then at least one of them, say A, would have positive measure. Furthermore, for each integer k, A + 3^k ⊆ B, which yields A ∩ (A + 3^k) = ∅. Since 3^k → 0 as k → −∞, this contradicts the following theorem, which is a standard result in measure theory. For the convenience of the reader, we include the proof from [15].

Theorem. Let M be a set of real numbers with positive Lebesgue measure. Then, there exists a δ > 0 such that for every x ∈ R with |x| < δ, M ∩ (M + x) ≠ ∅.

Proof. Find a closed set F and an open set G with F ⊆ M and F ⊆ G such that 3λ(G) < 4λ(F), where λ is Lebesgue measure. Since G is a countable union of disjoint open intervals, there is one among them, say I, such that 3λ(I) < 4λ(F ∩ I). Let δ = λ(I)/2 and suppose that |x| < δ. Then, I ∪ (x + I) is an interval of length less than (3/2)λ(I) which contains both F ∩ I and x + (F ∩ I). The last two sets cannot be disjoint, since otherwise

(3/2)λ(I) > λ(I ∪ (x + I)) ≥ λ(F ∩ I) + λ(x + (F ∩ I)) = 2λ(F ∩ I) > (3/2)λ(I),

which is a contradiction. Hence, ∅ ≠ (F ∩ I) ∩ (x + (F ∩ I)) ⊆ M ∩ (x + M), completing the proof.

Remark. It is well known that a nonmeasurable set cannot be constructed without using the axiom of choice. Our graph T is not connected, and, in fact, each component of T has only a countable number of vertices. Thus, to define A and B, we need to make use of this axiom.

Sharkovsky's Theorem

Let f: R → R be a continuous function. A point x ∈ R is called a k-periodic point of f if f^k(x) = x and f^i(x) ≠ x for i = 1, 2, ..., k − 1. Here, f^n is the nth iterate of f, i.e., f^n = f ∘ f^{n−1}.

If f has a k-periodic point, is it necessary that f have an m-periodic point for some m ≠ k?

In 1964, Sharkovsky [13] gave a complete and amazing answer to this question with the following

Theorem. Let f: R → R be a continuous function with a k-periodic point. Then, f has an m-periodic point if k precedes m in the following ordering (S) of all the natural numbers:

3 ≺ 5 ≺ 7 ≺ 9 ≺ ... ≺ 2·3 ≺ 2·5 ≺ 2·7 ≺ ... ≺ 2^2·3 ≺ 2^2·5 ≺ ... ≺ 2^3·3 ≺ 2^3·5 ≺ ... ≺ ... ≺ 2^4 ≺ 2^3 ≺ 2^2 ≺ 2 ≺ 1.

This is best possible, since whenever k and m are natural numbers and m precedes k, there exists a continuous function f: R right arrow R with a k-periodic point, but no m-periodic point.

The original proof by Sharkovsky is very complicated, and, later, several mathematicians presented much simpler proofs. In some of them, graph theory was used, with the most important step being made by Straffin [14]. He defined a digraph associated with a periodic point of a function and proved the crucial result.

For this purpose, let x be a k-periodic point of a function f. Then, the distinct values {x, f(x), f^2(x), ..., f^{k−1}(x)} determine k − 1 finite intervals I_1, I_2, ..., I_{k−1}, labeled from left to right after locating these numbers in their natural order on the x (and y) axis (see, for example, Fig. 2). Define a digraph G = (V, E) by V = {I_1, ..., I_{k−1}}, with (I_i, I_j) ∈ E whenever f(I_i) ⊇ I_j. For example, the digraph corresponding to the 4-periodic point x = 0 of f, seen in Fig. 2, is the graph given in Fig. 3.
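For a map that is, say, linear between consecutive orbit points, this digraph can be built directly from the images of the interval endpoints. The sketch below is our illustration (the function name is ours); with the 3-orbit 1 → 2 → 3 → 1 it yields two intervals I_1, I_2 and the edges I_1 → I_2, I_2 → I_1, I_2 → I_2.

```python
def straffin_digraph(orbit, f):
    """Vertices are the k-1 intervals between consecutive orbit points
    (indexed 0..k-2, left to right); (i, j) is an edge when the interval
    spanned by the images of interval i's endpoints contains interval j,
    so that f(I_i) covers I_j for a map linear between orbit points."""
    pts = sorted(orbit)
    intervals = list(zip(pts, pts[1:]))
    edges = []
    for i, (a, b) in enumerate(intervals):
        lo, hi = sorted((f(a), f(b)))
        for j, (c, d) in enumerate(intervals):
            if lo <= c and d <= hi:
                edges.append((i, j))
    return intervals, edges
```

Since the orbit points are permuted by f, only the endpoint images are needed; for a general continuous f the span of the endpoint images still lies inside f(I_i), so every edge produced is genuine.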

A closed trail in a digraph is said to be nonrepetitive if it does not consist entirely of a cycle of smaller length traced several times. For example, the digraph in Fig. 3 has nonrepetitive trails of lengths 1 and 2 only. Now, we are able to state Straffin's theorem, which turns the problem of the existence of a periodic point into a problem about the corresponding digraph.

Theorem [14]. If the digraph associated with a k-periodic point of a function f has a nonrepetitive closed trail of length m, then f has an m-periodic point.

Figure 4 shows the digraph associated with any 3-periodic point of a function. Clearly, this digraph contains a nonrepetitive closed trail of arbitrary length, showing that the existence of a 3-periodic point of f implies that f has periodic points of all orders. This special case, and other results on systems with 3-periodic points, were proved in 1975 by Li and Yorke [9], when Sharkovsky's theorem was still little noticed.

The reader is referred to Straffin's one-page proof of his theorem above, which is modeled after Li and Yorke's. Straffin's proof makes essential use of two lemmas which are standard in analysis courses:

Lemma 1. Suppose I and J are closed intervals, f continuous, and J ⊆ f(I). Then there is a closed interval Q ⊆ I such that f(Q) = J.

Lemma 2. Suppose I is a closed interval, f continuous, and I ⊆ f(I). Then, f has a fixed point in I.

Using his theorem above, Straffin proved some parts of Sharkovsky's Theorem, and his approach subsequently allowed several authors to complete the proof (see [3, 6]). In the proof of Sharkovsky's Theorem presented in [2] graphs were used without applying Straffin's result. To give some of the flavor of the proofs in [3, 6] we sketch the proof of a partial result, showing that in the ordering S, all even integers lie after all the odd integers (see [6]).

Theorem. If a continuous function f: R → R has a point of odd period 2n + 1 (n ≥ 1), then it has periodic points of all even periods.

Proof (sketch). For n = 1, the proof was given above. Now, suppose n > 1 and assume by way of induction that the theorem is true whenever f has a point of odd period 2m + 1, where 3 ≤ 2m + 1 < 2n + 1. Straffin proved generally that the digraph corresponding to a periodic point of period k contains a closed trail of length k in which some vertex is repeated exactly twice. In our case, k = 2n + 1, and this closed trail can, therefore, be decomposed into two closed nonrepetitive trails, one of which has odd length, say 2m + 1 < 2n + 1. If this closed trail is of length greater than one, the assertion follows by our induction assumption and the previous theorem. If not, then Straffin proved that our digraph must contain the directed subgraph given in Fig. 5. This subgraph has a cycle of length 2, and one of length 4. For any even number t > 4, we may begin a nonrepetitive closed trail of length t at the bottom right-hand vertex, traverse the 4-cycle once, and follow this by traversing the 2-cycle exactly (t - 4)/2 times. By the previous theorem, the existence of all even periods follows.


Source: Mathematical Intelligencer, Summer99, Vol. 21 Issue 3, p38, 10p, 10 diagrams, 1bw Author(s): Vardi, Ilan

In my opinion, it is not only the serious accomplishments of great and good men which are worthy of being recorded, but also their amusements.

The title of this paper is a result of comments on earlier drafts by mathematicians: "This is not mathematics, this is history!" and by historians of mathematics: "This is not history, this is mathematics!" After some reflection, I came to the conclusion that the historians were right and the mathematicians were wrong--for example, I have found little difference between reading papers of Atle Selberg (1917-, Fields Medal 1950) and Archimedes (287-212 BC) (who both lived in Syracuse!). I believe that the mathematicians I spoke to were expressing a generally held belief that reading mathematical papers that are over a hundred years old is history of mathematics, not mathematics. Thus, the reconstruction of Heegner's solution to the class-number-one problem (1952) appeared in a mathematics journal [52], while a reconstruction of the missing portions of Archimedes's The Method (250 BC) appeared in a history journal [29].

To me, reading and proving results about a mathematical paper, whether it was written in 1950 or 250 BC, is always mathematics, though the latter case might be called "ancient mathematics." At least as to Greece, this is accepted by some eminent mathematicians [30, p. 21]:

Oriental mathematics may be an interesting curiosity, but Greek mathematics is the real thing ... The Greeks, as Littlewood said to me once, are not clever schoolboys or "scholarship candidates," but "Fellows of another college." So Greek mathematics is "permanent," more perhaps even than Greek literature. Archimedes will be remembered when Aeschylus is forgotten, because languages die and mathematical ideas do not.

I am saying that ancient Greek mathematicians were in every essential way similar to modern mathematicians. In fact, some mathematicians might find more in common with Archimedes and Euclid than with many colleagues in their own departments, and even reading the original Greek--a subject traditionally taught in high school [9]--seems easier than understanding, say, the proof that every semistable elliptic curve is modular [59].

Nineteenth-century mathematicians dedicated much of their research to elementary Euclidean geometry. It is possible that some mathematicians of that era felt that the influence of the past was too great, as Felix Klein wrote [38, Vol. 2, p. 189]:

Although the Greeks worked fruitfully, not only in geometry but also in the most varied fields of mathematics, nevertheless we today have gone beyond them everywhere and certainly also in geometry.

For whatever reason, geometers have recently tended to distance themselves from Euclidean geometry. For example, the book Unsolved Problems in Geometry [16], part of a series on "unsolved problems in intuitive mathematics," does not have a section devoted to classical Euclidean geometry, and with few exceptions, such as [10], articles on this subject are relegated to "lowbrow" publications. Yet earlier in this century, Bieberbach, Hadamard, and Lebesgue all wrote books on elementary Euclidean geometry [13] [27] [44], and excellent books and articles on ancient mathematics are still being written [31] [55]. See [17] for further analysis of these issues.

In this paper, I will give an example of ancient mathematics by using techniques that Archimedes developed in his paper The Method to derive results that he proved in his paper On Spirals. I will try to present these in a way that Archimedes might understand [57], in particular, the diagrams are intended to conform to ancient Greek standards [46]. I will also indicate how ideas in these papers can lead to some surprising results (e.g., Exercise 4 below). The paper will include such exercises as may challenge the reader to understand concepts of Archimedes as he expressed them.

(I have concentrated on the works of Archimedes because these are most similar to modern mathematical research papers, sharply focused on problems and their solution. By comparison, the works of Euclid read like a generic textbook; and so little is known about Euclid that it cannot be ruled out that he was actually a "consortium." Moreover, it seems likely that the works of Euclid are based on the efforts of earlier mathematicians [24] [39].[1])

The balance of the paper shows how a precise knowledge of ancient mathematics allows one to navigate in the sea of inaccuracies and misconceptions written about the history of mathematics. This also gives one perspective on cultural aspects of mathematics, as it forces one to understand ideas of first-rate mathematicians whose cultural background is very different from the present one. For example, it can help you read The New York Times [37]:

"Alien intelligences may be so far advanced that their math would simply be too hard for us to grasp," [Paul] Davies said. "The calculus would have baffled Pythagoras, but with suitable tuition he would have accepted it."

Reading this paper should make it clear that Archimedes could have been Pythagoras's calculus tutor, thus refuting any implication that calculus was an unknown concept to ancient Greeks.

It is my hope that I can convince mathematicians that there are many interesting and relevant ideas to be uncovered in ancient Greek mathematics, and that it might be worthwhile to take a first-hand look, being wary of popular accounts and secondary sources, this one included!

Extending Archimedes's Method

In 1906 the Danish philologist J.L. Heiberg went to Constantinople to examine a manuscript containing mathematical writing which had been discovered seven years earlier in the monastery of the Holy Sepulchre at Jerusalem. What he found was a 10th-century palimpsest--a parchment containing works of Archimedes that, sometime between the 12th and 14th centuries, had been partially erased and overwritten by religious text. Heiberg managed to decipher the manuscript [33] and found that it included a text of The Method, a work of Archimedes previously thought lost. (The story of the transmission of Archimedean manuscripts given in [18] reads like a chapter from The Maltese Falcon. Late bulletin: Heiberg's palimpsest was sold by Christie's for $2,000,000--see Jeremy Gray's article in this issue.)

This discovery had a significant impact on the understanding of ancient Greek mathematics, for two reasons. The first is the aim of the paper, summarized by Archimedes[2] [54, Vol. 2, p. 221]:

Moreover, seeing in you, as I say, a zealous student and a man of considerable eminence in philosophy, who gives due honour to mathematical inquiries when they arise, I have thought fit to write out for you and explain in detail in the same book the peculiarity of a certain method, with which furnished you will be able to make a beginning in the investigation by mechanics of some of the problems in mathematics. I am persuaded that this method is no less useful even for the proofs of the theorems themselves. For some things first became clear to me by mechanics, though they had later to be proved geometrically owing to the fact that investigation by this method does not amount to actual proof; but it is, of course, easier to provide the proof when some knowledge of the things sought has been acquired by this method rather than to seek it with no prior knowledge.

This is a radical divergence from all other extant Greek works, as T.L. Heath explains [6, Supplement, p. 6]:

Nothing is more characteristic of the classical works of the great geometers of Greece, or more tantalising, than the absence of any indication of the steps by which they worked their way to the discovery of their great theorems. As they have come down to us, these theorems are finished masterpieces which leave no traces of any rough-hewn stage, no hint of the method by which they were evolved ... A partial exception is now furnished by The Method; for here we have a sort of lifting of the veil, a glimpse of the interior of Archimedes' workshop as it were.

The other surprising aspect of The Method is the revelation that Archimedes worked with infinitesimals, for example, "The triangle ΓZA is composed of the straight lines drawn in ΓZA" [54, Vol. 2, p. 227] and "The cylinder, the sphere and the cone being filled by circles thus taken" [8, vol. 3, p. 91]; see [1] [42]. As every mathematician knows, infinitesimals were reinvented by mathematicians such as Cavalieri (1598-1647) and Leibniz (1646-1716); see [2] [21]. Archimedes used them to compute the areas and volumes of various geometrical figures, including what he considered his greatest achievement:[3]

Any cylinder having for its base the greatest of the circles in the sphere, and having its height equal to the diameter of the sphere, is one-and-a-half times the sphere, a result he subsequently proved in On the Sphere and Cylinder I, Corollary to Proposition 34 [54, Vol. 2, p. 125]. Archimedes understood that his method does not produce valid proofs, due to its use of infinitesimals,[4] though it is unclear whether the same is true of his successors. In any case, it is easy to make the arguments rigorous, given present knowledge. The basic ideas of The Method are still presented in contemporary calculus courses [35] [51, p. 709], and a physical model of Archimedes's argument has been built [25].

On the other hand, Archimedes's On Spirals is a masterpiece of rigorous mathematics. In this paper, Archimedes computes the area and tangent of a spiral, and, in doing so, derives much of the Calculus I curriculum, including related rates, limits, tangents, and the evaluation of Riemann sums. This is reflected by the fact that a number of contemporary calculus texts outline the basic idea of Archimedes's computation of the area of a spiral [3, p. 3] [12, p. 75], though both works avoid technical difficulties by substituting a parabola and then incorrectly imply that Archimedes used such an approach for the parabola [3, p. 8] [12, p. 75]. The considerable length of the paper is a consequence of proving these results from basic principles. Unfortunately, it does not yet have a faithful English translation [56]; Heath's intent in [6] was to capture the modern flavor of Archimedes's works in order to make them more accessible. A generally faithful French translation, including the Greek text, is available [8].

The mechanical method does not seem to produce directly the area of a spiral, or even the area of a circle (also computed by Archimedes), so one might wonder how he first derived them. W.R. Knorr [40] has suggested that the writings of Pappus of Alexandria (fourth century AD) indicate that Archimedes wrote an earlier version of On Spirals which used a different argument to compute the area of the spiral, but then rejected it as inelegant (this approach is developed in the solution to Exercise 1). The object of the next section is to show how a natural extension of the mechanical method easily produces these results.

Weighing a Spiral

The Method relies on a mechanical analogy by using a balance to compare objects. This requires a few simple assumptions and facts about the properties of a lever, which are developed (sometimes implicitly) in Archimedes's On the Equilibrium of Planes I [6] [18, Chapter IX]. These can be summarized by

Assumption 1. Two objects will balance each other if the distances of their centers of gravity to the fulcrum are inversely proportional to their weights. The center of gravity of an object lies on an axis of symmetry.

When only the weight of an object is relevant to an argument, I will place it on a pan suspended from the balance. The object and any of its sections will then be assumed to have their centers of gravity at the point where the pan is suspended. I will also make extra assumptions not seen in Archimedes's works (however, see the solution to Exercise 4).

Assumption 2. A plane figure is composed of circular arcs with common center, and each circular arc weighs the same as a line segment of equal length.

Exercise 1. What happens if you instead decompose plane figures into radii with common center?

I will first show how the mechanical method can be used to derive Archimedes's formula for the area of a circle given in Measurement of the Circle, Proposition 1 [54, Vol. 1, p. 317].[5]

Proposition 1. Any circle is equal to a right-angled triangle in which one of the sides about the right angle is equal to the radius, and the base is equal to the circumference.

Exercise 2. Explain why Proposition 1 is equivalent to the familiar formula: Area of a circle = πR^2.

Suspend two pans on opposite sides of a balance and at equal distances to the fulcrum. On one pan, place a circle with center at A and radius AB, on the other place a line segment CD of length AB. By Assumption 2, the circle is composed of circumferences with center A and radius AE for any E lying on AB. For each such circumference, place a line segment FG perpendicular to CD, of length the circumference through E, such that its endpoint F lies on CD and CF is equal to AE. By Assumption 2, the line segment FG is in equilibrium with the circumference through E. The resulting figure is a right triangle of height AB, base the circumference through B, and it balances a circle of radius AB, which is the statement of Proposition 1.
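As a modern check on this weighing argument, one can sum the circumferences numerically. The following Python sketch (the function name and step count are my own) approximates the area of a disk of radius R by stacking circumferences of thickness h; the straightened circumferences form the right triangle of Proposition 1, with legs R and 2πR and area πR^2.

```python
import math

def circle_area_by_circumferences(R=1.0, n=100000):
    """Riemann-sum version of Assumption 2: the disk weighs the same as
    the stack of its circumferences, each of length 2*pi*r and width h."""
    h = R / n
    # midpoint radii (i + 0.5)*h avoid bias at the two ends
    return sum(2 * math.pi * (i + 0.5) * h * h for i in range(n))

approx = circle_area_by_circumferences()  # should be close to pi * 1.0**2
```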

Exercise 3. Why is the resulting figure in this construction a triangle?

Exercise 4. Generalize the following heuristic from The Method [6, Supplement]: "... judging from the fact that any circle is equal to a triangle with base equal to the circumference and height equal to the radius of the circle, I apprehended that, in like manner, any sphere is equal to a cone with base equal to the surface of the sphere and height equal to the radius."

Archimedes's definition of a spiral and its relevant components is given by [54, Vol. 2, p. 183]:

    1. If a straight line drawn in a plane revolve uniformly any number of times about a fixed extremity until it return to its original position, and if, at the same time as the line revolves, a point move uniformly along the straight line, beginning at the fixed extremity, the point will describe a spiral in the plane.

    2. Let the extremity of the straight line which remains fixed while the straight line revolves be called the origin of the spiral.

    3. Let the position of the line, from which the straight line began to revolve, be called the initial line of the ...

Proposition 2. The area inside a spiral anywhere within its first revolution is one third the sector of a circle with center at the origin of the spiral, radius equal to the distance of the point describing the spiral to the origin, and angle equal to the angle between the line and the initial line. (Archimedes gave areas for complete revolutions only, but his proof also applies to this case.)

Consider a spiral with origin A, initial line AB, and C the position of the point describing the spiral. Consider also a balance arm DE of length twice AC and let the midpoint F of DE be the fulcrum. On this balance suspend a pan from D and place the spiral region in the pan.

By Assumption 2, the spiral region is composed of arcs GH for each G lying on AC, where H is the intersection of the circle with center A and radius AG and the spiral. Extend AH to intersect the circle of center A and radius AC at I. Consider a line segment JK of length equal to the arc CI and crossing DE at L such that JK and DE are perpendicular, L is the midpoint of JK, and FL is equal to AG. I claim that JK and the arc GH are in equilibrium. To see this note that, by Exercise 3, the length of an arc is proportional to its radius, so that

              arc GH : arc CI :: AG : AC,

and the result follows from the assumption that the arc CI has its center of gravity at D and from Assumption 1. Now extend the arc CI to intersect AB at M; then the arc CI is equal to the arc CIM minus the arc IM, and by the definition of spiral, IM is proportional to AH. Since the arc CIM remains constant in this argument, the second part of Exercise 3 shows that the arc CIM minus JK is proportional to FL, which means that the resulting figure is an isosceles triangle which balances the inside of the spiral.

The exact same argument shows that the area between the spiral and the initial line that lies within the same sector balances the same isosceles triangle, but reversed so that its vertex lies on the fulcrum. The crucial step is to recall the following

Fact: The center of gravity of a triangle lies at the intersection of the medians, and the medians of a triangle intersect each other in a ratio of 2:1.

The first part is suggested by the observation that a median divides a triangle into two triangles of equal weight, and its proof is one of the main results of On the Equilibrium of Planes I. The second part is an easy exercise [15, Section 1.4] and follows from On the Equilibrium of Planes I, Proposition 15, generalized to trapezoids.

This shows that reversing the first triangle places the center of gravity twice as far from the fulcrum, so the second triangle will balance twice the first. One concludes that the inside of the spiral weighs one half the outside of the spiral, and thus one third of the sector of the circle, which is the statement of Proposition 2.
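The one-third and one-half ratios asserted here are easy to check numerically. The sketch below (all names mine) writes the spiral in modern polar form r = a*theta and decomposes the region inside it into thin sectors, in the spirit of the arc decomposition above:

```python
import math

def spiral_area(a, Theta, n=200000):
    """Area inside r = a*theta for 0 <= theta <= Theta, summed over thin
    sectors of angle d and radius a*theta (midpoint rule);
    each sector contributes (1/2) * r**2 * d."""
    d = Theta / n
    return sum(0.5 * (a * (i + 0.5) * d) ** 2 * d for i in range(n))

a, Theta = 1.0, 2 * math.pi
R = a * Theta                    # distance of the describing point from the origin
sector = 0.5 * Theta * R ** 2    # area of the circumscribing sector
inside = spiral_area(a, Theta)
outside = sector - inside
# Proposition 2: inside = sector / 3, equivalently inside = outside / 2.
```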

Exercise 5. Evaluate the area of the spiral using the same procedure as for the circle, i.e., by only comparing weights placed on pans.

Exercise 6. Use the mechanical method to compute the center of gravity of a spiral region.

A Modern Translation

The basic observation is that Assumption 2 extends Archimedes's method to polar coordinates. Consider a curve r = f(θ) in polar coordinates, where, for simplicity, f is an increasing function, so there is an inverse function θ = g(r) (this notation is more convenient given the difficulties of Exercise 1). To compute the area of a region A lying inside the curve and having 0 ≤ θ ≤ Θ, one partitions A into thin circular shells of width h > 0, as in Figure 6. Using the formula θr^2/2 for the area of a sector of angle θ and radius r, each shell has area (Θ - θ)rh + R(r, h), where the error R(r, h) is less than the area (r + h)h(g(r + h) - g(r)) of the small shell element, see Figure 6, and this is less than Crh^2, for some constant C, assuming that g(r) is well behaved. It follows that, ignoring terms of order h^2, the area of each shell is (Θ - θ)rh, which is the length of the bottom arc of the shell multiplied by h. This shows why the first part of Assumption 2 holds. All these shells have area a linear function of h up to a lower-order error term, and they decompose A into disjoint pieces, which shows why the second part of Assumption 2 holds. Letting h → 0, it follows that the area of A is

∫_0^R (Θ - g(r)) r dr.

The standard derivation of this formula uses the formula r dr dθ for the area element in polar coordinates:

Area(A) = ∫∫_A r dr dθ = ∫_0^R ∫_(g(r))^Θ r dθ dr = ∫_0^R (Θ - g(r)) r dr.

A circle is simply g(r) = 0 with Θ = 2π, which yields 2π ∫_0^R r dr = πR^2.

A spiral, in polar coordinates, is given by the equation r = aθ, for some constant a, which can be written as θ = kr, where k = 1/a. By the above, the area of the spiral is

∫_0^R (Θ - kr) r dr = ΘR^2/2 - kR^3/3 = ΘR^2/6

(using Θ = kR), where the term on the right is seen to be 1/3 the area of the sector of the circle of radius R and angle Θ, yielding Proposition 2.
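The shell decomposition described above is easy to simulate. In the following sketch (hypothetical function name, spiral θ = g(r) = kr), the shell at radius r contributes (Θ - kr)·r·h, and the sum approaches the exact area kR^3/6:

```python
def spiral_area_by_shells(k=1.0, R=1.0, n=100000):
    """Sum the circular shells of width h used in the text: the shell at
    radius r spans angles from theta = g(r) = k*r up to Theta = k*R,
    so its area is (Theta - k*r) * r * h up to O(h**2)."""
    h = R / n
    Theta = k * R
    return sum((Theta - k * ((i + 0.5) * h)) * ((i + 0.5) * h) * h
               for i in range(n))

# exact value of the integral: Theta*R**2/2 - k*R**3/3 = k*R**3/6
approx = spiral_area_by_shells()
```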

Any proof of this formula is equivalent to evaluating such integrals. Archimedes evaluated ∫_0^R r^2 dr by decomposing it into Riemann sums and obtaining a closed form for the sum 1^2 + ... + n^2. In Proposition 2 this integral is computed by realizing it as the moment of a triangle and evaluating this as its weight multiplied by the distance of its center of gravity from the fulcrum.
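Archimedes's Riemann-sum evaluation can be mimicked directly: the closed form n(n + 1)(2n + 1)/6 for 1^2 + ... + n^2 turns the right-endpoint sum for ∫_0^R r^2 dr into an exact expression whose limit is R^3/3. A minimal sketch (names mine):

```python
def sum_of_squares(n):
    """Closed form for 1**2 + 2**2 + ... + n**2."""
    return n * (n + 1) * (2 * n + 1) // 6

def integral_r_squared(R, n):
    """Right-endpoint Riemann sum for the integral of r**2 over [0, R],
    evaluated via the closed form; tends to R**3 / 3 as n grows."""
    h = R / n
    return h ** 3 * sum_of_squares(n)   # sum of (i*h)**2 * h, i = 1..n
```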

The Way of Archimedes

The Calculus Reform movement has emphasized experimentation over rigor in calculus education and has been criticized as a result [53]. To defend its position that physical problems should be used to discover mathematical results, Harvard Calculus appeals to Archimedes and The Method [35, p. vii]:

The Way of Archimedes: Formal definitions and procedures evolve from the investigation of practical problems.

This principle accurately represents the works of Archimedes, but a disparity arises in that Harvard Calculus postpones mathematical rigor indefinitely; Archimedes's name should not be associated with such an endeavor. For example, the method of exhaustion used by Archimedes is essentially the epsilon-delta argument abandoned by Harvard Calculus, as B.L. van der Waerden writes [58, p. 220]:

... the estimations, which occur in the summing of infinite series and in limit operations, the `epsilontics', as the calculation with an arbitrary small epsilon is sometimes called, were for Archimedes an open book. In this respect, his thinking is entirely modern.

Moreover, Archimedes held in contempt those who did not furnish proofs of their results. In the introduction to On Spirals, Archimedes reveals that he intentionally announced false theorems in order to expose some of his contemporaries [6]:

... I wish now to put them in review one by one, particularly as it happens that there are two among them which [are wrong and which may serve as a warning to] those who claim to discover everything but produce no proofs of the same may be confuted as having actually pretended to discover the impossible.

Harvard Calculus fails miserably when measured against this Way of Archimedes. Apart from the passage quoted above, the word "theorem" appears in [35] only in the name "Fundamental Theorem of Calculus." Compare this with a standard calculus text [22], which lists 130 theorems in its index. Even more revealing, the only instance of the word "proof" I located in [35] was in Archimedes's introduction to the method quoted above and used in [35] to justify "The Way of Archimedes." In fact, this quote emphasizes that discovery of the answer to a problem leads to a theorem whose proof is facilitated by knowledge of the answer. My interpretation is not Calculus Reform but

Problem-Solving: When faced with a problem, use any method that allows you to conjecture the answer, then find a rigorous proof.

A recent development: The second edition of [35] has taken a more moderate approach to Calculus Reform and now includes some complete proofs [35, 2nd Edition, p. 78] and the epsilon-delta definition of a limit [35, 2nd Edition, p. 128]. However, this new edition no longer includes "The Way of Archimedes."

Popular Misconceptions

It must be noted that the penultimate remark of the previous section paraphrases E.T. Bell [11, p. 31]: "In short he used mechanics to advance his mathematics. This is one of his titles to a modern mind: he used anything and everything that suggested itself as a weapon to attack his problems." However, strong opinions such as those expressed in [11] are fraught with danger, and it is instructive to include the continuation of this passage:

To a modern all is fair in war, love, and mathematics; to many of the ancients, mathematics was a stultified game to be played according to the prim rules imposed by the philosophically-minded Plato. According to Plato only a straightedge and a pair of compasses were to be permitted as the implements of construction in geometry. No wonder the classical geometers hammered their heads for centuries against `the three problems of antiquity': to trisect an angle; to construct a cube having double the volume of a given cube; to construct a square equal to a circle.

This has since been discredited, see [24] [41] (better yet, look at original sources, e.g., as collected in [54, Vol. 1, Chapter 9]); and van der Waerden writes [58, p. 263],

The idea, sometimes expressed, that the Greeks only permitted constructions by means of compasses and straight edge, is inadmissible. It is contradicted by the numerous constructions, which have been handed down, for the duplication of the cube and the trisection of the angle.

In particular, Archimedes trisected the angle with ruler and compass in Proposition 8 of The Book of Lemmas [6, p. 309], see [20] [31, Section 31]. The history of this misconception might prove an interesting subject for further study.

Unfortunately, it is only one of a number of popular misconceptions about the limitations of Greek science [56]. For example, Isaac Asimov (1920-1992) has written [5],

To the Greeks, experimentation seemed irrelevant. It interfered with and detracted from the beauty of pure deduction ... To test a perfect theory with imperfect instruments did not impress the Greek philosophers as a valid way to gain knowledge ... The Greek rationalization for the "cult of uselessness" may similarly have been based on a feeling that to allow mundane knowledge (such as the distance from Athens to Corinth) to intrude on abstract thought was to allow imperfection to enter the Eden of true philosophy. Whatever the rationalization, the Greek thinkers were severely limited by their attitude. Greece was not barren of practical contributions to civilization, but even its great engineer, Archimedes of Syracuse, refused to write about his inventions and discoveries ... to maintain his amateur status, he broadcast only his achievements in pure mathematics.

This passage is contradicted by numerous examples of Greek scientific experiments, for example, Eratosthenes's measurement of the earth [4]. Asimov may be excused for paraphrasing Plutarch's account of Archimedes in his Life of Marcellus, written circa 75 AD [49] [54, Vol. 2, p. 31]:

Yet Archimedes possessed so lofty a spirit, so profound a soul, and such a wealth of scientific inquiry, that although he had acquired through his inventions a name and reputation for divine rather than human intelligence, he would not deign to leave behind a single writing on such subjects. Regarding the business of mechanics and every utilitarian art as ignoble or vulgar, he gave his zealous devotion only to those subjects whose elegance and subtlety are untrammeled by the necessities of life ...

Despite Plutarch's ancient credentials, he had no better insight into Archimedes's scientific contributions, which contradict his story. The reader is already aware that The Method shows that physical considerations played an important role in Greek mathematics. But Asimov and Plutarch are completely refuted by Archimedes in The Sand Reckoner [6] [18]:

While examining this question I have, for my part, tried in the following manner to show, with the aid of instruments, the angle subtended by the sun, having its vertex at the eye. Clearly, the exact evaluation of this angle is not easy, since neither vision, hands, nor the instruments required to measure this angle are reliable enough to measure it precisely. But this does not seem to me to be the place to discuss this question at length, especially because observations of this type have often been reported. For the purposes of my proposition, it suffices to find an angle that is not greater than the angle subtended by the sun with vertex at the eye and to then find another angle which is not less than the angle subtended by the sun with vertex at the eye.

A long ruler having been placed on a vertical stand placed in the direction the rising sun is seen, a little cylinder was put vertically on the ruler immediately after sunrise. The sun, being at the horizon, can be looked at directly, and the ruler is oriented towards the sun and the eye placed at the end of the ruler. The cylinder being placed between the sun and the eye, occludes the sun. The cylinder is then moved further away from the eye and as soon as a small piece of the sun begins to show itself from each side of the cylinder, it is fixed.

If the eye were really to see from one point, tangents to the cylinder produced from the end of the ruler where the eye was placed would make an angle less than the angle subtended by the sun with vertex at the eye. But since the eyes do not see from a unique point, but from a certain size, one takes a certain size, of round shape, not smaller than the eye and one places it at the extremity of the ruler where the eye was placed ... the width of cylinders producing this effect is not smaller than the dimensions of the eye.

... It is therefore clear that the angle subtended by the sun with vertex at the eye is also smaller than the one hundred and sixty fourth part of a right angle, and greater than the two hundredth part of a right angle.

The correct value of the angular diameter of the sun is now known to average about 32' [26, p. 95], i.e., the 169th part of a right angle. It is important to note that this shows not only that ancient Greeks frequently performed experiments, but that Archimedes dealt with experimental error and also compensated for the fact that the human eye is part of the observational instrument, thus anticipating scientists such as Hermann von Helmholtz (1821-1894) [34]. A translation and analysis of The Sand Reckoner is given in [56].
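Archimedes's bounds are easy to restate in modern units: a right angle is 90 degrees, i.e., 5400 arcminutes, so his interval runs from 5400/200 = 27' up to 5400/164 ≈ 32.9'. The sketch below just records this arithmetic and checks that the modern mean angular diameter of the sun, about 32 arcminutes, falls inside his interval:

```python
RIGHT_ANGLE_ARCMIN = 90 * 60        # a right angle is 5400 arcminutes

lower = RIGHT_ANGLE_ARCMIN / 200    # Archimedes's lower bound, in arcminutes
upper = RIGHT_ANGLE_ARCMIN / 164    # Archimedes's upper bound, in arcminutes

MODERN_MEAN_ARCMIN = 32             # approximate modern mean angular diameter
```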

Answers to Exercises

Exercise 1. A naive approach leads to incorrect results, illustrating the dangers of using infinitesimals and indicating why Archimedes did not consider his method to be rigorous. For example, taking the radii of a circle of radius R, with respect to the circumference, and reordering them to form a rectangle yields area 2πR^2. For a general figure, it is not even clear how to pick the radii. To make sense of what is going on, one regards radii as limits of sectors, i.e., infinitesimal triangles. In the case of the circle, this means that the weight of a radius, with respect to the circumference, is equal to one half its length. This can be loosely interpreted as the argument Archimedes used to compute the area of the circle [1]. In the general case, the following is justified:

Assumption 3. The weight of a radius is proportional to the square of its length.

In modern notation, this is simply

Area = (1/2) ∫_0^Θ f(θ)^2 dθ,

where the radii have been chosen with respect to the unit circle. Given Assumption 3, one can compute the area of the spiral by using Pappus's argument [48, Book 4, Proposition 21], see also [32, p. 377] [41, p. 162].

To compute the weight of a spiral region, take each radius of the spiral, starting from the final radius, and place a disk with diameter equal to this radius at height the current angle so the resulting figure is a cone. Similarly, for each radius of the sector place a disk with diameter equal to this radius at height the current angle, resulting in a cylinder with the same base and height as the cone.

Since Euclid's Proposition 2 of Book 12 proves that "circles are to one another as the squares on the diameters," Assumption 3 shows that the ratio of the weight of the spiral region to the weight of the sector is the same as the ratio of the volume of the cone to the volume of the cylinder. But Euclid's Proposition 10 of Book 12 proved that the volume of a cone is one third the cylinder with the same base and height, so the spiral weighs one third of the sector, which is the statement of Proposition 2. (Note that equilateral triangles could have been used instead of circles resulting in a pyramid whose volume is easier to compute.)
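The disk-stacking construction itself can be imitated numerically. This sketch (assuming the spiral $r(\theta) = a\theta$; the discretization step and variable names are illustrative) stacks, at each angle, a disk of diameter $r(\theta)$ for the cone and a disk of diameter $R$ for the cylinder, and recovers the one-third ratio:

```python
import math

a, theta_max, n = 1.0, 2 * math.pi, 200_000
R = a * theta_max                    # final radius of the spiral
h = theta_max / n

# "Cone": at each angle a disk whose diameter is the current radius of the spiral.
cone = sum(math.pi * (a * ((k + 0.5) * h) / 2) ** 2 for k in range(n)) * h
# "Cylinder": a disk of diameter R at every height, i.e., base times height.
cylinder = math.pi * (R / 2) ** 2 * theta_max

# Euclid XII.10: the cone is one third of the cylinder with the same base and height.
assert abs(cone / cylinder - 1 / 3) < 1e-6
```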

Knorr [40] comments that this appeal to three-dimensional figures might have been considered inelegant by Archimedes as it uses volumes to compute areas. On the other hand, reversing this argument and using the evaluation above shows that the volume of a cone can be computed by the mechanical method, a result which does not appear in The Method.

Exercise 2. In modern notation, Archimedes's formulation of Proposition 1 is Area of circle of radius $R$ = $\int_0^R 2\pi r\, dr$, for the integral represents the area of a right triangle with base $R$ and height $2\pi R$.
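A quick numerical confirmation of this identity (the radius value and quadrature scheme below are arbitrary choices for illustration):

```python
import math

R, n = 3.0, 100_000
h = R / n
# Midpoint-rule approximation of the integral of 2*pi*r dr from 0 to R.
integral = sum(2 * math.pi * ((k + 0.5) * h) for k in range(n)) * h
# Right triangle with base R and height 2*pi*R, as in Proposition 1.
triangle = 0.5 * R * (2 * math.pi * R)

assert abs(integral - math.pi * R ** 2) < 1e-6
assert abs(integral - triangle) < 1e-6
```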

Exercise 3. This is equivalent to the fact that the length of an arc of fixed angle is proportional to its radius. In particular, π exists, see [45] [56]. The proof is similar to [23, Book 12, Proposition 2] cited in Exercise 1, and is implicit in Archimedes's Measurement of the Circle. Similarly, the length of an arc of fixed radius is proportional to its angle.

Exercise 4. By analogy with Assumption 2, consider a sphere as being composed of spherical shells centered at the center of the sphere, where each shell weighs the same as a circle of equal area. The justification follows exactly as in Proposition 2: Consider two pans suspended at equal distances from the fulcrum of a balance. On one pan, place a sphere of center A and radius AB and on the other a line CD of length equal to AB. For each E on AB there is a spherical shell passing through E, and consider a circle of area equal to this spherical shell with center at F lying on CD, where CF equals AE, and such that the circle is perpendicular to CD. The resulting figure is a cone with base the area of the sphere and height the radius of the sphere; since it balances the sphere, the claim is justified.

The similarity of this argument to the one of Proposition 1 suggests that Archimedes may have been implicitly aware of the ideas of this paper. Moreover, the reader may verify that the heuristic of this exercise and its justification directly generalize to higher dimensions (a different generalization is given in [19]):

Proposition 3. The volume of an n-dimensional ball is equal to the volume of a cone whose base has (n - 1)-dimensional volume equal to the (n - 1)-dimensional volume of the boundary of the ball and height equal to the radius of the ball.
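Proposition 3 can be verified from the standard closed forms for the volume of the n-ball and the (n-1)-volume of its boundary, together with the fact that an n-dimensional cone over a base of (n-1)-volume B with height h has volume Bh/n. A sketch (the function names are illustrative):

```python
import math

def ball_volume(n, R=1.0):
    """Volume of the n-dimensional ball: pi^(n/2) * R^n / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) * R ** n / math.gamma(n / 2 + 1)

def boundary_volume(n, R=1.0):
    """(n-1)-dimensional volume of the boundary of the n-ball: d/dR of the volume."""
    return n * math.pi ** (n / 2) * R ** (n - 1) / math.gamma(n / 2 + 1)

# Proposition 3: vol(ball) = cone over the boundary with height R = boundary * R / n.
R = 2.0
for n in range(1, 8):
    assert math.isclose(ball_volume(n, R), boundary_volume(n, R) * R / n)
```

For n = 2 and n = 3 this specializes to Archimedes's Proposition 1 and to Exercise 4, respectively.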

Exercise 5. The procedure, when applied to the spiral, yields a section of a parabola. The general formula for such areas was computed by Archimedes in The Quadrature of the Parabola, and in this case it states that the resulting area is four-thirds the triangle with same base and height as the section of the parabola. Since the height and base are equal to the final radius and half the final radius, respectively, Proposition 2 follows.
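The four-thirds rule from The Quadrature of the Parabola is easy to confirm numerically. This sketch (using the illustrative segment cut from the parabola y = 1 - x² by the chord y = 0) compares the segment's area to the inscribed triangle with the same base and height:

```python
# Parabolic segment cut off from y = 1 - x^2 by the chord y = 0 on [-1, 1].
n = 100_000
h = 2.0 / n
segment = sum(1 - (-1 + (k + 0.5) * h) ** 2 for k in range(n)) * h

# Triangle with the same base (length 2) and the same height (1) as the segment.
triangle = 0.5 * 2.0 * 1.0

# Archimedes, Quadrature of the Parabola: segment = 4/3 of the triangle.
assert abs(segment - 4 / 3 * triangle) < 1e-6
```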

Exercise 6. Further extensions of Archimedes's method could be a subject for investigation. As Archimedes wrote in The Method [6, Supplement, p.13],

I deem it necessary to expound the method partly because I have already spoken of it but equally because I am persuaded that it will be of no little service to mathematics; for I apprehend that some, either of my contemporaries or of my successors, will, by means of the method when once established, be able to discover other theorems in addition, which have not yet occurred to me.


I would like to thank Alain Herreman, Reviel Netz, and David Wilkins for helpful comments.

1 Hence I question the curriculum of St. John's College, which purports to educate its students by following an historical sequence of original sources. Its reading list also includes the ancient textbook [47].

2 Archimedes is addressing Eratosthenes of Cyrene (circa 284-194 B.C.), director of the library of Alexandria, famous for his accurate measurement of the circumference of the earth [14] and his sieve to compute prime numbers [47].

3 Archimedes requested that a diagram of a sphere inscribed in a cylinder along with their proportion be placed on his grave, which Cicero reported finding in 75 B.C. when he was treasurer of Sicily [54, Vol. 2, p. 33].

4 In The Quadrature of the Parabola Archimedes gave what he considered to be a rigorous proof using the mechanical method of a result conjectured in a similar way in The Method, but using infinitesimals.

5 A similar method was used by Rabbi Abraham bar Hiyya (1070-1136), see V.J. Katz, review of "Force and Geometry in Newton's Principia" by F. de Gandt, American Math. Monthly 105 (1998), 386-392, and F. Sanchez-Faba, Abraham Bar Hiyya and his "Libro de Geometria" (Spanish), Gac. Math., I 32 (1998), 101-115.

