Concatenated page-by-page transcript. Born-digital pages came through pdf.js; scanned pages were transcribed by Claude vision OCR. Pages marked unreadable failed multiple OCR retries (heavy redaction, microfilm artifacts, or blank separators) and are kept in place for audit.
- Page 1born-digital extraction
UNCLASSIFIED//FOR OFFICIAL USE ONLY
Defense Intelligence Reference Document
Acquisition Threat Support
11 March 2010
ICOD: 1 December 2009
An Introduction to the Statistical Drake Equation
- Page 2born-digital extraction
An Introduction to the Statistical Drake Equation

Prepared by: (b)(3):10 USC 424
Defense Intelligence Agency
Author: (b)(6)

Administrative Note

COPYRIGHT WARNING: Further dissemination of the photographs in this publication is not authorized.

This product is one in a series of advanced technology reports produced in FY 2009 under the Defense Intelligence Agency, (b)(3):10 USC 424, Advanced Aerospace Weapon System Applications (AAWSA) Program. Comments or questions pertaining to this document should be addressed to (b)(3):10 USC 424;(b)(6), AAWSA Program Manager, Defense Intelligence Agency, ATTN: (b)(3):10 USC 424, Bldg 6000, Washington, DC 20340-5100.
- Page 3born-digital extraction
Contents

1. Introduction ... 4
2. The Key Question: How Far are They? ... 4
3. Computing N By Virtue of the Drake Equation (1961) ... 7
4. The Drake Equation is Over-Simplified ... 10
5. The Statistical Drake Equation ... 11
6. Solving the Statistical Drake Equation By Virtue of the Central Limit Theorem (CLT) of Statistics ... 13
7. An Example Explaining the Statistical Drake Equation ... 15
8. Finding the Probability Distribution of the ET-Distance By Virtue of the Statistical Drake Equation ... 18
9. The "Data Enrichment Principle" as the Best CLT Consequence Upon the Statistical Drake Equation (Any Number of Factors Allowed) ... 23
10. Conclusions
Appendix A: Proof of Shannon's 1948 Theorem Stating That the Uniform Distribution is the "Most Uncertain" One Over a Finite Range of Values ... 25
Appendix B: Original Text of the Author's Paper #IAC-08-A4.1.4 Entitled the Statistical Drake Equation ... 28
References ... 55
- Page 4born-digital extraction
1. Introduction

SETI (an acronym for "Search for Extraterrestrial Intelligence") is a relatively new branch of scientific research, having begun only in 1959. Its goal is to ascertain whether alien civilizations exist in the universe, how far from us they exist, and possibly how much more advanced than us they may be. As of 2009, the only physical tools we know of that could help us get in touch with aliens are the electromagnetic waves that an alien civilization could emit and we could detect. This forces us to use the largest radiotelescopes on Earth for SETI research, because the larger our collecting area for electromagnetic radiation is, the higher our sensitivity is (that is, the farther into space we can probe). Yet, even by using the largest radiotelescopes on Earth (the 305-meter dish at Arecibo, for instance), we cannot search for aliens beyond, say, a few hundred light years away. This is a very small amount of space around us within our galaxy, the Milky Way, which is about 100,000 light years in diameter. Thus, current SETI can cover only a very tiny fraction of the galaxy, and it is not surprising that in the past 50 years of SETI searches, no extraterrestrial civilization was discovered. Quite simply, we did not get far enough!

This demands the construction of much more powerful and radically new radiotelescopes. Rather than big and heavy metal dishes, whose mechanical problems hamper SETI research too much, we are now turning to "software radiotelescopes," where a large number of small dishes (ATA = Allen Telescope Array, and ALMA = Atacama Large Millimeter/submillimeter Array) or even just simple dipoles (LOFAR = Low Frequency Array), using state-of-the-art electronics and very-high-speed computing, can outperform the classical radiotelescopes in many regards.
The final dream in this field is the SKA (= Square Kilometer Array), currently being designed and expected to be completed around 2020.

2. The Key Question: How Far are They?

But still, the key question remains: how far are they? Or, more correctly, how far do we expect the NEAREST extraterrestrial civilization to be from the Solar System in the galaxy? This question was first faced in a scientific manner back in 1961 by the same scientist who was also the first experimental SETI radio astronomer ever: the American Frank Donald Drake (born 1930). He first considered the shape and size of the galaxy where we are living: the Milky Way. This is a spiral galaxy measuring some 100,000 light years in diameter and some 1,600 light years in thickness of the Galactic Disk at half-way from its center. That is:

The diameter of the galaxy is (about) 100,000 light years (abbreviated ly), i.e., its radius, $R_{Galaxy}$, is about 50,000 ly.
- Page 5born-digital extraction
The thickness of the Galactic Disk at half-way from its center, $h_{Galaxy}$, is about 1,600 ly.

The volume of the galaxy may then be approximated as the volume of the corresponding cylinder, i.e.

$$V_{Galaxy} = \pi R_{Galaxy}^2 \, h_{Galaxy}. \qquad (1)$$

Now consider the sphere around us having a radius r. The volume of such a sphere is

$$V_{Our\_Sphere} = \frac{4}{3}\,\pi \left(\frac{\mathrm{ET\_Distance}}{2}\right)^3. \qquad (2)$$

In the last equation, we had to divide the distance "ET_Distance" between ourselves and the nearest ET civilization by 2 because we are now going to make the unwarranted assumption that all ET civilizations are equally spaced from each other in the galaxy! This is a crazy assumption, clearly, and should be replaced by more scientifically grounded assumptions as soon as we know more about our Galactic Neighborhood. At the moment, however, this is the best guess that we can make, and so we shall take it for granted, although we are aware that this is a weak point in the reasoning.

Furthermore, let us denote by N the total number of civilizations now living in the galaxy, including ourselves. Of course, this number N is unknown. We only know that $N \ge 1$, since one civilization does at least exist! Having thus assumed that ET civilizations are UNIFORMLY SPACED IN THE GALAXY, we can then write down the proportion:

$$\frac{V_{Galaxy}}{N} = \frac{V_{Our\_Sphere}}{1}. \qquad (3)$$

That is, upon inserting both (1) and (2) into (3):

$$\frac{\pi R_{Galaxy}^2 \, h_{Galaxy}}{N} = \frac{\frac{4}{3}\,\pi \left(\frac{\mathrm{ET\_Distance}}{2}\right)^3}{1}. \qquad (4)$$

The last equation contains two unknowns, N and ET_Distance, and so we don't know which one it is better to solve for. However, we may suppose that, by resorting to the (rather uncertain) knowledge that we have about the evolution of the galaxy through the last 10 billion years or so, we might somehow compute an approximate value for N. Then, we may solve (4) for ET_Distance, thus obtaining the (AVERAGE) DISTANCE BETWEEN ANY PAIR OF NEIGHBORING CIVILIZATIONS IN THE GALAXY (DISTANCE LAW)
- Page 6born-digital extraction
$$\mathrm{ET\_Distance}(N) = \frac{\sqrt[3]{6\, R_{Galaxy}^2 \, h_{Galaxy}}}{\sqrt[3]{N}} = \frac{C}{\sqrt[3]{N}} \qquad (5)$$

where the positive constant C is defined by

$$C = \sqrt[3]{6\, R_{Galaxy}^2 \, h_{Galaxy}} \approx 28{,}845 \text{ light years}. \qquad (6)$$

Equations (5) and (6) are the starting point to understand the origin of the Drake equation that we discuss in detail in Section 3 of this paper. Let us just complete this section by pointing out three different numerical cases of the distance law (5):

• We know that we exist, so N may not be smaller than 1, i.e., $N \ge 1$. Suppose then that we are alone in the galaxy, i.e., that N = 1. Then the distance law (5) yields as distance to the nearest civilization from us just the constant C, i.e., 28,845 light years. This is about the distance between ourselves and the center of the galaxy (i.e., the Galactic Bulge). Thus, this result seems to suggest that, if we do not find any extraterrestrial civilization around us in these outskirts of the galaxy where we live, we should look around the Galactic Center first. And this is indeed what is happening, i.e., many SETI searches are actually pointing the antennas towards the Galactic Center, looking for beacons (see, for instance, ref. [1]).
• Suppose next that N = 1,000, i.e., there are about a thousand extraterrestrial communicating civilizations in the whole galaxy right now. Then the distance law (5) yields an average distance of 2,885 light years. This is a distance that most radiotelescopes on Earth may not reach for SETI searches right now: hence the need to build larger radiotelescopes, like ALMA, LOFAR and the SKA.
• Suppose finally that N = 1,000,000, i.e., there are a million communicating civilizations now in the galaxy. Then the distance law (5) yields an average distance of 288 light years. This is within the (upper) range of distances that our current radiotelescopes may reach for SETI searches, and that justifies all SETI searches that have been done so far in the first fifty years of SETI (1960-2010).
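The three cases above follow directly from equations (5) and (6). As a minimal sketch, the Python below evaluates the distance law. It assumes a galactic radius of 50,000 ly and a disk thickness of 1,600 ly, which is the thickness that reproduces the quoted constant C of about 28,845 ly; the function and constant names are illustrative and not part of the original paper.

```python
R_GALAXY_LY = 50_000.0  # galactic radius in light years, as stated in the text
H_GALAXY_LY = 1_600.0   # assumed disk thickness; this value reproduces C ~ 28,845 ly


def distance_constant(radius_ly=R_GALAXY_LY, thickness_ly=H_GALAXY_LY):
    """Constant C of equation (6): cube root of 6 * R^2 * h."""
    return (6.0 * radius_ly ** 2 * thickness_ly) ** (1.0 / 3.0)


def et_distance(n_civilizations):
    """Distance law of equation (5): C divided by the cube root of N."""
    if n_civilizations < 1:
        raise ValueError("N must be at least 1, since we exist")
    return distance_constant() / n_civilizations ** (1.0 / 3.0)


if __name__ == "__main__":
    for n in (1, 1_000, 1_000_000):
        print(f"N = {n:>9,d}: average distance ~ {et_distance(n):,.0f} ly")
```

Because the distance scales only as the inverse cube root of N, a thousandfold increase in the number of civilizations shortens the expected distance by just a factor of ten.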
In conclusion, interpolating the above three special cases of N, we may say that the distance law (5) yields the following key diagram of the average ET distance vs. the assumed number of communicating civilizations, N, in the galaxy right now (Figure 1):
- Page 7born-digital extraction
[Figure 1 plot. Title: Average DISTANCE of the nearest ET civilization vs. the ASSUMED NUMBER of ET civilizations in the Galaxy. Horizontal axis: ASSUMED NUMBER of civilizations in the Galaxy (that is, N in the Drake equation), from 0 to 1,000,000. Vertical axis: average distance in light years.]

Figure 1. DISTANCE LAW; i.e., the Average Distance (plotted along the vertical axis in light years) Versus the NUMBER of Communicating Civilizations ASSUMED to Exist in the Galaxy Right Now

3. Computing N By Virtue of the Drake Equation (1961)

In the previous section, the problem of finding how close the nearest ET civilization may be was "solved" by reducing it to the computation of N, the total number of extraterrestrial civilizations now existing in this galaxy. In this section we describe the famous Drake equation, which was proposed back in 1961 by Frank Donald Drake (born 1930) to estimate the numerical value of N. We believe that no better introductory description of the Drake equation exists than the one given by Carl Sagan in his 1983 book "Cosmos" (ref. [2]), in its turn based on the famous TV series "Cosmos." So, in this section we report Carl Sagan's description of the Drake equation unabridged.

"But is there anyone out there to talk to? With a third or a half a trillion stars in our Milky Way galaxy alone, could ours be the only one accompanied by an inhabited planet? How much more likely it is that technical civilizations are a cosmic commonplace, that the galaxy is pulsing and humming with advanced societies, and, therefore, that the nearest such culture is not so very far away, perhaps transmitting from antennas established on a planet of a naked-eye star just next door.
Perhaps when we look up at the sky at night, near one of those faint pinpoints of light is a world on which someone quite different from us is then glancing idly at a star we call the Sun and entertaining, for just a moment, an outrageous speculation.
- Page 8born-digital extraction
It is very hard to be sure. There may be several impediments to the evolution of a technical civilization. Planets may be rarer than we think. Perhaps the origin of life is not so easy as our laboratory experiments suggest. Perhaps the evolution of advanced life forms is improbable. Or it may be that complex life forms evolve more readily, but intelligence and technical societies require an unlikely set of coincidences, just as the evolution of the human species depended on the demise of the dinosaurs and the ice-age recession of the forests in whose trees our ancestors screeched and dimly wondered. Or perhaps civilizations arise repeatedly, inexorably, on innumerable planets in the Milky Way, but are generally unstable; so all but a tiny fraction are unable to survive their technology and succumb to greed and ignorance, pollution and nuclear war.

It is possible to explore this great issue further and make a crude estimate of N, the number of advanced civilizations in the galaxy. We define an advanced civilization as one capable of radio astronomy. This is, of course, a parochial if essential definition. There may be countless worlds on which the inhabitants are accomplished linguists or superb poets but indifferent radio astronomers. We will not hear from them.

N can be written as the product or multiplication of a number of factors, each a kind of filter, every one of which must be sizable for there to be a large number of civilizations:

• Ns, the number of stars in the Milky Way galaxy.
• fp, the fraction of stars that have planetary systems.
• ne, the number of planets in a given system that are ecologically suitable for life.
• fl, the fraction of otherwise suitable planets on which life actually arises.
• fi, the fraction of inhabited planets on which an intelligent form of life evolves.
• fc, the fraction of planets inhabited by intelligent beings on which a communicative technical civilization develops.
• fL, the fraction of planetary lifetime graced by a technical civilization.

Written out, the equation reads

N = Ns · fp · ne · fl · fi · fc · fL    (7)

All of the f's are fractions, having values between 0 and 1; they will pare down the large value of Ns. To derive N we must estimate each of these quantities. We know a fair amount about the early factors in the equation, the number of stars and planetary systems. We know very little about the later factors, concerning the evolution of intelligence or the lifetime of technical societies. In these cases our estimates will be little better than guesses. I invite you, if you disagree with my estimates below, to make your own choices and see what implications your alternative suggestions have for the number of advanced civilizations in the galaxy. One of the great virtues of this equation, due to Frank Drake of Cornell, is that it involves subjects ranging from stellar and planetary astronomy to organic chemistry, evolutionary biology, history, politics and abnormal psychology. Much of the Cosmos is in the span of the Drake equation.
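Equation (7) is a plain product, so it can be checked in a few lines. The sketch below multiplies out the estimates that Sagan goes on to give in the quoted passage (Ns = 4 × 10^11, fp = 1/3, ne = 2, fl = 1/3, fi × fc = 1/100), together with his two alternatives for fL. Since Sagan quotes fi and fc only as a product, the sketch takes them as one argument; the function and variable names are illustrative assumptions.

```python
from math import prod


def drake_n(ns, fp, ne, fl, fi_times_fc, f_lifetime):
    """Equation (7), with fi and fc supplied as their product."""
    return prod((ns, fp, ne, fl, fi_times_fc, f_lifetime))


# Estimates quoted in Sagan's passage; the dictionary keys are illustrative names.
SAGAN = dict(ns=4e11, fp=1 / 3, ne=2, fl=1 / 3, fi_times_fc=1 / 100)

# Case 1: civilizations destroy themselves soon after becoming technical.
n_pessimistic = drake_n(**SAGAN, f_lifetime=1e-8)
# Case 2: 1 percent of civilizations survive technological adolescence.
n_optimistic = drake_n(**SAGAN, f_lifetime=1 / 100)

print(f"fL = 1/10^8 -> N ~ {n_pessimistic:.1f}")
print(f"fL = 1/100  -> N ~ {n_optimistic:.2e}")
```

The two results (roughly 9 and roughly 9 million) have the orders of magnitude that Sagan quotes, about 10 and about 10^7, and show how strongly N depends on the single factor fL.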
- Page 9born-digital extraction
We know Ns, the number of stars in the Milky Way galaxy, fairly well, by careful counts of stars in a small but representative region of the sky. It is a few hundred billion; some recent estimates place it at 4 × 10^11. Very few of these stars are of the massive short-lived variety that squander their reserves of thermonuclear fuel. The great majority have lifetimes of billions or more years in which they are shining stably, providing a suitable energy source for the origin and evolution of life on nearby planets.

There is evidence that planets are a frequent accompaniment of star formation: in the satellite systems of Jupiter, Saturn and Uranus, which are like miniature solar systems; in theories of the origin of the planets; in studies of double stars; in observations of accretion disks around stars; and in some preliminary investigations of gravitational perturbations of nearby stars.¹ Many, perhaps even most, stars may have planets. We take the fraction of stars that have planets, fp, as roughly equal to 1/3. Then the total number of planetary systems in the galaxy would be Ns fp ~ 1.3 × 10^11 (the symbol ~ means "approximately equal to"). If each system were to have about ten planets, as ours does, the total number of worlds in the galaxy would be more than a trillion, a vast arena for the cosmic drama.

In our own solar system there are several bodies that may be suitable for life of some sort: the Earth certainly, and perhaps Mars, Titan and Jupiter. Once life originates, it tends to be very adaptable and tenacious. There must be many different environments suitable for life in a given planetary system. But conservatively we choose ne = 2. Then the number of planets in the galaxy suitable for life becomes Ns fp ne ~ 3 × 10^11.

Experiments show that under the most common cosmic conditions the molecular basis of life is readily made, the building blocks of molecules able to make copies of themselves.
We are now on less certain grounds; there may, for example, be impediments in the evolution of the genetic code, although I think this is unlikely over billions of years of primeval chemistry. We choose fl ~ 1/3, implying a total number of planets in the Milky Way on which life has arisen at least once as Ns fp ne fl ~ 1 × 10^11, a hundred billion inhabited worlds. That in itself is a remarkable conclusion. But we are not yet finished.

The choices of fi and fc are more difficult. On the one hand, many individually unlikely steps had to occur in biological evolution and human history for our present intelligence and technology to develop. On the other hand, there must be quite different pathways to an advanced civilization of specified capabilities. Considering the apparent difficulty in the evolution of large organisms, represented by the Cambrian explosion, let us choose fi × fc = 1/100, meaning that only 1 percent of planets on which life arises actually produce a technical civilization. This estimate represents some middle ground among the varying scientific opinions. Some think that the equivalent of the step from the emergence of trilobites to the domestication of fire goes like a shot in all planetary systems; others think that, even given ten or fifteen billion years, the evolution of a technical civilization is unlikely. This is not a subject on which we can do much experimentation as long as our investigations are limited to a single planet. Multiplying

¹ Carl Sagan was writing these lines back in the 1970s, when no extrasolar planets had been discovered yet. The first such discovery occurred in 1995, when Michel Mayor and Didier Queloz, working at the "Observatoire de Haute Provence" in France, discovered the first extrasolar planet orbiting the nearby star 51 Peg. This first extrasolar planet was hence named 51 Peg b. Many more extrasolar planets have been discovered around nearby stars since then.
As of April 2009, 347 extrasolar planets (exoplanets) are listed in the Extrasolar Planets Encyclopaedia.
- Page 10born-digital extraction
these factors together, we find Ns fp ne fl fi fc ~ 1 × 10^9, a billion planets on which technical civilizations have arisen at least once. But that is very different from saying that there are a billion planets on which technical civilizations now exist. For this we must also estimate fL.

What percentage of the lifetime of a planet is marked by a technical civilization? The Earth has harbored a technical civilization characterized by radio astronomy for only a few decades out of a lifetime of a few billion years. So far, then, for our planet fL is less than 1/10^8, a millionth of a percent. And it is hardly out of the question that we might destroy ourselves tomorrow. Suppose this were a typical case, and the destruction so complete that no other technical civilization, of the human or any other species, were able to emerge in the five or so billion years remaining before the Sun dies. Then Ns fp ne fl fi fc fL ~ 10, and, at a given time there would be only a tiny smattering, a handful, a pitiful few technical civilizations in the galaxy, the steady state number maintained as emerging societies replace those recently self-immolated. The number N might be even as small as 1 if civilizations tend to destroy themselves soon after reaching a technological phase; there might be no one for us to talk with but ourselves. And that we do but poorly. Civilizations would take billions of years of tortuous evolution, and then snuff themselves out in an instant of unforgivable neglect.

But consider the alternative, the prospect that at least some civilizations learn to live with technology; that the contradictions posed by the vagaries of past brain evolution are consciously resolved and do not lead to self-destruction; or that, even if major disturbances occur, they are reversed in the subsequent billions of years of biological evolution.
Such societies might live to a prosperous old age, their lifetimes measured perhaps on geological or stellar evolutionary time scales. If 1 percent of civilizations can survive technological adolescence, take the proper fork at this critical historical branch point and achieve maturity, then fL ~ 1/100, N ~ 10^7, and the number of extant civilizations in the galaxy is in the millions. Thus, for all our concern about the possible unreliability of our estimates of the early factors in the Drake equation, which involve astronomy, organic chemistry and evolutionary biology, the principal uncertainty comes to economics and politics and what, on Earth, we call human nature. It seems fairly clear that if self-destruction is not the overwhelmingly preponderant fate of galactic civilizations, then the sky is softly humming with messages from the stars.

These estimates are stirring. They suggest that the receipt of a message from space is, even before we decode it, a profoundly hopeful sign. It means that someone has learned to live with high technology; that it is possible to survive technological adolescence. This alone, quite apart from the contents of the message, provides a powerful justification for the search for other civilizations."

4. The Drake Equation is Over-Simplified

In the nearly fifty years (1961-2009) elapsed since Frank Drake proposed his equation, a number of scientists and writers have tried to find out which numerical values of its seven independent variables are most realistic in light of our present-day knowledge. Thus there is a considerable amount of literature about the Drake equation nowadays, and, as one can easily imagine, the results obtained by the various authors differ widely from one another. In other words, the value of N that various authors obtained by different assumptions about the astronomy, the biology and the sociology implied by the Drake equation may range from a few tens (in the pessimist's view) to some
- Page 11born-digital extraction
millions or even billions in the optimist's opinion. A lot of uncertainty is thus affecting our knowledge of N as of 2010.

In all cases, however, the final result about N has always been a sheer number, i.e., a positive integer ranging from 1 to millions or billions. This is precisely the aspect of the Drake equation that this author regarded as "too simplistic" and improved mathematically in his paper #IAC-08-A4.1.4, entitled "The Statistical Drake Equation" and presented on October 1st, 2008, at the 59th International Astronautical Congress (IAC) held in Glasgow, Scotland, UK, September 29th through October 3rd, 2008. That paper is attached herewith as Appendix B.

Newcomers to SETI and to the Drake equation, however, may find that paper too difficult to be understood mathematically at a first reading. Thus, I shall now explain the content of that paper "by speaking easily." I thank the reader for his or her attention.

5. The Statistical Drake Equation

We start with an example. Consider the first independent variable in the Drake equation (7), i.e., Ns, the number of stars in the Milky Way galaxy. Astronomers tell us that there should be approximately 350 million stars in the galaxy. Of course, nobody has counted (or even seen in the photographic plates) all the stars in the galaxy! There are too many practical difficulties preventing us from doing so: just to name one, the dust clouds that don't allow us to see even the Galactic Bulge (i.e., the central region of the galaxy) in visible light (although we may "see it" at radio frequencies like the famous neutral hydrogen line at 1420 MHz). So, it doesn't make any sense to say that Ns = 350 × 10^6, or (even worse) that the number of stars in the galaxy is, say, 354,233,321, or similar fanciful exact integer numbers. That is just silly and non-scientific.
Much more scientific, on the contrary, is to say that the number of stars in the galaxy is 350 million plus or minus, say, 50 million (or whatever values the astronomers may regard as more appropriate, since this is just an example to let the reader understand the difficulty).

Thus, it makes sense to REPLACE each of the seven independent variables in the Drake equation (7) by a MEAN VALUE (350 million, in the above example) PLUS OR MINUS A CERTAIN STANDARD DEVIATION (50 million, in the above example).

By doing so, we have made a great step ahead: we have abandoned the too-simplistic equation (7) and replaced it by something more sophisticated and scientifically more serious: the STATISTICAL Drake equation. In other words, we have transformed the classical and simplistic Drake equation (7) into an advanced statistical tool for the investigation of a host of facts hardly known to us in detail. In other words still:

• We replace each independent variable in (7) by a RANDOM VARIABLE, labeled $D_i$ (from Drake).
• We assume that the MEAN VALUE of each $D_i$ is the same numerical value previously attributed to the corresponding independent variable in (7).
• But now we also ADD A STANDARD DEVIATION $\sigma_{D_i}$ on each side of the mean value, which is provided by the knowledge gathered by scientists in each discipline encompassed by each $D_i$.
- Page 12born-digital extraction
Having so done, the next question is: How can we find out the PROBABILITY DISTRIBUTION for each $D_i$? For instance, shall that be a Gaussian, or what? This is a difficult question, for nobody knows, for instance, the probability distribution of the number of stars in the galaxy, not to mention the probability distribution of the other six variables in the Drake equation (7).

There is a brilliant way to get around this difficulty, though. We start by excluding the Gaussian because each variable in the Drake equation is a POSITIVE (or, more precisely, a non-negative) random variable, while the Gaussian applies to REAL random variables only, i.e., to variables that may also take negative values. So, the Gaussian is out.

Then, one might consider the large class of well-studied and positive probability densities called "the gamma distributions," but it is then unclear why one should adopt the gamma distributions and not any other.

The solution to this apparent conundrum comes from Shannon's Information Theory and a theorem that he proved in 1948: "The probability distribution having maximum entropy (= uncertainty) over any FINITE range of real values is the UNIFORM distribution over that range." This is proven in Appendix A of the present document.

So, at this point, we assume that each of the seven $D_i$ in (7) is a UNIFORM random variable, whose mean value and standard deviation are known by the scientists working in the respective field (be it astronomy, or biology, or sociology). Notice that, for such a uniform distribution, the knowledge of the mean value $\mu_{D_i}$ and of the standard deviation $\sigma_{D_i}$ automatically determines the RANGE of that random variable between its lower (called $a_i$) and upper (called $b_i$) limits: in fact these limits are given by the equations

$$\begin{cases} a_i = \mu_{D_i} - \sqrt{3}\,\sigma_{D_i} \\ b_i = \mu_{D_i} + \sqrt{3}\,\sigma_{D_i} \end{cases} \qquad (8)$$

(the "surprising" factor $\sqrt{3}$ in the above equations comes from the definitions of mean value and standard deviation: please see equations (12), (15) and (17) in Appendix B for the relevant proof).
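Equation (8) can be checked numerically. The sketch below, in Python, converts a mean value and a standard deviation into the limits of the corresponding uniform distribution and then recovers both from the limits, using the standard results that a uniform variable on [a, b] has mean (a + b)/2 and standard deviation (b − a)/√12. The input numbers are the illustrative 350 million plus or minus 50 million from the example above; the function name is an assumption made for this sketch.

```python
import math


def uniform_bounds(mean, std):
    """Limits (a, b) of the uniform distribution with the given mean and
    standard deviation, following equation (8)."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    half_width = math.sqrt(3.0) * std
    a, b = mean - half_width, mean + half_width
    if a < 0:
        raise ValueError("lower limit is negative; a Drake variable must be non-negative")
    return a, b


# Illustrative input taken from the text's example: 350 million +/- 50 million.
a, b = uniform_bounds(350e6, 50e6)

# Round trip: recover the mean and standard deviation from the limits.
mean_back = (a + b) / 2.0
std_back = (b - a) / math.sqrt(12.0)

print(f"a = {a:,.0f}, b = {b:,.0f}")
print(f"recovered mean = {mean_back:,.0f}, recovered std = {std_back:,.0f}")
```

The guard on a negative lower limit reflects the requirement that every Drake variable be non-negative: a standard deviation larger than the mean divided by √3 cannot be represented by a uniform distribution on positive values.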
So the uniform distribution of each random variable $D_i$ is perfectly determined by its mean value and standard deviation, and so are all its other properties.

The next problem is the following: OK, since we now know everything about each uniformly distributed $D_i$, what is the probability distribution of N, given that N is the product (7) of all the $D_i$? In other words, not only do we want to find the analytical expression of the probability density function of N, but we also want to relate its mean value $\mu_N$ to all mean values $\mu_{D_i}$ of the $D_i$, and its standard deviation $\sigma_N$ to all standard deviations $\sigma_{D_i}$ of the $D_i$.
- Page 13born-digital extraction
This is a difficult problem. It occupied the author's mind for no less than about ten years (1997-2007). It is actually an ANALYTICALLY UNSOLVABLE problem, in that, to the best of this author's knowledge, it is IMPOSSIBLE to find an analytic expression for the probability distribution of any FINITE PRODUCT of uniform random variables $D_i$. This result is proven in Sections 2 through 3.3 of Appendix B (unfortunately!).

6. Solving the Statistical Drake Equation By Virtue of the Central Limit Theorem (CLT) of Statistics

The solution to the problem of finding the analytical expression for the probability density function of N in the statistical Drake equation was found by this author in September 2007. The key steps are the following:

• Take the natural logs of both sides of the statistical Drake equation (7). This changes the product into a sum.
• The mean values and standard deviations of the logs of the random variables $D_i$ may all be expressed analytically in terms of the mean values and standard deviations of the $D_i$.
• Recall the Central Limit Theorem (CLT) of statistics, stating that (loosely speaking) if you have a SUM of independent random variables, each of which is ARBITRARILY DISTRIBUTED (hence, also including uniformly distributed), then, when the number of terms in the sum increases indefinitely (i.e., for a sum of random variables infinitely long), the SUM RANDOM VARIABLE TENDS TO A GAUSSIAN.
• Thus, the natural log of N tends to a Gaussian.
• Thus, N tends to the LOGNORMAL DISTRIBUTION.
• The mean value and standard deviation of this lognormal distribution of N may all be expressed analytically in terms of the mean values and standard deviations of the logs of the $D_i$ already found previously.

This result is fundamental. All the relevant equations are summarized in the following Table 1.
This table is actually the same as Table 2 of the author's original paper IAC-08-A4.1.4, entitled "The Statistical Drake Equation" and presented by him at the International Astronautical Congress (IAC) held in Glasgow, UK, on October 1st, 2008. This original paper is reproduced in Appendix B.

To sum up, not only is it found that N approaches the completely known lognormal distribution for an INFINITY of factors in the statistical Drake equation (7), but the way is paved to further applications by removing the condition that the number of terms in the product (7) must be FINITE.
- Page 14born-digital extraction
This possibility of ADDING ANY NUMBER OF FACTORS IN THE DRAKE EQUATION (7) was not envisaged, of course, by Frank Drake back in 1961, when “summarizing” the evolution of life in the galaxy in SEVEN simple STEPS. But today, the number of factors in the Drake equation should already be increased: for instance, there is no mention in the original Drake equation of the possibility that asteroidal impacts might destroy life on Earth at any time, because the demise of the dinosaurs at the K/T impact had not yet been understood by scientists in 1961, and was understood only in 1980! In practice, the number of factors should INCREASE as much as necessary in order to get better and better estimates of $N$ as our scientific knowledge increases. This is called the “Data Enrichment Principle”, and we believe it should be the next important goal in the study of the statistical Drake equation.

Finally, a numerical example explaining how the statistical Drake equation works in practice will be given in the next section.
- Page 15born-digital extraction
Table 1. Summary of the Properties of the Lognormal Distribution That Applies to the Random Variable N = Number of ET Communicating Civilizations in the Galaxy

Random variable: $N$ = number of communicating ET civilizations in the Galaxy
Probability distribution: Lognormal
Probability density function: $f_N(n) = \frac{1}{n}\,\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln n - \mu)^2}{2\sigma^2}} \quad (n \ge 0)$
Mean value: $\langle N \rangle = e^{\mu}\, e^{\sigma^2/2}$
Standard deviation: $\sigma_N = e^{\mu}\, e^{\sigma^2/2} \sqrt{e^{\sigma^2} - 1}$
All the moments, i.e. k-th moment: $\langle N^k \rangle = e^{k\mu}\, e^{k^2 \sigma^2/2}$
Mode (= abscissa of the lognormal peak): $n_{mode} = n_{peak} = e^{\mu}\, e^{-\sigma^2}$
Value of the mode peak: $f_N(n_{mode}) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\mu}\, e^{\sigma^2/2}$
Skewness: $\frac{K_3}{(K_2)^{3/2}} = \left(e^{\sigma^2} + 2\right) \sqrt{e^{\sigma^2} - 1}$
Kurtosis: $\frac{K_4}{(K_2)^2} = e^{4\sigma^2} + 2 e^{3\sigma^2} + 3 e^{2\sigma^2} - 6$
Expression of $\mu$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\mu = \sum_{i=1}^{7} \frac{b_i [\ln(b_i) - 1] - a_i [\ln(a_i) - 1]}{b_i - a_i}$
Expression of $\sigma^2$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\sigma^2 = \sum_{i=1}^{7} \left( 1 - \frac{a_i b_i [\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2} \right)$

7. An Example Explaining the Statistical Drake Equation

To understand how things work in practice for the statistical Drake equation, please consider the following Table 2. It is made up of three columns:

• The first column on the left lists the seven input sheer numbers, which also become
• the mean values (middle column).
• Finally, the last column on the right lists the seven input standard deviations.
- Page 16born-digital extraction
The bottom line is the classical Drake equation (7). We see that, for this particular set of seven inputs, the classical Drake equation (i.e. the product of the seven numbers) yields a total of 3500 communicating extraterrestrial civilizations existing in the galaxy right now.

$N = Ns \cdot fp \cdot ne \cdot fl \cdot fi \cdot fc \cdot fL$

Table 2. Input Values (i.e. mean values and standard deviations) for the Seven Drake Uniform Random Variables $D_i$. The first column on the left lists the seven input sheer numbers that also become the mean values (middle column). Finally the last column on the right lists the seven input standard deviations. The bottom line is the classical Drake equation (7).

The statistical Drake equation, however, provides a much more articulated answer than just the above sheer number $N = 3500$. In fact, a MathCad code written by this author, capable of performing all the numerical calculations required by the statistical Drake equation for a given set of seven input mean values plus seven input standard deviations, yields for $N$ the lognormal distribution (thin curve) plotted in Figure 2. We see immediately that the peak of this thin curve (i.e. the mode) falls at about $n_{mode} = n_{peak} = e^{\mu} e^{-\sigma^2} \approx 250$ (this is equation (99) of Appendix B), while the median (the fifty-fifty value splitting the lognormal density in two parts with equal underlying areas) falls at about $n_{median} = e^{\mu} \approx 1740$. These seem to be smaller values than the $N = 3500$ provided by the classical Drake equation, but that is a wrong impression due to a poor “intuitive” understanding of what statistics is! In fact, neither the mode nor the median is the “really important” value: the really important value for $N$ is the MEAN VALUE! Now if you look at the thin curve in Figure 2 below (i.e. the lognormal distribution arising from the Central Limit Theorem), you see that this curve has a LONG TAIL ON THE RIGHT! In other words, it does NOT immediately go down to nearly zero beyond the peak of the mode.
Thus, when you actually compute the mean value, you should not be too
- Page 17born-digital extraction
surprised to find out that it equals $\langle N \rangle = e^{\mu} e^{\sigma^2/2} = 4589.559 \approx 4590$ communicating civilizations now in the galaxy. This is the important number, and it is HIGHER than the 3500 provided by the classical Drake equation. Thus, in conclusion, THE STATISTICAL EXTENSION of the classical Drake equation INCREASES OUR HOPES to find an extraterrestrial civilization!

[Figure 2 plot: “PROBABILITY DENSITY FUNCTION OF N”; horizontal axis: N = Number of ET Civilizations in Galaxy; vertical axis: Prob. density function of N.]

Figure 2. Comparing the Two Probability Density Functions of the Random Variable N Found (1) Without Resorting to the CLT at All (thick curve) and (2) Using the CLT and the Relevant Lognormal Approximation (thin curve).

Our hopes are increased even more when we go on to consider the standard deviation associated with the mean value 4590. In fact, the standard deviation is given by equation (97) of Appendix B. This yields $\sigma_N = e^{\mu} e^{\sigma^2/2} \sqrt{e^{\sigma^2} - 1} = 11195$, and so the expected number $N$ may actually be even much higher than the 4590 provided by the mean value alone! The “upper limit of the one-sigma confidence interval” (as statisticians call it), i.e. the sum 4590 + 11195 = 15,785, yields a higher number still! (Note: the “lower limit of the one-sigma confidence interval” is ZERO because the lognormal distribution is POSITIVE (or, more correctly, non-negative).)

Finally, the reader should note that the thick curve depicted in Figure 2 is just the NUMERICAL solution of the statistical Drake equation for a FINITE number of 7 input factors. Figure 2 actually shows that this curve “is well interpolated” by the lognormal distribution (thin curve), i.e., by the neat analytical expression provided by the Central Limit Theorem for an INFINITE number of factors in the Drake equation. That is, in conclusion, Figure 2 visually shows that taking 7 factors or an infinity of factors “is almost the same thing” already for a value as small as 7.
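As a consistency check on the numbers quoted in this section, the following minimal Python sketch backs out μ and σ² from the quoted median and mean of N, using the lognormal relations median = e^μ and mean = e^μ e^(σ²/2), and then recomputes the mode and the standard deviation. It is only an illustration under the assumption that the quoted median (1740) is a rounded value, so the results should be close to, but not exactly equal to, the quoted 250 and 11195. The variable names are hypothetical.

```python
import math

mean_n = 4589.559   # mean value of N quoted in the text
median_n = 1740.0   # median of N quoted in the text (assumed to be rounded)

# Invert the lognormal relations: median = e^mu, mean = e^mu * e^(sigma^2 / 2).
mu = math.log(median_n)
sigma2 = 2.0 * math.log(mean_n / median_n)

# Recompute the other quoted statistics from mu and sigma^2.
mode_n = math.exp(mu - sigma2)                       # e^mu * e^(-sigma^2)
std_n = mean_n * math.sqrt(math.exp(sigma2) - 1.0)   # standard deviation of N
upper_one_sigma = mean_n + std_n                     # "upper one-sigma" value

print(round(mode_n), round(std_n), round(upper_one_sigma))
```

The small discrepancy between the recomputed and the quoted standard deviation comes from starting with rounded inputs rather than from the full set of seven input distributions.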
- Page 18born-digital extraction
8. Finding the Probability Distribution of the ET-Distance By Virtue of the Statistical Drake Equation

Having solved the statistical Drake equation by finding the lognormal distribution, we are now in a position to solve the ET-DISTANCE problem by resorting to statistics again, rather than just to the purely deterministic Distance Law (5), as we did in Section 2. This is “scientifically more serious” than just the purely deterministic Distance Law (5) inasmuch as the new statistical Distance Law will yield a PROBABILITY DENSITY for the Distance, with the relevant mean value and standard deviation. In other words, the Distance Law (5) itself becomes a random variable whose probability distribution, mean value and standard deviation must be computed by “replacing” into (5) the fact that $N$ is now known to follow the lognormal distribution. This is mathematically described in detail in Section 7 of Appendix B. The important new result is the PROBABILITY DENSITY FOR THE DISTANCE, the equation of which is

$f_{ET\_Distance}(r) = \frac{3}{r}\,\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right]^2}{2\sigma^2}}$ (9)

holding for $r \ge 0$. This is equation (114) of Appendix B. Starting from this equation, the MEAN VALUE OF THE random variable ET_DISTANCE is computed as

$\langle ET\_Distance \rangle = C\, e^{-\mu/3}\, e^{\sigma^2/18}$ (10)

which is equation (119) of Appendix B, and finally the ET_DISTANCE STANDARD DEVIATION

$\sigma_{ET\_Distance} = C\, e^{-\mu/3}\, e^{\sigma^2/18} \sqrt{e^{\sigma^2/9} - 1}$ (11)

which is equation (123) of Appendix B. Of course, all other descriptive statistical quantities, such as moments, cumulants etc., can be computed starting from the probability density (9), and the result is Table 2 hereafter, which is Table 3 of Appendix B.

Finally, to complete this section, as well as this “introduction to the statistical Drake equation,” the numerical values that equations (10) and (11) yield for the Input Table 1 are determined. They are, respectively:

$r_{mean\_value} = C\, e^{-\mu/3}\, e^{\sigma^2/18} \approx 2{,}670$ light years (12)
- Page 19born-digital extraction
which is equation (153) of Appendix B, and

$\sigma_{ET\_Distance} = C\, e^{-\mu/3}\, e^{\sigma^2/18} \sqrt{e^{\sigma^2/9} - 1} \approx 1{,}309$ light years (13)

which is equation (154) of Appendix B.
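Equations (10) through (13) can be exercised with a short Python sketch. It is a hedged illustration only: μ and σ² are inferred here from the quoted median (1740) and mean (4589.559) of N, and the constant C is NOT given in this section, so the value 28,845 light years used below is an assumption chosen because it is consistent with the quoted results. The function name `et_distance_mean_std` is hypothetical.

```python
import math

def et_distance_mean_std(c, mu, sigma2):
    """Mean and standard deviation of the ET distance, per equations (10) and (11)."""
    mean = c * math.exp(-mu / 3.0) * math.exp(sigma2 / 18.0)
    std = mean * math.sqrt(math.exp(sigma2 / 9.0) - 1.0)
    return mean, std

# mu and sigma^2 inferred from the quoted median and mean of N (rounded inputs).
mu = math.log(1740.0)
sigma2 = 2.0 * math.log(4589.559 / 1740.0)

# ASSUMPTION: Milky Way constant C in light years; not stated in this section.
C_LIGHT_YEARS = 28845.0

mean_r, std_r = et_distance_mean_std(C_LIGHT_YEARS, mu, sigma2)
print(round(mean_r), round(std_r))
```

With these assumed inputs the two printed values land within about one percent of the 2,670 and 1,309 light years quoted in equations (12) and (13).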
- Page 20born-digital extraction
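Table 2. Summary of the Properties of the Probability Distribution That Applies to the Random Variable ET_Distance Yielding the (average) Distance Between Any Two Neighboring Communicating Civilizations in the Galaxy

Random variable: ET_Distance between any two neighboring ET civilizations in the Galaxy, assuming they are UNIFORMLY distributed throughout the whole Galaxy volume
Probability distribution: [illegible in extraction]
Probability density function: $f_{ET\_Distance}(r) = \frac{3}{r}\,\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right]^2}{2\sigma^2}} \quad (r \ge 0)$
Numerical constant C related to the Milky Way size: [illegible in extraction]
Standard deviation: $\sigma_{ET\_Distance} = C\, e^{-\mu/3}\, e^{\sigma^2/18} \sqrt{e^{\sigma^2/9} - 1}$
All the moments, i.e. k-th moment: $\langle ET\_Distance^k \rangle = C^k\, e^{-k\mu/3}\, e^{k^2\sigma^2/18}$
Mode (= abscissa of the peak): $r_{mode} = r_{peak} = C\, e^{-\mu/3}\, e^{-\sigma^2/9}$
Peak value of $f_{ET\_Distance}(r)$ = value of the mode peak: $f_{ET\_Distance}(r_{mode}) = \frac{3}{C \sqrt{2\pi}\,\sigma}\, e^{\mu/3}\, e^{\sigma^2/18}$
Median (= fifty-fifty probability value): $r_{median} = C\, e^{-\mu/3}$
Skewness: $\frac{K_3}{(K_2)^{3/2}} = \left(e^{\sigma^2/9} + 2\right) \sqrt{e^{\sigma^2/9} - 1}$
Expression of $\mu$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\mu = \sum_{i=1}^{7} \frac{b_i [\ln(b_i) - 1] - a_i [\ln(a_i) - 1]}{b_i - a_i}$
Expression of $\sigma^2$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\sigma^2 = \sum_{i=1}^{7} \left( 1 - \frac{a_i b_i [\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2} \right)$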
- Page 21born-digital extraction
- Page 22born-digital extraction
It is clarifying to draw the graph of the ET_Distance probability density (9):

[Figure 3 plot: “DISTANCE OF NEAREST ET_CIVILIZATION”; horizontal axis: ET_Distance from Earth (light years); vertical axis: Probability density function (1/meters).]

Figure 3. The Probability of Finding the Nearest Extraterrestrial Civilization at the distance r From Earth (in light years) if the Values Assumed in the Drake Equation are Those Shown in Input Table 1. The relevant probability density function $f_{ET\_Distance}(r)$ is given by equation (9). Its mode (peak abscissa) equals 1933 light years, but its mean value is higher since the curve has a long tail on the right: the mean value equals in fact 2670 light years. Finally, the standard deviation equals 1309 light years: THIS IS GOOD NEWS FOR SETI, inasmuch as the nearest ET civilization in the galaxy might lie at just 1 sigma = 2670-1309 = 1361 light years from us.

From Figure 3 we see that the probability of finding extraterrestrials is practically zero up to a distance of about 500 light years from Earth. Then it starts increasing with the increasing distance from Earth, and reaches its maximum at

$r_{mode} = r_{peak} = C\, e^{-\mu/3}\, e^{-\sigma^2/9} \approx 1{,}933$ light years. (14)

This is the MOST LIKELY VALUE of the distance at which we can expect to find the nearest extraterrestrial civilization. It is not the mean value of the probability distribution (9) for $f_{ET\_Distance}(r)$. In fact, the probability density (9) has an infinite tail on the right, as clearly shown in Figure 3, and hence its mean value must be higher than its peak value. As given by (10) and (12), its mean value is $r_{mean\_value} = C\, e^{-\mu/3}\, e^{\sigma^2/18} \approx 2670$ light years. This is the MEAN (value of the) DISTANCE at which we can expect to find extraterrestrials.
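The relation between the mode, the mean and the standard deviation of the distance can be checked with a minimal Python sketch. It relies on one fact not spelled out in the text: the density (9) is itself a lognormal density in r, because the distance is proportional to N to the power -1/3. Under that assumption, the sketch recovers the lognormal parameters of the distance from the quoted mean (2670) and standard deviation (1309), recomputes the mode, and integrates the density over the interval mean ± one standard deviation using the error function. All names are hypothetical.

```python
import math

mean_r = 2670.0  # quoted mean distance, light years
std_r = 1309.0   # quoted standard deviation, light years

# For a lognormal variable with log-mean m and log-variance s2:
#   mean = exp(m + s2/2)  and  (std/mean)^2 = exp(s2) - 1.
s2 = math.log(1.0 + (std_r / mean_r) ** 2)
s = math.sqrt(s2)
m = math.log(mean_r) - s2 / 2.0

def lognormal_cdf(r, m, s):
    """Cumulative distribution function of a lognormal variable at r > 0."""
    return 0.5 * (1.0 + math.erf((math.log(r) - m) / (s * math.sqrt(2.0))))

mode_r = math.exp(m - s2)                        # peak abscissa of the density
lo, hi = mean_r - std_r, mean_r + std_r          # 1361 and 3979 light years
prob = lognormal_cdf(hi, m, s) - lognormal_cdf(lo, m, s)

print(round(mode_r), lo, hi, round(prob, 3))
```

The recomputed mode should fall close to the 1933 light years of equation (14), which is a useful sanity check that the quoted mean, standard deviation and mode belong to the same distribution.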
- Page 23born-digital extraction
After having found the above two distances (1933 and 2670 light years, respectively), the next natural question that arises is: “what is the range, back and forth around the mean value of the distance, within which we can expect to find extraterrestrials with ‘the highest hopes’?” The answer to this question is given by the notion of standard deviation, which we already found to be given by (11) and (13),

$\sigma_{ET\_Distance} = C\, e^{-\mu/3}\, e^{\sigma^2/18} \sqrt{e^{\sigma^2/9} - 1} \approx 1309$ light years.

More precisely, this is the so-called 1-sigma (distance) level. Probability theory then shows that the nearest extraterrestrial civilization is expected to be located within this range, i.e. within the two distances of (2670-1309) = 1361 light years and (2670+1309) = 3979 light years, with probability given by the integral of $f_{ET\_Distance}(r)$ taken in between these two lower and upper limits, that is:

$\int_{1361\ \text{light years}}^{3979\ \text{light years}} f_{ET\_Distance}(r)\, dr = 0.75 = 75\%$ (15)

In plain words: with 75 percent probability, the nearest extraterrestrial civilization is located in between the distances of 1361 and 3979 light years from us, having assumed the input values to the Drake Equation given by Table 1. If we change those input values, then all the numbers change again, of course.

9. The “Data Enrichment Principle” as the Best CLT Consequence Upon the Statistical Drake Equation (Any Number of Factors Allowed)

As a fitting climax to all the statistical equations developed so far, let us now state our “DATA ENRICHMENT PRINCIPLE.” It simply states that “The Higher the Number of Factors in the Statistical Drake equation, The Better.” Put in this simple way, it simply looks like a new way of saying that the CLT lets the random variable $Y$ approach the normal distribution when the number of terms in the sum (4) approaches infinity. And this is the case, indeed.

10. Conclusions

We have sought to extend the classical Drake equation to let it encompass Statistics and Probability.
This approach appears to pave the way to future, more profound investigations intended not only to associate “error bars” to each factor in the Drake equation, but especially to increase the number of factors themselves. In fact, this seems to be the only way to incorporate into the Drake equation more and more new scientific information as soon as it becomes available. In the long run, the Statistical Drake equation might just become a huge computer code, growing in size and especially in the depth of the scientific information it contains. It would thus be Humanity’s first “Encyclopaedia Galactica.”
- Page 24born-digital extraction
Unfortunately, to extend the Drake equation to Statistics, it was necessary to use a mathematical apparatus that is more sophisticated than just the simple product of seven numbers.
- Page 25born-digital extraction
Appendix A: Proof of Shannon’s 1948 Theorem Stating That the Uniform Distribution is the “Most Uncertain” One Over a Finite Range of Values

Information Theory was initiated by Claude Shannon (1916-2001) in his two well-known 1948 papers:

[Facsimile of the title block of Shannon’s paper: “Reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948. A Mathematical Theory of Communication. By C. E. SHANNON”]

In this Appendix, we wish to draw attention to a couple of theorems that Shannon proves on pages 36 and 37 of his work, and which read, respectively (note that Shannon omits the upper and lower limits of all integrals in the first theorem: they are minus infinity and plus infinity, respectively):

5. Let $p(x)$ be a one-dimensional distribution. The form of $p(x)$ giving a maximum entropy subject to the condition that the standard deviation of $x$ be fixed at $\sigma$ is Gaussian. To show this we must maximize

$H(x) = -\int p(x) \log p(x)\, dx$

with

$\sigma^2 = \int p(x)\, x^2\, dx$ and $1 = \int p(x)\, dx$

as constraints. This requires, by the calculus of variations, maximizing

$\int \left[ -p(x) \log p(x) + \lambda\, p(x)\, x^2 + \mu\, p(x) \right] dx.$

The condition for this is

$-1 - \log p(x) + \lambda x^2 + \mu = 0$

and consequently (adjusting the constants to satisfy the constraints)

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x^2/2\sigma^2)}$

and
- Page 26born-digital extraction
6. If $x$ is limited to a half line ($p(x) = 0$ for $x \le 0$) and the first moment of $x$ is fixed at $a$:

$a = \int_0^{\infty} p(x)\, x\, dx,$

then the maximum entropy occurs when

$p(x) = \frac{1}{a}\, e^{-(x/a)}$

and is equal to $\log ea$.

Now, we wish to point out that there is a third possible case, other than the two given by Shannon. This is the case when the probability density function $p(x)$ is limited to a FINITE INTERVAL $a \le x \le b$. This is obviously the case with any physical POSITIVE random variable, such as a distance, or the number $N$ of extraterrestrial communicating civilizations in the Galaxy. And it is easy to prove that for any such finite random variable the maximum entropy distribution is the UNIFORM distribution over $a \le x \le b$. Shannon did not bother to prove this simple theorem in his 1948 papers since he probably regarded it as too trivial. But we prefer to point out this theorem since, in the language of the statistical Drake equation, it sounds like: “Since we don’t know what the probability distribution of any one of the Drake random variables $D_i$ is, it is safer to assume that each of them has the maximum possible entropy over $a_i \le x \le b_i$, i.e., that $D_i$ is UNIFORMLY distributed there.”

The proof of this theorem is along the same lines as for the previous two cases discussed by Shannon. We start by assuming that $a_i \le x \le b_i$. We then form the linear combination of the entropy integral plus the normalization condition for $D_i$

$\delta \int_{a_i}^{b_i} \left[ -p(x) \log p(x) + \lambda\, p(x) \right] dx = 0$

where $\lambda$ is a Lagrange multiplier. Performing the variation, one finds

$-\log p(x) - 1 + \lambda = 0$, that is: $p(x) = e^{\lambda - 1}$.

Applying the normalization condition (constraint) to the last expression for $p(x)$ yields

$1 = \int_{a_i}^{b_i} p(x)\, dx = \int_{a_i}^{b_i} e^{\lambda - 1}\, dx = e^{\lambda - 1} \int_{a_i}^{b_i} dx = e^{\lambda - 1} (b_i - a_i)$

that yields
- Page 27born-digital extraction
$e^{\lambda - 1} = \frac{1}{b_i - a_i}$

and finally

$p(x) = \frac{1}{b_i - a_i}$ with $a_i \le x \le b_i$,

showing that the maximum-entropy probability distribution over any FINITE interval $a_i \le x \le b_i$ is the UNIFORM distribution.
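The theorem just proved can be illustrated numerically. The following minimal Python sketch estimates the differential entropy of two densities on the same finite interval with a midpoint rule and compares them. It is an illustration, not a proof: the symmetric triangular density is a hypothetical competitor chosen only because its entropy is easy to verify by hand, and the interval [2, 5] is arbitrary.

```python
import math

def differential_entropy(pdf, a, b, steps=100_000):
    """Midpoint-rule estimate of the integral of -p(x) ln p(x) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        p = pdf(a + (k + 0.5) * h)
        if p > 0.0:
            total -= p * math.log(p) * h
    return total

def uniform_pdf(a, b):
    """Uniform density on [a, b]."""
    return lambda x: 1.0 / (b - a)

def triangular_pdf(a, b):
    """Symmetric triangular density on [a, b], peaking at the midpoint."""
    mid = 0.5 * (a + b)
    half = 0.5 * (b - a)
    return lambda x: (1.0 - abs(x - mid) / half) / half

a, b = 2.0, 5.0  # arbitrary finite interval, for illustration only
h_uniform = differential_entropy(uniform_pdf(a, b), a, b)
h_triangular = differential_entropy(triangular_pdf(a, b), a, b)
print(h_uniform, h_triangular)
```

For the uniform density the estimate should agree with ln(b - a), and any other density on the same interval, such as the triangular one here, gives a smaller value.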
- Page 28born-digital extraction
Appendix B: Original Text of the Author’s Paper #IAC-08-A4.1.4 Titled the Statistical Drake Equation

IAC-08-A4.1.4
THE STATISTICAL DRAKE EQUATION
Claudio Maccone
Co-Vice Chair, SETI Permanent Study Group, International Academy of Astronautics
Address: Via Martorelli, 43 - Torino (Turin) 10155 - Italy
URL: http://www.maccone.com/ - E-mail: clmaccon@libero.it

ABSTRACT. We provide the statistical generalization of the Drake equation. From a simple product of seven positive numbers, the Drake equation is now turned into the product of seven positive random variables. We call this “the Statistical Drake Equation.” The mathematical consequences of this transformation are then derived. The proof of our results is based on the Central Limit Theorem (CLT) of Statistics. In loose terms, the CLT states that the sum of any number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable. This is called the Lyapunov Form of the CLT, or the Lindeberg Form of the CLT, depending on the mathematical constraints assumed on the third moments of the various probability distributions. In conclusion, we show that:

1) The new random variable $N$, yielding the number of communicating civilizations in the Galaxy, follows the LOGNORMAL distribution. Then, as a consequence, the mean value of this lognormal distribution is the ordinary $N$ in the Drake equation. The standard deviation, mode, and all the moments of this lognormal $N$ are found also.

2) The seven factors in the ordinary Drake equation now become seven positive random variables. The probability distribution of each random variable may be ARBITRARY. The CLT in the so-called Lyapunov or Lindeberg forms (neither of which assumes the factors to be identically distributed) allows for that. In other words, the CLT “translates” into our statistical Drake equation by allowing an arbitrary probability distribution for each factor.
This is both physically realistic and practically very useful, of course.

3) An application of our statistical Drake equation then follows. The (average) DISTANCE between any two neighboring and communicating civilizations in the Galaxy may be shown to be inversely proportional to the cubic root of $N$. Then, in our approach, this distance becomes a new random variable. We derive the relevant probability density function, apparently previously unknown and dubbed “Maccone distribution” by Paul Davies.

4) DATA ENRICHMENT PRINCIPLE. It should be noticed that ANY positive number of random variables in the Statistical Drake Equation is compatible with the CLT. So, our generalization allows for many more factors to be added in the future as more refined scientific knowledge about each factor becomes known to the scientists. This capability to make room for more future factors in the statistical Drake equation we call the “Data Enrichment Principle”, and we regard it as the key to more profound future results in the fields of Astrobiology and SETI.

Finally, a practical example is given of how our statistical Drake equation works numerically. We work out in detail the case where each of the seven random variables is uniformly distributed around its own mean value and has a given standard deviation. For instance, the number of stars in the Galaxy is assumed to be uniformly distributed around (say) 350 billion with a standard deviation of (say) 1 billion. Then, the resulting lognormal distribution of $N$ is computed numerically by virtue of a MathCad file that the author has written. This shows
- Page 29born-digital extraction
that the mean value of the lognormal random variable $N$ is actually of the same order as the classical $N$ given by the ordinary Drake equation, as one might expect from a good statistical generalization.

1. INTRODUCTION

The Drake equation is a now famous result (see ref. [1] for the Wikipedia summary) in the fields of SETI (the Search for ExtraTerrestrial Intelligence, see ref. [2]) and Astrobiology (see ref. [3]). Devised in 1961, the Drake equation was the first scientific attempt to estimate the number $N$ of ExtraTerrestrial civilizations in the Galaxy with which we might come in contact. Frank D. Drake (see ref. [4]) proposed it as the product of seven factors:

$N = Ns \cdot fp \cdot ne \cdot fl \cdot fi \cdot fc \cdot fL.$ (1)

Where:

1) Ns is the estimated number of stars in our Galaxy.
2) fp is the fraction (= percentage) of such stars that have planets.
3) ne is the number of “Earth-type” such planets around the given star; in other words, ne is the number of planets, in a given stellar system, on which the chemical conditions exist for life to begin its course: they are “ready for life.”
4) fl is the fraction (= percentage) of such “ready for life” planets on which life actually starts and grows up (but not yet to the “intelligence” level).
5) fi is the fraction (= percentage) of such “planets with life forms” that actually evolve until some form of “intelligent civilization” emerges (like the first, historic human civilizations on Earth).
6) fc is the fraction (= percentage) of such “planets with civilizations” where the civilizations evolve to the point of being able to communicate across the interstellar distances with other (at least) similarly evolved civilizations. As far as we know in 2008, this means that they must be aware of the Maxwell equations governing radio waves, as well as of computers and radioastronomy (at least).
7) fL is the fraction of galactic civilizations alive at the time when we, poor humans, attempt to pick up their radio signals (that they throw out into space just as we have done since 1900, when Marconi started the transatlantic transmissions). In other words, fL is the number of civilizations now transmitting and receiving, and this implies an estimate of “how long will a technological civilization live?” that nobody can make at the moment. Also, are they going to destroy themselves in a nuclear war, and thus live only a few decades of technological civilization? Or are they slowly becoming wiser, rejecting war, speaking a single language (like English today), and merging into a single “nation”, thus living in peace for ages? Or will robots take over one day, making “flesh animals” disappear forever (the so-called “post-biological universe”)? No one knows...

But let us go back to the Drake equation (1). In the fifty years of its existence, a number of suggestions have been put forward about the different numeric values of its seven factors. Of course, every different set of these seven input numbers yields a different value for $N$, and we can endlessly play that way. But we claim that these are like... children's games! We claim that the classical Drake equation (1), as we shall call it from now on to distinguish it from our statistical Drake equation to be introduced in the coming sections, is scientifically inadequate in one regard at least: it just handles sheer numbers and does not associate an error bar to each of its seven factors. At the very least, we want to associate an error bar to each $D_i$.

Well, we have thus reached STEP ONE in our improvement of the classical Drake equation: replace each sheer number by a probability distribution! The reader is now asked to look at the flow chart on the next page as a guide to this paper, please.

2. STEP 1: LETTING EACH FACTOR BECOME A RANDOM VARIABLE

In this paper we adopt the notations of the great book “Probability, Random Variables and Stochastic Processes” by Athanasios Papoulis (1921-2002), now re-published as Papoulis-Pillai,
- Page 30born-digital extraction
ref. [5]. The advantage of this notation is that it makes a neat distinction between probabilistic (or statistical: it's the same thing here) variables, always denoted by capitals, and non-probabilistic (or “deterministic”) variables, always denoted by lower-case letters. Adopting the Papoulis notation also is a tribute to him by this author, who was a Fulbright Grantee in the United States with him at the Polytechnic Institute (now Polytechnic University) of New York in the years 1977-78-79.

We thus introduce seven new (positive) random variables $D_i$ (“D” from “Drake”) defined as

$D_1 = Ns,\quad D_2 = fp,\quad D_3 = ne,\quad D_4 = fl,\quad D_5 = fi,\quad D_6 = fc,\quad D_7 = fL$ (2)

so that our STATISTICAL Drake equation may be simply rewritten as

$N = \prod_{i=1}^{7} D_i.$ (3)

Of course, $N$ now becomes a (positive) random variable too, having its own (positive) mean value and standard deviation. Just as each of the $D_i$ has its own (positive) mean value and standard deviation...

... the natural question then arises: how are the seven mean values on the right related to the mean value on the left?

... and how are the seven standard deviations on the right related to the standard deviation on the left?

Just take the next step...

3. STEP 2: INTRODUCING LOGS TO CHANGE THE PRODUCT INTO A SUM

Products of random variables are not easy to handle in probability theory. It is actually much easier to handle sums of random variables, rather than products, because:

1) The probability density of the sum of two or more independent random variables is the convolution of the relevant probability densities (worry not about the equations, right now).
2) The Fourier transform of the convolution simply is the product of the Fourier transforms (again, worry not about the equations, at this point).
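Point 1) of the list above can be made concrete with a small numerical experiment. The following Python sketch is a hypothetical illustration (the function name and grid size are assumptions): it approximates the density of the sum of two independent variables, each uniform on [0, 1], by a discrete convolution of their sampled densities. The result is the familiar triangular density on [0, 2], peaking at 1.

```python
def convolve_densities(f, g, h):
    """Discrete approximation of the convolution of two densities sampled with spacing h."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * h
    return out

n = 200                      # number of grid cells on [0, 1] (arbitrary choice)
h = 1.0 / n                  # grid spacing
uniform01 = [1.0] * n        # uniform density on [0, 1], sampled at cell midpoints

sum_density = convolve_densities(uniform01, uniform01, h)

# Entry k of the result corresponds to the abscissa (k + 1) * h.
peak_index = max(range(len(sum_density)), key=sum_density.__getitem__)
peak_location = (peak_index + 1) * h
area = sum(sum_density) * h  # should stay close to 1 (normalization)

print(peak_location, sum_density[peak_index], area)
```

Repeating the convolution with further uniform densities makes the result look progressively more bell-shaped, which is the mechanism the Central Limit Theorem formalizes.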
- Page 31born-digital extraction
[Flow chart: guide to the sections of this paper]

1. Introduction
2. Step 1: Letting each factor become a random variable.
2.1. Step 2: Introducing logs to change the product into a sum.
2.2. Step 3: The transformation law of random variables.
3. Step 4: Assuming the easiest input distribution for each $D_i$: the uniform distribution.
3.1. Step 5: A numerical example of the Statistical Drake equation with uniform distributions for the Drake random variables $D_i$.
3.2. Step 6: Computing the logs of the 7 uniformly distributed Drake random variables $D_i$.
3.3. Step 7: Finding the probability density function of $N$, but only numerically, not analytically. DEAD END!
4. The Central Limit Theorem (CLT) of Statistics.
5. LOGNORMAL distribution as the probability distribution of the number $N$ of communicating ExtraTerrestrial Civilizations in the Galaxy.
6. Comparing the CLT results with the Non-CLT results, and discarding the Non-CLT approach.
7. DISTANCE to the nearest ExtraTerrestrial Civilization as a probability distribution (Paul Davies dubbed that the Maccone distribution).
7.1 Classical, non-probabilistic derivation of the Distance to the nearest ET Civilization.
7.2 Probabilistic derivation of the probability density function for the nearest ET Civilization Distance.
7.3 Statistical properties of the distribution.
7.4 Numerical example of the distribution.
8. DATA ENRICHMENT PRINCIPLE as the best CLT consequence upon the Drake equation: any number of factors allowed.
- Page 32born-digital extraction
So, let us take the natural logs of both sides of the Statistical Drake equation (3) and change it into a sum:

$\ln(N) = \ln\left( \prod_{i=1}^{7} D_i \right) = \sum_{i=1}^{7} \ln(D_i).$ (4)

It is now convenient to introduce eight new random variables defined as follows:

$Y = \ln(N), \qquad Y_i = \ln(D_i), \quad i = 1, \ldots, 7.$ (5)

Upon inversion, the first equation of (5) yields the important equation, that will be used in the sequel,

$N = e^{Y}.$ (6)

We are now ready to take STEP THREE.

STEP 3: THE TRANSFORMATION LAW OF RANDOM VARIABLES

So far we did not mention at all the problem: “which probability distribution shall we attach to each of the seven (positive) random variables $D_i$?”

It is not easy to answer this question because we do not have the least scientific clue as to what probability distributions best fit each of the seven points listed in Section 1.

Yet, at least one trivial error must be avoided: claiming that each of those seven random variables must have a Gaussian (i.e. normal) distribution. In fact, the Gaussian distribution, having the well-known bell-shaped probability density function

$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y - \mu)^2}{2\sigma^2}} \quad (\sigma \ge 0)$ (7)

has its independent variable $y$ ranging between $-\infty$ and $\infty$, and so it can apply to a real random variable $Y$ only, and never to positive random variables like those in the statistical Drake equation (3). Period.

Searching again for probability density functions that represent positive random variables, an obvious choice would be the gamma distributions (see, for instance, ref. [6]). However, we discarded this choice too because of a different reason: please keep in mind that, according to (5), once we have selected a particular type of probability density function (pdf) for the last seven of equations (5), then we must compute the (new and different) pdf of the logs of such random variables. And the pdf of these logs certainly is not gamma-type any more.
It is high time now to remind the reader of a certain theorem that is proved in probability courses, but, unfortunately, does not seem to have a specific name. It is the transformation law (so we shall call it, see, for instance, ref. [5]) allowing us to compute the pdf of a certain new random variable $Y$ that is a known function $Y = g(X)$ of another random variable $X$ having a known pdf. In other words, if the pdf $f_X(x)$ of a certain random variable $X$ is known, then the pdf $f_Y(y)$ of the new random variable $Y$, related to $X$ by the functional relationship

$Y = g(X)$ (8)

can be calculated according to this rule:

1) First invert the corresponding non-probabilistic equation $y = g(x)$ and denote by $x_i(y)$ the various real roots resulting from this inversion.
2) Second, take notice whether these real roots may be either finitely- or infinitely-many, according to the nature of the function $y = g(x)$.
3) Third, the probability density function of $Y$ is then given by the (finite or infinite) sum

$f_Y(y) = \sum_i \frac{f_X(x_i(y))}{\left| g'(x_i(y)) \right|}$ (9)

where the summation extends to all roots $x_i(y)$ and $\left| g'(x_i(y)) \right|$ is the absolute value of the first derivative of $g(x)$ where the i-th root $x_i(y)$ has been substituted for $x$.

Since we must use this transformation law to transfer from the $D_i$ to the $Y_i = \ln(D_i)$, it is clear that we need to start from a $D_i$ pdf that is as simple as possible. The gamma pdf does not meet this need because the analytic expression of the transformed pdf is very complicated (or, at least, it looked so to this author in the first instance). Also, the gamma distribution has two free parameters in it, and this “complicates” its application to the various meanings of the Drake equation. In conclusion, we discarded the gamma distributions and confined
- Page 33born-digital extraction
ourselves to the simpler uniform distribution instead, as shown in the next section.

4. STEP 4: ASSUMING THE EASIEST INPUT DISTRIBUTION FOR EACH $D_i$: THE UNIFORM DISTRIBUTION

Let us now suppose that each of the seven $D_i$ is distributed UNIFORMLY in the interval ranging from the lower limit $a_i \ge 0$ to the upper limit $b_i \ge a_i$. This is the same as saying that the probability density function of each of the seven Drake random variables $D_i$ has the equation

$$f_{\mathrm{uniform\_}D_i}(x) = \frac{1}{b_i - a_i} \quad \text{with } 0 \le a_i \le x \le b_i \tag{10}$$

as follows at once from the normalization condition

$$\int_{a_i}^{b_i} f_{\mathrm{uniform\_}D_i}(x)\, dx = 1. \tag{11}$$

Let us now consider the mean value of such a uniform $D_i$, defined by

$$\langle \mathrm{uniform\_}D_i \rangle = \int_{a_i}^{b_i} x\, f_{\mathrm{uniform\_}D_i}(x)\, dx = \frac{1}{b_i - a_i} \int_{a_i}^{b_i} x\, dx = \frac{1}{b_i - a_i} \left[\frac{x^2}{2}\right]_{a_i}^{b_i} = \frac{b_i^2 - a_i^2}{2\,(b_i - a_i)} = \frac{a_i + b_i}{2}.$$

In words (as is intuitively obvious): the mean value of the uniform distribution is simply the mean of the lower and upper limits of the variable range:

$$\langle \mathrm{uniform\_}D_i \rangle = \frac{a_i + b_i}{2}. \tag{12}$$

In order to find the variance of the uniform distribution, we first need to find the second moment

$$\langle \mathrm{uniform\_}D_i^{\,2} \rangle = \int_{a_i}^{b_i} x^2 f_{\mathrm{uniform\_}D_i}(x)\, dx = \frac{1}{b_i - a_i} \left[\frac{x^3}{3}\right]_{a_i}^{b_i} = \frac{b_i^3 - a_i^3}{3\,(b_i - a_i)} = \frac{(b_i - a_i)(a_i^2 + a_i b_i + b_i^2)}{3\,(b_i - a_i)} = \frac{a_i^2 + a_i b_i + b_i^2}{3}.$$

The second moment of the uniform distribution is thus

$$\langle \mathrm{uniform\_}D_i^{\,2} \rangle = \frac{a_i^2 + a_i b_i + b_i^2}{3}. \tag{13}$$

From (12) and (13) we may now derive the variance of the uniform distribution

$$\sigma^2_{\mathrm{uniform\_}D_i} = \langle \mathrm{uniform\_}D_i^{\,2} \rangle - \langle \mathrm{uniform\_}D_i \rangle^2 = \frac{a_i^2 + a_i b_i + b_i^2}{3} - \frac{(a_i + b_i)^2}{4} = \frac{(b_i - a_i)^2}{12}. \tag{14}$$

Upon taking the square root of both sides of (14), we finally obtain the standard deviation of the uniform distribution:

$$\sigma_{\mathrm{uniform\_}D_i} = \frac{b_i - a_i}{2\sqrt{3}}. \tag{15}$$

We now wish to perform a calculation that is mathematically trivial, but rather unexpected from the intuitive point of view, and very important for our applications to the Statistical Drake equation.
Just consider the two simultaneous equations (12) and (15):

$$\begin{cases} \langle \mathrm{uniform\_}D_i \rangle = \dfrac{a_i + b_i}{2} \\[2mm] \sigma_{\mathrm{uniform\_}D_i} = \dfrac{b_i - a_i}{2\sqrt{3}}. \end{cases} \tag{16}$$

Upon inverting this trivial linear system, one finds

$$\begin{cases} a_i = \langle \mathrm{uniform\_}D_i \rangle - \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_i} \\ b_i = \langle \mathrm{uniform\_}D_i \rangle + \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_i}. \end{cases} \tag{17}$$

This is of paramount importance for our application to the Statistical Drake equation inasmuch as it shows that: if one (scientifically) assigns the mean value and standard deviation of a certain Drake random variable $D_i$, then the lower and upper limits of the relevant uniform distribution are given by the two equations (17), respectively.
- Page 34born-digital extraction
In other words, there is a factor of $\sqrt{3} = 1.732$ included in the two equations (17) that is not obvious at all to human intuition, and must indeed be taken into account.

The application of this result to the Statistical Drake equation is discussed in the next section.

3.1 STEP 5: A NUMERICAL EXAMPLE OF THE STATISTICAL DRAKE EQUATION WITH UNIFORM DISTRIBUTIONS FOR THE DRAKE RANDOM VARIABLES $D_i$

The first variable Ns in the classical Drake equation (1) is the number of stars in our Galaxy. Nobody knows exactly how many there are (!). Only statistical estimates can be made by astronomers, and they oscillate (say) around a mean value of 350 billion (if this value is indeed correct!). This being the situation, we assume that our uniformly distributed random variable Ns has a mean value of 350 billion minus or plus a standard deviation of (say) one billion (we do not care whether this number is scientifically the best estimate as of August 2008: we just want to set up a numerical example of our Statistical Drake equation). In other words, we now assume that one has:

$$\begin{cases} \langle \mathrm{uniform\_}D_1 \rangle = 350 \cdot 10^{9} \\ \sigma_{\mathrm{uniform\_}D_1} = 1 \cdot 10^{9}. \end{cases} \tag{18}$$

Therefore, according to equations (17), the lower and upper limits of our uniform distribution for the random variable Ns = $D_1$ are, respectively,

$$\begin{cases} a_{Ns} = \langle \mathrm{uniform\_}D_1 \rangle - \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_1} = 348.3 \cdot 10^{9} \\ b_{Ns} = \langle \mathrm{uniform\_}D_1 \rangle + \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_1} = 351.7 \cdot 10^{9}. \end{cases} \tag{19}$$

Similarly we proceed for all the other six random variables in the Statistical Drake equation (3). For instance, we assume that the fraction of stars that have planets is 50%, i.e. 50/100, and this will be the mean value of the random variable fp = $D_2$. We also assume that the relevant standard deviation will be 10%, i.e. that $\sigma_{fp} = 10/100$.
Therefore, the relevant lower and upper limits for the uniform distribution of fp = $D_2$ turn out to be

$$\begin{cases} a_{fp} = \langle \mathrm{uniform\_}D_2 \rangle - \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_2} = 0.327 \\ b_{fp} = \langle \mathrm{uniform\_}D_2 \rangle + \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_2} = 0.673. \end{cases} \tag{20}$$

The next Drake random variable is the number ne of "Earth-type" planets in a given star system. Taking example from the Solar System, since only the Earth is truly "Earth-type", the mean value of ne is clearly 1, but the standard deviation is not zero if we assume that Mars also may be regarded as Earth-type. Since there are thus two Earth-type planets in the Solar System, we must assume a standard deviation of $1/\sqrt{3} = 0.577$ to compensate the $\sqrt{3}$ appearing in (17), in order to finally yield two "Earth-type" planets (Earth and Mars) for the upper limit of the random variable ne. In other words, we assume that

$$\begin{cases} a_{ne} = \langle \mathrm{uniform\_}D_3 \rangle - \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_3} = 0 \\ b_{ne} = \langle \mathrm{uniform\_}D_3 \rangle + \sqrt{3}\, \sigma_{\mathrm{uniform\_}D_3} = 2. \end{cases} \tag{21}$$

The next four Drake random variables have even more "arbitrarily" assumed values, which we simply assume for the sake of making up a numerical example of our Statistical Drake equation with uniform entry distributions. So, we really make no assumption about the astronomy, or the biology, or the sociology of the Drake equation: we just care about its mathematics. All our assumed entries are given in Table 1.

Please notice that, had we assumed all the standard deviations to equal zero in Table 1, then our Statistical Drake equation (3) would have obviously reduced to the classical Drake equation (1), and the resulting number of civilizations in the Galaxy would have turned out to be 3500:

$$N = 3500. \tag{22}$$

This is the important deterministic number that we will use in the sequel of this paper for comparison with our statistical results on the mean value of N, i.e. $\langle N \rangle$. This will be explained in Sections 3.3 and 5.
- Page 35born-digital extraction
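The limits quoted in (19), (20) and (21) can be reproduced directly from equations (17). The following is a minimal sketch, assuming only the means and standard deviations stated in the text; the function names `uniform_limits` and `uniform_mean_std` are illustrative and not part of the paper.

```python
import math

def uniform_limits(mean, std):
    """Lower and upper limits (a, b) of a uniform distribution with the
    given mean and standard deviation, following equations (17)."""
    half_width = math.sqrt(3.0) * std
    return mean - half_width, mean + half_width

def uniform_mean_std(a, b):
    """Inverse map: mean and standard deviation from the limits,
    following equations (12) and (15)."""
    return (a + b) / 2.0, (b - a) / (2.0 * math.sqrt(3.0))

# Inputs stated in the text for the first three Drake factors.
a_Ns, b_Ns = uniform_limits(350e9, 1e9)                   # compare with (19)
a_fp, b_fp = uniform_limits(0.5, 0.1)                     # compare with (20)
a_ne, b_ne = uniform_limits(1.0, 1.0 / math.sqrt(3.0))    # compare with (21)

print(f"Ns: [{a_Ns:.4e}, {b_Ns:.4e}]")
print(f"fp: [{a_fp:.3f}, {b_fp:.3f}]")
print(f"ne: [{a_ne:.3f}, {b_ne:.3f}]")
```

Calling `uniform_mean_std(a_fp, b_fp)` recovers the pair (0.5, 0.1) up to rounding, which is the round trip between the systems (16) and (17).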
[Table 1: table body not recovered; the only surviving entry is $\sigma_{Ns} = 1 \cdot 10^{9}$.]

Table 1. Input values (i.e. mean values and standard deviations) for the seven Drake uniform random variables $D_i$. The first column on the left lists the seven input sheer numbers that also become the mean values (middle column). Finally, the last column on the right lists the seven input standard deviations. The bottom line is the classical Drake equation (1).

3.2 STEP 6: COMPUTING THE LOGS OF THE 7 UNIFORMLY DISTRIBUTED DRAKE RANDOM VARIABLES $D_i$

Intuitively speaking, the natural log of a uniformly distributed random variable may not be another uniformly distributed random variable! This is obvious from the trivial diagram of $y = \ln(x)$ shown below:

[Figure 1 plot: "Natural logarithm of x"; horizontal axis: POSITIVE independent variable x; vertical axis: REAL values of the natural log, y = ln(x).]

Figure 1. The simple function $y = \ln(x)$.

So, if we have a uniformly distributed random variable $D_i$ with lower limit $a_i$ and upper limit $b_i$, the random variable

$$Y_i = \ln(D_i), \quad i = 1, \dots, 7 \tag{23}$$

must have its range limited in between the lower limit $\ln(a_i)$ and the upper limit $\ln(b_i)$. In other words, these are the lower and upper limits of the relevant probability density function $f_{Y_i}(y)$. But what is the actual analytic expression of such a pdf? To find it, we must resort to the general transformation law for random variables, defined by equation (9). Here we obviously have

$$y = g(x) = \ln(x) \tag{24}$$

that, upon inversion, yields the single root

$$x_1(y) = x(y) = e^{y}. \tag{25}$$

On the other hand, differentiating (24) one gets
- Page 36born-digital extraction
$$g'(x) = \frac{1}{x} \quad \text{and} \quad g'(x_1(y)) = \frac{1}{x_1(y)} = \frac{1}{e^{y}} \tag{26}$$

where (25) was already used in the last step. By virtue of the uniform probability density function (10) and of (26), the general transformation law (9) finally yields

$$f_Y(y) = \sum_i \frac{f_X(x_i(y))}{\left|g'(x_i(y))\right|} = \frac{\dfrac{1}{b_i - a_i}}{\dfrac{1}{e^{y}}} = \frac{e^{y}}{b_i - a_i}. \tag{27}$$

In other words, the requested pdf of $Y_i$, i.e. the probability density function of the natural log of each of the uniformly distributed Drake random variables $D_i$, is

$$f_{Y_i}(y) = \frac{e^{y}}{b_i - a_i}, \quad i = 1, \dots, 7, \quad \ln(a_i) \le y \le \ln(b_i). \tag{28}$$

This is indeed a positive function of $y$ over the interval $\ln(a_i) \le y \le \ln(b_i)$, as for every pdf, and it is easy to see that its normalization condition is fulfilled:

$$\int_{\ln(a_i)}^{\ln(b_i)} f_{Y_i}(y)\, dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{e^{y}}{b_i - a_i}\, dy = \frac{e^{\ln(b_i)} - e^{\ln(a_i)}}{b_i - a_i} = 1. \tag{29}$$

Next we want to find the mean value and standard deviation of $Y_i$, since these play a crucial role in future developments. The mean value $\langle Y_i \rangle$ is given by

$$\langle Y_i \rangle = \int_{\ln(a_i)}^{\ln(b_i)} y\, f_{Y_i}(y)\, dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{y\, e^{y}}{b_i - a_i}\, dy = \frac{b_i\,[\ln(b_i) - 1] - a_i\,[\ln(a_i) - 1]}{b_i - a_i}. \tag{30}$$

This is thus the mean value of the natural log of all the uniformly distributed Drake random variables $D_i$:

$$\langle Y_i \rangle = \langle \ln(D_i) \rangle = \frac{b_i\,[\ln(b_i) - 1] - a_i\,[\ln(a_i) - 1]}{b_i - a_i}. \tag{31}$$

In order to find the variance also, we must first compute the mean value of the square of $Y_i$, that is

$$\langle Y_i^2 \rangle = \int_{\ln(a_i)}^{\ln(b_i)} y^2 f_{Y_i}(y)\, dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{y^2 e^{y}}{b_i - a_i}\, dy = \frac{b_i\,[\ln^2(b_i) - 2\ln(b_i) + 2] - a_i\,[\ln^2(a_i) - 2\ln(a_i) + 2]}{b_i - a_i}. \tag{32}$$

The variance of $Y_i = \ln(D_i)$ is now given by (32) minus the square of (31), which, after a few reductions, yields:

$$\sigma^2_{Y_i} = \sigma^2_{\ln(D_i)} = 1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}. \tag{33}$$

Whence the corresponding standard deviation

$$\sigma_{Y_i} = \sigma_{\ln(D_i)} = \sqrt{1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}}. \tag{34}$$

Let us now turn to another topic: the use of Fourier transforms, which, in probability theory, are called "characteristic functions." Following again the notations of Papoulis (ref.
[5]) we call "characteristic function", $\Phi_{Y_i}(\zeta)$, of an assigned probability distribution $Y_i$, the Fourier transform of the relevant probability density function, that is (with $j = \sqrt{-1}$)

$$\Phi_{Y_i}(\zeta) = \int_{-\infty}^{\infty} e^{j \zeta y} f_{Y_i}(y)\, dy. \tag{35}$$

The use of characteristic functions simplifies things greatly. For instance, the calculation of all moments of a known pdf becomes trivial if the relevant characteristic function is known, and the proofs of important theorems of statistics, like the Central Limit Theorem that we will use in Section 4, are greatly simplified as well. Another important result is that the characteristic function of the sum of a finite number of independent random variables is simply given by the product of the corresponding characteristic functions. This is just the case we are facing in the Statistical Drake equation (3), and so we are now led to find the characteristic function of the random variable $Y_i$, i.e.
- Page 37born-digital extraction
$$\Phi_{Y_i}(\zeta) = \int_{\ln(a_i)}^{\ln(b_i)} e^{j \zeta y}\, \frac{e^{y}}{b_i - a_i}\, dy = \frac{1}{b_i - a_i} \int_{\ln(a_i)}^{\ln(b_i)} e^{(1 + j\zeta) y}\, dy = \frac{e^{(1 + j\zeta) \ln(b_i)} - e^{(1 + j\zeta) \ln(a_i)}}{(b_i - a_i)(1 + j\zeta)} = \frac{b_i^{\,1 + j\zeta} - a_i^{\,1 + j\zeta}}{(b_i - a_i)(1 + j\zeta)}. \tag{36}$$

Thus, the characteristic function of the natural log of the Drake uniform random variable $D_i$ is given by

$$\Phi_{Y_i}(\zeta) = \frac{b_i^{\,1 + j\zeta} - a_i^{\,1 + j\zeta}}{(b_i - a_i)(1 + j\zeta)}. \tag{37}$$

3.3 STEP 7: FINDING THE PROBABILITY DENSITY FUNCTION OF N, BUT ONLY NUMERICALLY, NOT ANALYTICALLY

Having found the characteristic functions $\Phi_{Y_i}(\zeta)$ of the logs of the seven input random variables $D_i$, we can now immediately find the characteristic function of the random variable $Y = \ln(N)$ defined by (5). In fact, by virtue of (4), of the well-known Fourier transform property stating that "the Fourier transform of a convolution is the product of the Fourier transforms", and of (37), it immediately follows that $\Phi_Y(\zeta)$ equals the product of the seven $\Phi_{Y_i}(\zeta)$:

$$\Phi_Y(\zeta) = \prod_{i=1}^{7} \Phi_{Y_i}(\zeta) = \prod_{i=1}^{7} \frac{b_i^{\,1 + j\zeta} - a_i^{\,1 + j\zeta}}{(b_i - a_i)(1 + j\zeta)}. \tag{38}$$

The next step is to invert this Fourier transform in order to get the probability density function of the random variable $Y = \ln(N)$. In other words, we must compute the following inverse Fourier transform

$$f_Y(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-j \zeta y}\, \Phi_Y(\zeta)\, d\zeta = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-j \zeta y} \left[\prod_{i=1}^{7} \Phi_{Y_i}(\zeta)\right] d\zeta = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-j \zeta y} \left[\prod_{i=1}^{7} \frac{b_i^{\,1 + j\zeta} - a_i^{\,1 + j\zeta}}{(b_i - a_i)(1 + j\zeta)}\right] d\zeta. \tag{39}$$

This author regrets that he was unable to compute the last integral analytically. He had to compute it numerically for the particular values of the 14 $a_i$ and $b_i$ that follow from Table 1 and equations (17). The result was the probability density function for $Y = \ln(N)$ plotted in the following Figure 2.

[Figure 2 plot: "PROBABILITY DENSITY FUNCTION OF Y=ln(N)"; horizontal axis: independent variable Y = ln(N); vertical axis: probability density function of Y.]

Figure 2. Probability density function of $Y = \ln(N)$ computed numerically by virtue of the integral (39). The two "funny gaps" in the curve are due to the numeric limitations of the MathCad numeric solver that the author used for this numeric computation.
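The closed forms (31) and (33) for the log of a uniform random variable can be checked against a direct numerical integration of the pdf (28). The sketch below does this for the fp limits of (20); the midpoint rule and the helper names are illustrative choices, not the MathCad procedure used by the author.

```python
import math

def log_uniform_mean(a, b):
    """Closed-form mean of Y = ln(D), D uniform on [a, b], 0 < a < b: equation (31)."""
    return (b * (math.log(b) - 1.0) - a * (math.log(a) - 1.0)) / (b - a)

def log_uniform_variance(a, b):
    """Closed-form variance of Y = ln(D): equation (33)."""
    return 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2

def midpoint_moment(a, b, power, steps=100_000):
    """Integrate y**power * e**y / (b - a) over [ln a, ln b] (pdf (28))
    with the composite midpoint rule."""
    lo, hi = math.log(a), math.log(b)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += (y ** power) * math.exp(y)
    return total * h / (b - a)

# Limits of fp from equations (17): mean 0.5, standard deviation 0.1.
a = 0.5 - math.sqrt(3.0) * 0.1
b = 0.5 + math.sqrt(3.0) * 0.1

norm_num = midpoint_moment(a, b, 0)                      # normalization (29)
mean_num = midpoint_moment(a, b, 1)                      # compare with (31)
var_num = midpoint_moment(a, b, 2) - mean_num ** 2       # compare with (33)

print(norm_num, mean_num, log_uniform_mean(a, b))
print(var_num, log_uniform_variance(a, b))
```

The same comparison works for any pair of limits with $0 < a_i < b_i$; the case $a_i = 0$ needs the limit $a_i \ln(a_i) \to 0$ and is not handled here.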
We are now just one more step from finding the probability density of N, the number of ExtraTerrestrial Civilizations in the Galaxy predicted by our Statistical Drake equation (3). The point here is to transfer from the probability density function of $Y$ to that of $N$, knowing that $Y = \ln(N)$ or, alternatively, that $N = \exp(Y)$, as stated by (6). We must thus resort to the transformation law of random variables (9) by setting

$$y = g(x) = e^{x}. \tag{40}$$

This, upon inversion, yields the single root

$$x_1(y) = x(y) = \ln(y). \tag{41}$$

On the other hand, differentiating (40) one gets

$$g'(x) = e^{x} \quad \text{and} \quad g'(x_1(y)) = e^{\ln(y)} = y \tag{42}$$

where (41) was already used in the last step. The general transformation law (9) finally yields

$$f_N(y) = \sum_i \frac{f_X(x_i(y))}{\left|g'(x_i(y))\right|} = \frac{1}{|y|}\, f_Y(\ln(y)). \tag{43}$$
- Page 38born-digital extraction
This probability density function $f_N(y)$ was computed numerically by using (43) and the numeric curve given by (39), and the result is shown in Figure 3.

[Figure 3 plot: "PROBABILITY DENSITY FUNCTION OF N"; horizontal axis: N = Number of ET Civilizations in Galaxy; vertical axis: probability density function of N.]

Figure 3. The numeric (and not analytic) probability density function curve $f_N(y)$ of the number N of ExtraTerrestrial Civilizations in the Galaxy according to the Statistical Drake equation (3). We see that the curve peak (i.e. the mode) is very close to low values of N, but the tail on the right is high, meaning that the resulting mean value $\langle N \rangle$ is of the order of thousands.

We now want to compute the mean value $\langle N \rangle$ of the probability density (43). Clearly, it is given by

$$\langle N \rangle = \int_0^{\infty} y\, f_N(y)\, dy. \tag{44}$$

This integral too was computed numerically, and the result was a perfect match with N = 3500 of (22), that is

$$\langle N \rangle = 3499.99880177509 + 0.000000124914686\, i. \tag{45}$$

Note that this result was computed numerically in the complex domain because of the Fourier transforms, and that the real part is virtually 3500 (as expected) while the imaginary part is virtually zero because of the rounding errors. So, this result is excellent, and proves that the theory presented so far is mathematically correct.

Finally we want to consider the standard deviation. This also had to be computed numerically, resulting in

$$\sigma_N = 3953.42910143389 + 0.000000032800058\, i. \tag{46}$$

This standard deviation, higher than the mean value, implies that N might range in between 0 and 7453.

This completes our study of the probability density function of N if the seven uniform Drake input random variables $D_i$ have the mean values and standard deviations listed in Table 1.
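The numeric results (45) and (46) can be cross-checked without any Fourier inversion. For independent factors the mean of a product is the product of the means, and the second moment of a product is the product of the second moments; this is a standard property, not the route followed by the author. The sketch below uses it under a loud assumption: only the first three (mean, standard deviation) pairs are stated in the text, while the last four are illustrative values chosen so that the product of the means equals 3500 as in (22).

```python
import math

# (mean, standard deviation) for the seven Drake factors.
# First three pairs: stated in the text. Last four pairs: ASSUMED illustrative
# values (Table 1 is not available here), chosen so the means multiply to 3500.
FACTORS = [
    (350e9, 1e9),                    # Ns, equation (18)
    (0.5, 0.1),                      # fp
    (1.0, 1.0 / math.sqrt(3.0)),     # ne
    (0.5, 0.1),                      # fl (assumed)
    (0.2, 0.1),                      # fi (assumed)
    (0.2, 0.1),                      # fc (assumed)
    (1e-6, 1e-7),                    # fL (assumed)
]

def product_mean_and_std(factors):
    """Exact mean and standard deviation of a product of independent
    random variables, given each factor's mean and standard deviation."""
    mean = math.prod(m for m, _ in factors)
    second_moment = math.prod(m * m + s * s for m, s in factors)
    return mean, math.sqrt(second_moment - mean * mean)

mean_N, std_N = product_mean_and_std(FACTORS)
print(f"<N> = {mean_N:.3f}, sigma_N = {std_N:.3f}")
```

Under these assumed inputs the two numbers agree with the real parts of (45) and (46), which is a useful sanity check on the numerically inverted density. The result does not depend on the factors being uniform, only on their independence and on their first two moments.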
We conclude that, unfortunately, even under the simplifying assumption that the $D_i$ be uniformly distributed, it is impossible to solve the full problem analytically, since all calculations beyond equation (38) had to be performed numerically.

This is no good.

Shall we thus lose faith, and declare "impossible" the task of finding an analytic expression for the probability density function $f_N(y)$?

Rather surprisingly, the answer is "no", and there is indeed a way out of this dead end, as we shall see in the next section.

5. THE CENTRAL LIMIT THEOREM (CLT) OF STATISTICS

Indeed there is a good, approximating analytical expression for $f_N(y)$, and this is the following lognormal probability density function

$$f_N(y; \mu, \sigma) = \frac{1}{y}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln(y) - \mu)^2}{2\sigma^2}} \qquad (y \ge 0). \tag{47}$$

To understand why, we must resort to what is perhaps the most beautiful theorem of Statistics: the Central Limit Theorem (abbreviated CLT). Historically, the CLT was in fact proven first in 1901 by the Russian mathematician Alexandr Lyapunov (1857-1918), and later (1920) by the Finnish mathematician Jarl Waldemar Lindeberg (1876-1932) under weaker conditions. These conditions are certainly fulfilled in the context of the Drake equation because of the "reality" of the astronomy, biology and sociology involved with it, and we are not going to discuss this point any further here. A good, synthetic description of the Central Limit Theorem (CLT) of Statistics is found at the Wikipedia site (ref. [7]), to which the reader is referred for more details, such as the equations for the Lyapunov and the Lindeberg conditions, making the theorem "rigorously" valid.
- Page 39born-digital extraction
Put in loose terms, the CLT states that, if one has a sum of random variables even NOT identically distributed, this sum tends to a normal distribution when the number of terms making up the sum tends to infinity. Also, the normal distribution mean value is the sum of the mean values of the addend random variables, and the normal distribution variance is the sum of the variances of the addend random variables.

Let us now write down the equations of the CLT in the form needed to apply it to our Statistical Drake equation (3). The idea is to apply the CLT to the sum of random variables given by (4) and (5), whatever their probability distributions can possibly be. In other words, the CLT applied to the Statistical Drake equation (3) leads immediately to the following three equations:

1) The sum of the (arbitrarily distributed) independent random variables $Y_i$ makes up the new random variable $Y$.
2) The sum of their mean values makes up the new mean value of $Y$.
3) The sum of their variances makes up the new variance of $Y$.

In equations:

$$\begin{cases} Y = \displaystyle\sum_{i=1}^{7} Y_i \\[2mm] \langle Y \rangle = \displaystyle\sum_{i=1}^{7} \langle Y_i \rangle \\[2mm] \sigma_Y^2 = \displaystyle\sum_{i=1}^{7} \sigma_{Y_i}^2. \end{cases} \tag{48}$$

This completes our synthetic description of the CLT for sums of random variables.

6. THE LOGNORMAL DISTRIBUTION IS THE DISTRIBUTION OF THE NUMBER N OF EXTRATERRESTRIAL CIVILIZATIONS IN THE GALAXY

The CLT may of course be extended to products of random variables upon taking the logs of both sides, just as we did in equation (3). It then follows that the exponent random variable, like $Y$ in (6), tends to a normal random variable and, as a consequence, that the base random variable, like $N$ in (6), tends to a lognormal random variable.

To understand this fact better in mathematical terms, consider again the transformation law (9) of random variables. The question is: what is the probability density function of the random variable $N$ in equation (6), that is, what is the probability density function of the lognormal distribution? To find it, set

$$y = g(x) = e^{x}. \tag{49}$$

This, upon inversion, yields the single root

$$x_1(y) = x(y) = \ln(y). \tag{50}$$

On the other hand, differentiating (49) one gets

$$g'(x) = e^{x} \quad \text{and} \quad g'(x_1(y)) = e^{\ln(y)} = y \tag{51}$$

where (50) was already used in the last step. The general transformation law (9) finally yields

$$f_N(y) = \sum_i \frac{f_X(x_i(y))}{\left|g'(x_i(y))\right|} = \frac{1}{|y|}\, f_Y(\ln(y)). \tag{52}$$

Therefore, replacing the probability density on the right by virtue of the well-known normal (or Gaussian) distribution given by equation (7), the lognormal distribution of equation (47) is found, and the derivation of the lognormal distribution from the normal distribution is proved.

In view of future calculations, it is also useful to point out the so-called "Gaussian integral", that is:

$$\int_{-\infty}^{\infty} e^{-A x^2}\, e^{B x}\, dx = \sqrt{\frac{\pi}{A}}\; e^{\frac{B^2}{4A}}, \qquad A > 0, \; B \text{ real}. \tag{53}$$

This follows immediately from the normalization condition of the Gaussian (7), that is

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx = 1, \tag{54}$$

just upon expanding the square in the exponent and making the two replacements (we skip all steps)

$$A = \frac{1}{2\sigma^2} > 0, \qquad B = \frac{\mu}{\sigma^2}. \tag{55}$$
- Page 40born-digital extraction
In the sequel of this paper we shall denote the independent variable of the lognormal distribution (47) by a lower case letter $n$ to remind the reader that the corresponding random variable $N$ is the positive integer number of ExtraTerrestrial Civilizations in the Galaxy. In other words, $n$ will be treated as a positive real number in all calculations to follow because it is a "large" number (i.e. a continuous variable) compared to the only civilization that we know of, i.e. ourselves. In conclusion, from now on the lognormal probability density function of $N$ will be written as

$$f_N(n) = \frac{1}{n}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln[n] - \mu)^2}{2\sigma^2}} \qquad (n \ge 0). \tag{56}$$

Having so said, we now turn to the statistical properties of the lognormal distribution (56), i.e. to the statistical properties that describe the number of ExtraTerrestrial Civilizations in the Galaxy.

Our first goal is to prove an equation yielding all the moments of the lognormal distribution (56), that is, for every non-negative integer $k = 0, 1, 2, \dots$ one has

$$\langle N^k \rangle = e^{k\mu}\, e^{\frac{k^2 \sigma^2}{2}}. \tag{57}$$

The relevant proof starts with the definition of the k-th moment

$$\langle N^k \rangle = \int_0^{\infty} n^k f_N(n)\, dn = \int_0^{\infty} n^k\, \frac{1}{n}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln[n] - \mu)^2}{2\sigma^2}}\, dn.$$

One then transforms the above integral by virtue of the substitution

$$\ln[n] = z. \tag{58}$$

The new integral in $z$ is then seen to reduce to the Gaussian integral (53) (we skip all steps here) and (57) follows.

Upon setting $k = 0$ into (57), the normalization condition for $f_N(n)$ follows

$$\int_0^{\infty} f_N(n)\, dn = 1. \tag{59}$$

Upon setting $k = 1$ into (57), the important mean value of the random variable $N$ is found

$$\langle N \rangle = e^{\mu}\, e^{\frac{\sigma^2}{2}}. \tag{60}$$

Upon setting $k = 2$ into (57), the mean value of the square of the random variable $N$ is found

$$\langle N^2 \rangle = e^{2\mu}\, e^{2\sigma^2}. \tag{61}$$

The variance of $N$ now follows from the last two formulae:

$$\sigma_N^2 = e^{2\mu}\, e^{\sigma^2} \left(e^{\sigma^2} - 1\right). \tag{62}$$

The square root of this is the important standard deviation formula for the $N$ random variable

$$\sigma_N = e^{\mu}\, e^{\frac{\sigma^2}{2}} \sqrt{e^{\sigma^2} - 1}. \tag{63}$$

The third moment is obtained upon setting $k = 3$ into (57)

$$\langle N^3 \rangle = e^{3\mu}\, e^{\frac{9}{2}\sigma^2}. \tag{64}$$

Finally, upon setting $k = 4$, the fourth moment of $N$ is found

$$\langle N^4 \rangle = e^{4\mu}\, e^{8\sigma^2}. \tag{65}$$

Our next goal is to find the cumulants of $N$.
In principle, we could compute all the cumulants $K_i$ from the generic i-th moment $\mu'_i$ by virtue of the recursion formula (see ref. [8])

$$K_i = \mu'_i - \sum_{k=1}^{i-1} \binom{i-1}{k-1} K_k\, \mu'_{i-k}. \tag{66}$$
- Page 41born-digital extraction
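The recursion (66) is short enough to be written out directly. The sketch below feeds it the lognormal raw moments of equation (57) and compares the second cumulant with the variance (62); the skewness is compared with the standard closed form for the lognormal distribution, $(e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$. The parameter values $\mu = 0$ and $\sigma^2 = 0.25$ are arbitrary test inputs, not values from the paper, and the function names are illustrative.

```python
import math

def lognormal_raw_moment(k, mu, sigma2):
    """k-th raw moment of the lognormal distribution, equation (57)."""
    return math.exp(k * mu + 0.5 * k * k * sigma2)

def cumulants_from_moments(moments):
    """Apply recursion (66). moments[i] is the i-th raw moment, with
    moments[0] = 1. Returns a list whose entry i is the i-th cumulant
    (entry 0 is unused and left at 0.0)."""
    n = len(moments) - 1
    kappa = [0.0] * (n + 1)
    for i in range(1, n + 1):
        correction = sum(
            math.comb(i - 1, k - 1) * kappa[k] * moments[i - k]
            for k in range(1, i)
        )
        kappa[i] = moments[i] - correction
    return kappa

mu, sigma2 = 0.0, 0.25          # arbitrary test values
moments = [lognormal_raw_moment(k, mu, sigma2) for k in range(5)]
K = cumulants_from_moments(moments)

w = math.exp(sigma2)
variance_closed = math.exp(2.0 * mu) * w * (w - 1.0)      # equation (62)
skewness_closed = (w + 2.0) * math.sqrt(w - 1.0)          # standard lognormal skewness
skewness_num = K[3] / K[2] ** 1.5

print(K[1], K[2], variance_closed)
print(skewness_num, skewness_closed)
```

For large $\sigma^2$ the raw moments grow very quickly and the subtraction in the recursion loses precision in floating point, so the closed-form cumulants are preferable in practice.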
In practice, however, here we shall confine ourselves to the computation of the first four cumulants only, because they alone are required to find the skewness and kurtosis of the distribution. Then, the first four cumulants in terms of the first four moments read:

$$\begin{cases} K_1 = \mu'_1 \\ K_2 = \mu'_2 - K_1^2 \\ K_3 = \mu'_3 - 3 K_1 K_2 - K_1^3 \\ K_4 = \mu'_4 - 4 K_1 K_3 - 3 K_2^2 - 6 K_2 K_1^2 - K_1^4. \end{cases} \tag{67}$$

These equations yield, respectively:

$$K_1 = e^{\mu}\, e^{\frac{\sigma^2}{2}} \tag{68}$$

$$K_2 = e^{2\mu}\, e^{\sigma^2} \left(e^{\sigma^2} - 1\right) \tag{69}$$

$$K_3 = e^{3\mu}\, e^{\frac{3\sigma^2}{2}} \left(e^{\sigma^2} - 1\right)^2 \left(e^{\sigma^2} + 2\right) \tag{70}$$

$$K_4 = e^{4\mu}\, e^{2\sigma^2} \left(e^{\sigma^2} - 1\right)^3 \left(e^{3\sigma^2} + 3 e^{2\sigma^2} + 6 e^{\sigma^2} + 6\right). \tag{71}$$

From these we derive the skewness

$$\frac{K_3}{(K_2)^{3/2}} = \left(e^{\sigma^2} + 2\right) \sqrt{e^{\sigma^2} - 1} \tag{72}$$

and the kurtosis

$$\frac{K_4}{(K_2)^2} = e^{4\sigma^2} + 2 e^{3\sigma^2} + 3 e^{2\sigma^2} - 6. \tag{73}$$

Finally, we want to find the mode of the lognormal probability density function, i.e. the abscissa of its peak. To do so, we must first compute the derivative of the probability density function $f_N(n)$ of equation (56), and then set it equal to zero. This derivative is actually the derivative of the ratio of two functions of $n$, as plainly appears from (56). Thus, let us set for a moment

$$E(n) = -\frac{(\ln[n] - \mu)^2}{2\sigma^2} \tag{74}$$

where "E" stands for "exponent." Upon differentiating this, one gets

$$E'(n) = -\frac{1}{2\sigma^2} \cdot 2\,(\ln[n] - \mu) \cdot \frac{1}{n}. \tag{75}$$

But the lognormal probability density function (56), by virtue of (74), now reads

$$f_N(n) = \frac{1}{\sqrt{2\pi}\,\sigma} \cdot \frac{e^{E(n)}}{n}, \tag{76}$$

so that its derivative is

$$\frac{d f_N(n)}{dn} = \frac{1}{\sqrt{2\pi}\,\sigma} \cdot \frac{e^{E(n)} E'(n)\, n - e^{E(n)}}{n^2} = \frac{1}{\sqrt{2\pi}\,\sigma} \cdot \frac{e^{E(n)} \left[E'(n)\, n - 1\right]}{n^2}. \tag{77}$$

Setting this derivative equal to zero means setting

$$E'(n)\, n - 1 = 0. \tag{78}$$

That is, upon replacing (75),

$$\frac{1}{\sigma^2} (\ln[n] - \mu) + 1 = 0. \tag{79}$$

Rearranging, this becomes

$$\ln[n] - \mu + \sigma^2 = 0 \tag{80}$$

and finally

$$n_{\mathrm{mode}} \equiv n_{\mathrm{peak}} = e^{\mu}\, e^{-\sigma^2}. \tag{81}$$

This is the most likely number of ExtraTerrestrial Civilizations in the Galaxy.

How likely? To find the value of the probability density function $f_N(n)$ corresponding to this value of the mode, we must obviously replace (81) into (56). After a few rearrangements, one then gets
- Page 42born-digital extraction
$$f_N(n_{\mathrm{mode}}) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\mu}\, e^{\frac{\sigma^2}{2}}. \tag{82}$$

This is "how likely" the most likely number of ExtraTerrestrial Civilizations in the Galaxy is, i.e. it is the peak height in the lognormal probability density function $f_N(n)$.

Next to the mode, the median $m$ (ref. [9]) is one more statistical number used to characterize any probability distribution. It is defined as the independent variable abscissa $m$ such that a realization of the random variable will take up a value lower than $m$ with 50% probability or a value higher than $m$ with 50% probability again. In other words, the median $m$ splits up our probability density in exactly two equally probable parts. Since the probability of occurrence of the random event equals the area under its density curve (i.e. the definite integral under its density curve), the median $m$ (of the lognormal distribution, in this case) is defined as the integral upper limit $m$:

$$\int_0^{m} f_N(n)\, dn = \int_0^{m} \frac{1}{n}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln[n] - \mu)^2}{2\sigma^2}}\, dn = \frac{1}{2}. \tag{83}$$

In order to find $m$, we may not differentiate (83) with respect to $m$, since the "precise" factor 1/2 on the right would then disappear into a zero. On the contrary, we may try to perform the obvious substitution

$$z = \frac{\ln[n] - \mu}{\sqrt{2}\,\sigma} \tag{84}$$

into the integral (83) to reduce it to the following integral defining the error function erf(x)

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^{x} e^{-z^2}\, dz. \tag{85}$$

Then, after a few reductions that we skip for the sake of brevity, the full equation (83) is turned into

$$\frac{1}{2} + \frac{1}{2}\, \mathrm{erf}\!\left(\frac{\ln(m) - \mu}{\sqrt{2}\,\sigma}\right) = \frac{1}{2} \tag{86}$$

that is

$$\mathrm{erf}\!\left(\frac{\ln(m) - \mu}{\sqrt{2}\,\sigma}\right) = 0. \tag{87}$$

Since from the definition (85) one obviously has erf(0) = 0, (87) becomes

$$\frac{\ln(m) - \mu}{\sqrt{2}\,\sigma} = 0 \tag{88}$$

whence finally

$$\mathrm{median} = m = e^{\mu}. \tag{89}$$

This is the median of the lognormal distribution of N.
In other words, this is the number of ExtraTerrestrial civilizations in the Galaxy such that, with 50% probability, the actual value of N will be lower than this median, and with 50% probability it will be higher.

In conclusion, we find it useful to summarize all the equations that we derived about the random variable N in the following Table 2.
- Page 43born-digital extraction
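Random variable: N = number of communicating ET civilizations in Galaxy
Probability distribution: Lognormal
Probability density function: $f_N(n) = \frac{1}{n}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln[n] - \mu)^2}{2\sigma^2}} \quad (n \ge 0)$
Mean value: $\langle N \rangle = e^{\mu}\, e^{\frac{\sigma^2}{2}}$
Variance: $\sigma_N^2 = e^{2\mu}\, e^{\sigma^2} \left(e^{\sigma^2} - 1\right)$
Standard deviation: $\sigma_N = e^{\mu}\, e^{\frac{\sigma^2}{2}} \sqrt{e^{\sigma^2} - 1}$
Mode (abscissa of the peak): $n_{\mathrm{mode}} \equiv n_{\mathrm{peak}} = e^{\mu}\, e^{-\sigma^2}$
Expression of $\mu$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\mu = \sum_{i=1}^{7} \langle Y_i \rangle = \sum_{i=1}^{7} \frac{b_i\,[\ln(b_i) - 1] - a_i\,[\ln(a_i) - 1]}{b_i - a_i}$
Expression of $\sigma^2$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\sigma^2 = \sum_{i=1}^{7} \sigma_{Y_i}^2 = \sum_{i=1}^{7} \left(1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}\right)$

Table 2. Summary of the properties of the lognormal distribution that applies to the random variable N = number of ET communicating civilizations in the Galaxy.

We want to complete this section about the lognormal probability density function (56) by finding its numeric values for the inputs to the Statistical Drake equation (3) listed in Table 1.

According to the CLT, the mean value $\mu$ to be inserted into the lognormal density (56) is given (according to the second equation of (48)) by the sum of all the mean values $\langle Y_i \rangle$, that is, by virtue of (31), by:

$$\mu = \sum_{i=1}^{7} \langle Y_i \rangle = \sum_{i=1}^{7} \frac{b_i\,[\ln(b_i) - 1] - a_i\,[\ln(a_i) - 1]}{b_i - a_i}. \tag{90}$$

Upon replacing the 14 $a_i$ and $b_i$ listed in Table 1 into (90), the following numeric mean value $\mu$ is found

$$\mu \approx 7.462. \tag{91}$$

Similarly, to get the numeric variance $\sigma^2$ one must resort to the last of equations (48) and to (33):

$$\sigma^2 = \sum_{i=1}^{7} \sigma_{Y_i}^2 = \sum_{i=1}^{7} \left(1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}\right) \tag{92}$$

yielding the following numeric variance $\sigma^2$ to be inserted into the lognormal pdf (56)

$$\sigma^2 \approx 1.938725 \tag{93}$$

whence the numeric standard deviation $\sigma$

$$\sigma \approx 1.392381. \tag{94}$$

Upon replacing these two numeric values (91) and (94) into the lognormal pdf (56), the latter is perfectly determined. It is plotted in Figure 4 hereafter as the thin curve.

In other words, Figure 4 shows the lognormal distribution for the number N of ExtraTerrestrial Civilizations in the Galaxy derived from the Central Limit Theorem as applied to the Drake equation (with the input data listed in Table 1).

We now like to point out the most important statistical properties of this lognormal pdf:

1) Mean Value of N. This is given by equation (60) with $\mu$ and $\sigma$ given by (91) and (94), respectively: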
- Page 44born-digital extraction
$$\langle N \rangle = e^{\mu}\, e^{\frac{\sigma^2}{2}} \approx 4590. \tag{95}$$

In other words, there are 4590 ET Civilizations in the Galaxy according to the Central Limit Theorem of Statistics with the inputs of Table 1. This number 4590 is HIGHER than the 3500 foreseen by the classical Drake equation working with sheer numbers only, rather than with probability distributions. Thus equation (95) IS GOOD NEWS FOR SETI, since it shows that the expected number of ETs is HIGHER with an adequate statistical treatment than just with the too simple Drake sheer numbers of (1).

2) Variance of N. The variance of the lognormal distribution is given by (62) and turns out to be a huge number:

$$\sigma_N^2 = e^{2\mu}\, e^{\sigma^2} \left(e^{\sigma^2} - 1\right) \approx 125328623. \tag{96}$$

3) Standard deviation of N. The standard deviation of the lognormal distribution is given by (63) and turns out to be:

$$\sigma_N = e^{\mu}\, e^{\frac{\sigma^2}{2}} \sqrt{e^{\sigma^2} - 1} \approx 11195. \tag{97}$$

Again, this is GOOD NEWS FOR SETI. In fact, such a high standard deviation means that N may range from very low values (zero, theoretically, and one since Humanity exists) up to tens of thousands (4590 + 11195 = 15785 is (95) + (97)).

4) Mode of N. The mode (= peak abscissa) of the lognormal distribution of N is given by (81), and has a surprisingly low numeric value:

$$n_{\mathrm{mode}} \equiv n_{\mathrm{peak}} = e^{\mu}\, e^{-\sigma^2} \approx 250. \tag{98}$$

This is well shown in Figure 4: the mode peak is very pronounced and close to the origin, but the right tail is high, and this means that the mean value of the distribution is much higher than the mode: 4590 >> 250.

5) Median of N. The median (= fifty-fifty abscissa, splitting the pdf in two exactly equi-probable parts) of the lognormal distribution of N is given by (89), and has the numeric value:

$$n_{\mathrm{median}} = e^{\mu} \approx 1740. \tag{99}$$

In words, assuming the input values listed in Table 1, we have exactly a 50% probability that the actual value of N is lower than 1740, and 50% that it is higher than 1740.

7. COMPARING THE CLT RESULTS WITH THE NON-CLT RESULTS

The time is now ripe to compare the CLT-based results about the lognormal distribution of N, just described in Section 5, against the non-CLT-based results obtained numerically in Section 3.3.

To do so in a simple, visual way, let us plot on the same diagram two curves:

1) The numeric curve appearing in Figure 2 and obtained after laborious Fourier transform calculations in the complex domain, and
2) The lognormal distribution (56) with numeric $\mu$ and $\sigma$ given by (91) and (94) respectively.

We see that the two curves are virtually coincident for values of N larger than 1500. This is a consequence of the law of large numbers, of which the CLT is just one of the many facets.

Similarly it happens for the natural log of N, i.e. the random variable Y of (5), which is plotted in Figure 5 both in its normal curve version (thin curve) and in its numeric version, obtained via Fourier transforms and already shown in Figure 2.

The conclusion is simple: from now on we shall discard forever the numeric calculations and stick only to the equations derived by virtue of the CLT, i.e. to the lognormal (56) and its consequences.
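The CLT-based numbers quoted above can be recomputed in a few lines from equations (90) and (92) followed by (60), (63), (81) and (89). The sketch below does so under a loud assumption: the limits for Ns, fp and ne follow from values stated in the text, whereas the (mean, standard deviation) pairs used for fl, fi, fc and fL are illustrative assumptions chosen so that the product of the means is 3500 as in (22). The case $a_i = 0$ (ne) is handled through the limit $a \ln a \to 0$.

```python
import math

SQRT3 = math.sqrt(3.0)

def limits(mean, std):
    """Uniform limits from mean and standard deviation, equations (17)."""
    return mean - SQRT3 * std, mean + SQRT3 * std

# Limits (a_i, b_i) of the seven uniform Drake factors.
LIMITS = [
    limits(350e9, 1e9),     # Ns, from (18)
    limits(0.5, 0.1),       # fp, from the text
    (0.0, 2.0),             # ne, equation (21)
    limits(0.5, 0.1),       # fl (ASSUMED mean and standard deviation)
    limits(0.2, 0.1),       # fi (ASSUMED)
    limits(0.2, 0.1),       # fc (ASSUMED)
    limits(1e-6, 1e-7),     # fL (ASSUMED)
]

def log_mean(a, b):
    """Mean of ln(D) for D uniform on [a, b], equation (31); a*ln(a) -> 0 at a = 0."""
    lower = a * (math.log(a) - 1.0) if a > 0.0 else 0.0
    return (b * (math.log(b) - 1.0) - lower) / (b - a)

def log_variance(a, b):
    """Variance of ln(D), equation (33); the correction term vanishes at a = 0."""
    if a == 0.0:
        return 1.0
    return 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2

mu = sum(log_mean(a, b) for a, b in LIMITS)            # equation (90)
sigma2 = sum(log_variance(a, b) for a, b in LIMITS)    # equation (92)

mean_N = math.exp(mu + sigma2 / 2.0)                   # equation (60)
std_N = mean_N * math.sqrt(math.exp(sigma2) - 1.0)     # equation (63)
mode_N = math.exp(mu - sigma2)                         # equation (81)
median_N = math.exp(mu)                                # equation (89)

print(f"mu = {mu:.6f}, sigma^2 = {sigma2:.6f}")
print(f"mean = {mean_N:.0f}, std = {std_N:.0f}, mode = {mode_N:.0f}, median = {median_N:.0f}")
```

With these assumed inputs the printed values land close to (93), (95), (97), (98) and (99), which suggests that the lognormal summary depends on the inputs only through the two sums $\mu$ and $\sigma^2$.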
- Page 45born-digital extraction
[Figure 4 plot: "PROBABILITY DENSITY FUNCTION OF N"; horizontal axis: N = Number of ET Civilizations in Galaxy; vertical axis: probability density function of N.]

Figure 4. Comparing the two probability density functions of the random variable N found: 1) at the end of Section 3.3 in a purely numeric way and without resorting to the CLT at all (thick curve) and 2) analytically by using the CLT and the relevant lognormal approximation (thin curve).

[Figure 5 plot: "PROBABILITY DENSITY FUNCTION OF Y=ln(N)"; horizontal axis: independent variable Y = ln(N); vertical axis: probability density function of Y.]

Figure 5. Comparing the two probability density functions of the random variable Y = ln(N) found: 1) at the end of Section 3.3 in a purely numeric way and without resorting to the CLT at all (thick curve) and 2) analytically by using the CLT and the relevant normal (Gaussian) approximation (thin Gaussian curve).

8 DISTANCE OF THE NEAREST EXTRATERRESTRIAL CIVILIZATION AS A PROBABILITY DISTRIBUTION

As an application of the Statistical Drake Equation developed in the previous sections of this paper, we now want to consider the problem of estimating the distance of the ExtraTerrestrial Civilization nearest to us in the Galaxy. In all Astrobiology textbooks (see, for instance, ref. [10]) and in several web sites, the solution to this problem is reported with only slight differences in the mathematical proofs among the various authors. In the first of the coming two sections (section 7.1) we derive the expression for this "ET_Distance" (as we like to denote it) in the classical, non-probabilistic way: in other words, this is the classical, deterministic derivation. In the second section (7.2) we provide the probabilistic derivation, arising from our Statistical Drake
- Page 46born-digital extraction
Equation, of the corresponding probability density function $f_{ET\_Distance}(r)$: here r is the distance between us and the nearest ET civilization, assumed as the independent variable of its own probability density function. The ensuing sections provide more mathematical details about this $f_{ET\_Distance}(r)$, such as its mean value, variance, standard deviation, all central moments, mode, median, cumulants, skewness and kurtosis.

CLASSICAL, NON-PROBABILISTIC DERIVATION OF THE DISTANCE OF THE NEAREST ET CIVILIZATION

Consider the Galactic Disk and assume that:

1) The diameter of the Galaxy is (about) 100,000 light years (abbreviated ly), i.e. its radius, $R_{Galaxy}$, is about 50,000 ly.
2) The thickness of the Galactic Disk at half-way from its center, $h_{Galaxy}$, is about 16,000 ly. Then
3) The volume of the Galaxy may be approximated as the volume of the corresponding cylinder, i.e.
$$V_{Galaxy} = \pi R_{Galaxy}^2\, h_{Galaxy} \tag{100}$$
4) Now consider the sphere around us having a radius r. The volume of such a sphere is
$$V_{Our\_Sphere} = \frac{4}{3}\pi r^3 = \frac{4}{3}\pi\left(\frac{ET\_Distance}{2}\right)^3 \tag{101}$$

In the last equation, we had to divide the distance “ET_Distance” between ourselves and the nearest ET Civilization by 2 because we are now going to make the unwarranted assumption that all ET Civilizations are equally spaced from each other in the Galaxy! This is a crazy assumption, clearly, and should be replaced by more scientifically-grounded assumptions as soon as we know more about our Galactic Neighbourhood. At the moment, however, this is the best guess that we can make, and so we shall take it for granted, although we are aware that this is a weak point in the reasoning.
Having thus assumed that ET Civilizations are UNIFORMLY SPACED IN THE GALAXY, we can write down this proportion:
$$\frac{V_{Galaxy}}{N} = \frac{V_{Our\_Sphere}}{1} \tag{102}$$
That is, upon replacing both (100) and (101) into (102):
$$\frac{\pi R_{Galaxy}^2\, h_{Galaxy}}{N} = \frac{\frac{4}{3}\pi\left(\frac{ET\_Distance}{2}\right)^3}{1} \tag{103}$$
The only unknown in the last equation is ET_Distance, and so we may solve for it, thus getting the:

(AVERAGE) DISTANCE BETWEEN ANY PAIR OF NEIGHBOURING CIVILIZATIONS IN THE GALAXY
$$ET\_Distance = \frac{\sqrt[3]{6 R_{Galaxy}^2\, h_{Galaxy}}}{\sqrt[3]{N}} = \frac{C}{\sqrt[3]{N}} \tag{104}$$
where the positive constant C is defined by
$$C = \sqrt[3]{6 R_{Galaxy}^2\, h_{Galaxy}} \approx 28{,}845 \text{ light years} \tag{105}$$
Equations (104) and (105) are the starting point for our first application of the Statistical Drake equation, that we discuss in detail in the coming sections of this paper.

PROBABILISTIC DERIVATION OF THE PROBABILITY DENSITY FUNCTION FOR ET_DISTANCE

The probability density function (pdf) yielding the distance of the ET Civilization nearest to us in the Galaxy, presented in this section, was discovered by this author on September 5th, 2007. He did not disclose it to other scientists until the SETI meeting run by the famous mathematical physicist and popular science author, Paul Davies, at the “Beyond” Center of the University of Arizona at Phoenix, on February 5-6-7-8, 2008. This meeting was also attended by SETI Institute experts Jill Tarter, Seth Shostak, Doug Vakoch, Tom Pierson and others. During this author’s talk, Paul Davies suggested calling “the Maccone distribution” the new probability density function that yields the ET_Distance and is derived in this section.
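Returning briefly to the classical result, equations (104) and (105) are easy to check numerically. The sketch below is only an illustration: the function names and the sample values of N are assumptions, not part of the paper. One observation that falls out of the arithmetic: the quoted value C ≈ 28,845 light years is reproduced by (105) with a radius of 50,000 ly and a thickness of 1,600 ly, whereas a thickness of 16,000 ly in the same formula gives roughly 62,145 ly.

```python
import math

def galaxy_constant(radius_ly, thickness_ly):
    """C = (6 R^2 h)^(1/3), equation (105), in light years."""
    return (6.0 * radius_ly**2 * thickness_ly) ** (1.0 / 3.0)

def et_distance(n_civilizations, c_ly):
    """ET_Distance = C / N^(1/3), equation (104), in light years."""
    return c_ly / n_civilizations ** (1.0 / 3.0)

R_GALAXY_LY = 50_000.0  # half of the 100,000 ly diameter

# Thickness that reproduces the quoted C of about 28,845 ly
c_thin = galaxy_constant(R_GALAXY_LY, 1_600.0)
# Same formula with a 16,000 ly thickness, shown for comparison
c_thick = galaxy_constant(R_GALAXY_LY, 16_000.0)

print(f"C with h = 1,600 ly:  {c_thin:,.0f} ly")
print(f"C with h = 16,000 ly: {c_thick:,.0f} ly")

# Illustrative values of N (assumed here, not taken from the paper)
for n in (1, 1_000, 1_000_000):
    print(f"N = {n:>9,}: ET_Distance = {et_distance(n, c_thin):,.1f} ly")
```

Because of the cube root in (104), the distance is fairly insensitive to N: a thousandfold increase in the number of civilizations shortens the distance by only a factor of ten.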
- Page 47born-digital extraction
Let us go back to equation (104). Since N is now a random variable (obeying the lognormal distribution), it follows that the ET_Distance must be a random variable as well. Hence it must have some unknown probability density function that we denote by
$$f_{ET\_Distance}(r) \tag{106}$$
where r is the new independent variable of such a probability distribution (it is denoted by r to remind the reader that it expresses the three-dimensional radial distance separating us from the nearest ET civilization in a full spherical symmetry of the space around us).

The question then is: what is the unknown probability distribution (106) of the ET_Distance? We can answer this question upon making the two formal substitutions
$$N \to x, \qquad ET\_Distance \to y \tag{107}$$
into the transformation law (8) for random variables. As a consequence, (104) takes the form
$$y = f(x) = \frac{C}{\sqrt[3]{x}} = C\,x^{-1/3}. \tag{108}$$
In order to find the unknown probability density $f_{ET\_Distance}(r)$, we now apply the rule (9) to (108). First, notice that (108), when inverted to yield the various roots $x_i(y)$, yields a single real root only
$$x_1(y) = \frac{C^3}{y^3}. \tag{109}$$
Then, the summation in (9) reduces to one term only. Second, differentiating (108) one finds
$$f'(x) = -\frac{C}{3}\,x^{-4/3}. \tag{110}$$
Thus, the relevant absolute value reads
$$\left|f'(x)\right| = \frac{C}{3}\,x^{-4/3}. \tag{111}$$
Upon replacing (109) into (111), we then find
$$\left|f'(x_1(y))\right| = \frac{C}{3}\left(\frac{C^3}{y^3}\right)^{-4/3} = \frac{C}{3}\cdot\frac{y^4}{C^4} = \frac{y^4}{3C^3}. \tag{112}$$
This is the denominator of (9). The numerator simply is the lognormal probability density function (56), where the old independent variable x must now be rewritten in terms of the new independent variable y by virtue of (109). By doing so, we finally arrive at the new probability density function
$$f_Y(y) = \frac{3C^3}{y^4}\cdot\frac{y^3}{C^3}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{\left[\ln\left(\frac{C^3}{y^3}\right)-\mu\right]^2}{2\sigma^2}}.$$
Rearranging and replacing y by r, the final form is:
$$f_{ET\_Distance}(r) = \frac{3}{r}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right]^2}{2\sigma^2}}. \tag{113}$$
Now, just replace C in (113) by virtue of (105).
Then: We have discovered the probability density function yielding the probability of finding the nearest ExtraTerrestrial Civilization in the Galaxy in the spherical shell between the distances r and r+dr from Earth:
$$f_{ET\_Distance}(r) = \frac{3}{r}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{\left[\ln\left(\frac{6 R_{Galaxy}^2\, h_{Galaxy}}{r^3}\right)-\mu\right]^2}{2\sigma^2}} \tag{114}$$
holding for $r \ge 0$.

STATISTICAL PROPERTIES OF THIS DISTRIBUTION
- Page 48born-digital extraction
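As a first look at this density, it can be evaluated and integrated numerically. The sketch below is an illustration under stated assumptions: the values of `MU` and `SIGMA2` are not read from Table 1 but are chosen so that the mode and mean quoted later in the paper (about 1933 and 2670 light years) are approximately reproduced, and the trapezoid integrator on a logarithmic grid is simply one convenient choice.

```python
import math

def et_distance_pdf(r, mu, sigma2, c_ly):
    """Density (113): (3/r) times a Gaussian in ln(C^3 / r^3)."""
    if r <= 0.0:
        return 0.0
    sigma = math.sqrt(sigma2)
    z = (math.log(c_ly**3 / r**3) - mu) / sigma
    return (3.0 / r) * math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate_log_grid(f, lo, hi, steps=20000):
    """Composite trapezoid rule in the variable t = ln(r), so that dr = r dt."""
    t_lo, t_hi = math.log(lo), math.log(hi)
    dt = (t_hi - t_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        r = math.exp(t_lo + i * dt)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(r) * r * dt
    return total

C_LY = 28845.0   # constant C of equation (105)
MU = 7.462       # assumed value (see lead-in), not read from Table 1
SIGMA2 = 1.938   # assumed value of sigma squared (see lead-in)

area = integrate_log_grid(lambda r: et_distance_pdf(r, MU, SIGMA2, C_LY), 1.0, 1.0e7)
print(f"integral of the density from 1 ly to 1e7 ly: {area:.6f}")
print(f"density at 1933 ly: {et_distance_pdf(1933.0, MU, SIGMA2, C_LY):.3e} per ly")
```

The integral comes out at 1 to within numerical error, which is the normalization condition that any probability density must satisfy.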
We now want to study this probability distribution in detail. Our next questions are:

1) What is its mean value?
2) What are its variance and standard deviation?
3) What are its moments to any higher order?
4) What are its cumulants?
5) What are its skewness and kurtosis?
6) What are the coordinates of its peak, i.e. the mode (peak abscissa) and its ordinate?
7) What is its median?

The first three points in the list are all covered by the following theorem: all the moments of (113) are given by (here k is the generic and non-negative integer exponent, i.e. $k = 0, 1, 2, 3, \ldots \ge 0$)
$$\left\langle ET\_Distance^k\right\rangle = \int_0^\infty r^k\, f_{ET\_Distance}(r)\,dr = \int_0^\infty r^k\,\frac{3}{r}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right]^2}{2\sigma^2}}\,dr = C^k\,e^{-\frac{k\mu}{3}}\,e^{\frac{k^2\sigma^2}{18}}. \tag{115}$$
To prove this result, one first transforms the above integral by virtue of the substitution
$$\ln\left(\frac{C^3}{r^3}\right) = z. \tag{116}$$
Then the new integral in z is seen to reduce to the known Gaussian integral (53) and, after several reductions that we skip for the sake of brevity, (115) follows from (53). In other words, we have proven that
$$\left\langle ET\_Distance^k\right\rangle = C^k\,e^{-\frac{k\mu}{3}}\,e^{\frac{k^2\sigma^2}{18}}. \tag{117}$$
Upon setting k=0 into (117), the normalization condition for $f_{ET\_Distance}(r)$ follows
$$\int_0^\infty f_{ET\_Distance}(r)\,dr = 1. \tag{118}$$
Upon setting k=1 into (117), the important mean value of the random variable ET_Distance is found
$$\left\langle ET\_Distance\right\rangle = C\,e^{-\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}. \tag{119}$$
Upon setting k=2 into (117), the mean value of the square of the random variable ET_Distance is found
$$\left\langle ET\_Distance^2\right\rangle = C^2\,e^{-\frac{2\mu}{3}}\,e^{\frac{2\sigma^2}{9}}. \tag{120}$$
The variance of ET_Distance now follows from the last two formulae with a few reductions:
$$\sigma^2_{ET\_Distance} = \left\langle ET\_Distance^2\right\rangle - \left\langle ET\_Distance\right\rangle^2 = C^2\,e^{-\frac{2\mu}{3}}\,e^{\frac{\sigma^2}{9}}\left(e^{\frac{\sigma^2}{9}}-1\right). \tag{121}$$
So, the variance of ET_Distance is
$$\sigma^2_{ET\_Distance} = C^2\,e^{-\frac{2\mu}{3}}\,e^{\frac{\sigma^2}{9}}\left(e^{\frac{\sigma^2}{9}}-1\right). \tag{122}$$
The square root of this is the important standard deviation of the ET_Distance random variable
$$\sigma_{ET\_Distance} = C\,e^{-\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}\sqrt{e^{\frac{\sigma^2}{9}}-1}. \tag{123}$$
The third moment is obtained upon setting k=3 into (117)
$$\left\langle ET\_Distance^3\right\rangle = C^3\,e^{-\mu}\,e^{\frac{\sigma^2}{2}}. \tag{124}$$
Finally, upon setting k=4 into (117), the fourth moment of ET_Distance is found
$$\left\langle ET\_Distance^4\right\rangle = C^4\,e^{-\frac{4\mu}{3}}\,e^{\frac{8\sigma^2}{9}}. \tag{125}$$
- Page 49born-digital extraction
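The closed form (117) is straightforward to evaluate. One way to see where it comes from: since ET_Distance = C N^(-1/3) and ln(N) is normal with mean μ and variance σ², ln(ET_Distance) is normal with mean ln(C) − μ/3 and variance σ²/9, and the moments of such a variable have exactly the exponential form of (117). The sketch below evaluates the mean and standard deviation; `MU` and `SIGMA2` are assumed values (not read from Table 1) chosen so that the figures quoted in the numerical example, about 2670 and 1309 light years, are approximately reproduced.

```python
import math

def et_distance_moment(k, mu, sigma2, c_ly):
    """k-th raw moment <ET_Distance^k> from equation (117)."""
    return c_ly**k * math.exp(-k * mu / 3.0 + k * k * sigma2 / 18.0)

C_LY = 28845.0   # constant C of equation (105)
MU = 7.462       # assumed value (see lead-in), not read from Table 1
SIGMA2 = 1.938   # assumed value of sigma squared (see lead-in)

mean = et_distance_moment(1, MU, SIGMA2, C_LY)                 # equation (119)
variance = et_distance_moment(2, MU, SIGMA2, C_LY) - mean**2   # equation (122)
std = math.sqrt(variance)                                      # equation (123)

print(f"mean               = {mean:8.1f} ly")
print(f"standard deviation = {std:8.1f} ly")
```

Setting k = 0 returns 1, which is the normalization condition (118).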
Our next goal is to find the cumulants of the ET_Distance. In principle, we could compute all the cumulants $K_i$ from the generic i-th moment $\mu'_i$ (the prime distinguishes the moments from the lognormal parameter $\mu$) by virtue of the recursion formula (see ref. [8])
$$K_i = \mu'_i - \sum_{k=1}^{i-1}\binom{i-1}{k-1}\,K_k\,\mu'_{i-k}. \tag{126}$$
In practice, however, here we shall confine ourselves to the computation of the first four cumulants, because only they are required to find the skewness and kurtosis of the distribution (113). Then, the first four cumulants in terms of the first four moments read:
$$\begin{aligned} K_1 &= \mu'_1 \\ K_2 &= \mu'_2 - K_1^2 \\ K_3 &= \mu'_3 - 3K_1K_2 - K_1^3 \\ K_4 &= \mu'_4 - 4K_1K_3 - 3K_2^2 - 6K_2K_1^2 - K_1^4. \end{aligned} \tag{127}$$
These equations yield, respectively:
$$K_1 = C\,e^{-\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}, \tag{128}$$
$$K_2 = C^2\,e^{-\frac{2\mu}{3}}\,e^{\frac{\sigma^2}{9}}\left(e^{\frac{\sigma^2}{9}}-1\right), \tag{129}$$
$$K_3 = C^3\,e^{-\mu}\left(e^{\frac{\sigma^2}{2}} - 3e^{\frac{5\sigma^2}{18}} + 2e^{\frac{\sigma^2}{6}}\right), \tag{130}$$
$$K_4 = C^4\,e^{-\frac{4\mu}{3}}\left(e^{\frac{8\sigma^2}{9}} - 4e^{\frac{5\sigma^2}{9}} - 3e^{\frac{4\sigma^2}{9}} + 12e^{\frac{\sigma^2}{3}} - 6e^{\frac{2\sigma^2}{9}}\right). \tag{131}$$
From these we derive the skewness
$$\frac{K_3}{(K_4)^{3/4}} = \frac{e^{\frac{\sigma^2}{2}} - 3e^{\frac{5\sigma^2}{18}} + 2e^{\frac{\sigma^2}{6}}}{\left(e^{\frac{8\sigma^2}{9}} - 4e^{\frac{5\sigma^2}{9}} - 3e^{\frac{4\sigma^2}{9}} + 12e^{\frac{\sigma^2}{3}} - 6e^{\frac{2\sigma^2}{9}}\right)^{3/4}} \tag{132}$$
and the kurtosis
$$\frac{K_4}{(K_2)^2} = e^{\frac{4\sigma^2}{9}} + 2e^{\frac{\sigma^2}{3}} + 3e^{\frac{2\sigma^2}{9}} - 6. \tag{133}$$
Next we want to find the mode of this distribution, i.e. the abscissa of its peak. To do so, we must first compute the derivative of the probability density function $f_{ET\_Distance}(r)$ of (113), and then set it equal to zero. This derivative is actually the derivative of the ratio of two functions of r, as plainly appears from (113). Thus, let us set for a moment
$$E(r) = -\frac{\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right]^2}{2\sigma^2} \tag{134}$$
where “E” stands for “exponent.” Upon differentiating, one gets
$$E'(r) = -\frac{2\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right]}{2\sigma^2}\cdot\left(-\frac{3}{r}\right) = \frac{3}{\sigma^2 r}\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right]. \tag{135}$$
But the probability density function (113) now reads
$$f_{ET\_Distance}(r) = \frac{3}{\sqrt{2\pi}\,\sigma}\cdot\frac{e^{E(r)}}{r} \tag{136}$$
so that its derivative is
$$\frac{d f_{ET\_Distance}(r)}{dr} = \frac{3}{\sqrt{2\pi}\,\sigma}\cdot\frac{e^{E(r)}\,E'(r)\,r - e^{E(r)}}{r^2}. \tag{137}$$
- Page 50born-digital extraction
Setting this derivative equal to zero means setting
$$E'(r)\cdot r - 1 = 0. \tag{138}$$
That is, upon replacing (135) into (138), we get
$$\frac{3}{\sigma^2 r}\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right]\cdot r - 1 = 0. \tag{139}$$
Rearranging, this becomes
$$3\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right] - \sigma^2 = 0 \tag{140}$$
that is
$$9\ln\left(\frac{C}{r}\right) - 3\mu - \sigma^2 = 0 \tag{141}$$
whence
$$\ln\left(\frac{C}{r}\right) = \frac{\mu}{3} + \frac{\sigma^2}{9} \tag{142}$$
and finally
$$r_{mode} \equiv r_{peak} = C\,e^{-\frac{\mu}{3}}\,e^{-\frac{\sigma^2}{9}}. \tag{143}$$
This is the most likely ET_Distance from Earth.

How likely? To find the value of the probability density function $f_{ET\_Distance}(r)$ corresponding to this value of the mode, we must obviously replace this mode into the probability density function. After a few rearrangements, which we skip for the sake of brevity, one gets
$$\text{Peak Value of } f_{ET\_Distance}(r) = f_{ET\_Distance}(r_{mode}) = \frac{3}{C\sqrt{2\pi}\,\sigma}\,e^{\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}. \tag{144}$$
This is the peak height in the pdf $f_{ET\_Distance}(r)$.

Next to the mode, the median m (ref. [9]) is one more statistical number used to characterize any probability distribution. It is defined as the independent variable abscissa m such that a realization of the random variable will take up a value lower than m with 50% probability or a value higher than m with 50% probability again. In other words, the median m splits up our probability density in exactly two equally probable parts. Since the probability of occurrence of the random event equals the area under its density curve (i.e. the definite integral under its density curve), the median m (of the distribution (113), in this case) is defined as the integral upper limit m:
$$\int_0^m f_{ET\_Distance}(r)\,dr = \frac{1}{2}. \tag{145}$$
Upon replacing (113), this becomes
$$\int_0^m \frac{3}{r}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right)-\mu\right]^2}{2\sigma^2}}\,dr = \frac{1}{2}. \tag{146}$$
In order to find m, we may not differentiate (146) with respect to m, since the “precise” factor ½ on the right would then disappear into a zero. On the contrary, we may try to perform the obvious substitution
$$z = \frac{\ln\left(\frac{C^3}{r^3}\right)-\mu}{\sqrt{2}\,\sigma} \tag{147}$$
into the integral (146) to reduce it to the following integral (85) defining the error function erf(z).
Then, after a few reductions that we leave to the reader as an exercise, the full equation (145), defining the median, is turned into the corresponding equation involving the error function erf(x) as defined by (85):
- Page 51born-digital extraction
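Random variable: ET_Distance between any two neighboring ET Civilizations in the Galaxy, assuming they are UNIFORMLY distributed throughout the whole Galaxy volume.

Probability distribution: Unnamed (Paul Davies suggested “Maccone distribution”).

Probability density function: $f_{ET\_Distance}(r) = \frac{3}{r}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{\left[\ln\left(\frac{6R_{Galaxy}^2 h_{Galaxy}}{r^3}\right)-\mu\right]^2}{2\sigma^2}}$

(Defining the positive numeric constant C): $C = \sqrt[3]{6R_{Galaxy}^2\,h_{Galaxy}}$

Mean value: $\left\langle ET\_Distance\right\rangle = C\,e^{-\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}$

Variance: $\sigma^2_{ET\_Distance} = C^2\,e^{-\frac{2\mu}{3}}\,e^{\frac{\sigma^2}{9}}\left(e^{\frac{\sigma^2}{9}}-1\right)$

Standard deviation: $\sigma_{ET\_Distance} = C\,e^{-\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}\sqrt{e^{\frac{\sigma^2}{9}}-1}$

All the moments, i.e. k-th moment: $\left\langle ET\_Distance^k\right\rangle = C^k\,e^{-\frac{k\mu}{3}}\,e^{\frac{k^2\sigma^2}{18}}$

Mode (= abscissa of the probability density function peak): $r_{mode} \equiv r_{peak} = C\,e^{-\frac{\mu}{3}}\,e^{-\frac{\sigma^2}{9}}$

Value of the Mode Peak: $f_{ET\_Distance}(r_{mode}) = \frac{3}{C\sqrt{2\pi}\,\sigma}\,e^{\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}$

Median (= fifty-fifty probability value for ET_Distance): $median = m = C\,e^{-\frac{\mu}{3}}$

Skewness: $\frac{K_3}{(K_4)^{3/4}} = \frac{e^{\frac{\sigma^2}{2}} - 3e^{\frac{5\sigma^2}{18}} + 2e^{\frac{\sigma^2}{6}}}{\left(e^{\frac{8\sigma^2}{9}} - 4e^{\frac{5\sigma^2}{9}} - 3e^{\frac{4\sigma^2}{9}} + 12e^{\frac{\sigma^2}{3}} - 6e^{\frac{2\sigma^2}{9}}\right)^{3/4}}$

Kurtosis: $\frac{K_4}{(K_2)^2} = e^{\frac{4\sigma^2}{9}} + 2e^{\frac{\sigma^2}{3}} + 3e^{\frac{2\sigma^2}{9}} - 6$

Expression of $\mu$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\mu = \sum_{i=1}^{7}\frac{b_i\left[\ln(b_i)-1\right] - a_i\left[\ln(a_i)-1\right]}{b_i - a_i}$

Expression of $\sigma^2$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\sigma^2 = \sum_{i=1}^{7}\left(1 - \frac{a_i b_i\left[\ln(b_i)-\ln(a_i)\right]^2}{(b_i - a_i)^2}\right)$

Table 3. Summary of the properties of the probability distribution that applies to the random variable ET_Distance yielding the (average) distance between any two neighboring communicating civilizations in the Galaxy.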
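The entries of Table 3 can be collected into a few lines of code. The sketch below is an illustration, not the author's implementation: the function names are invented here, the factor range used to exercise the μ and σ² formulas is hypothetical, and the numerical values of `MU` and `SIGMA2` are assumptions (not read from Table 1) chosen so that the quoted mode, mean and standard deviation are approximately reproduced. The cumulative distribution uses the fact that ln(ET_Distance) = ln(C) − ln(N)/3 is normal with mean ln(C) − μ/3 and standard deviation σ/3.

```python
import math

def lognormal_params_from_bounds(bounds):
    """mu and sigma^2 of ln(N) when each factor D_i is uniform on [a_i, b_i], 0 < a_i < b_i."""
    mu = 0.0
    sigma2 = 0.0
    for a, b in bounds:
        mu += (b * (math.log(b) - 1.0) - a * (math.log(a) - 1.0)) / (b - a)
        sigma2 += 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2
    return mu, sigma2

def et_distance_summary(mu, sigma2, c_ly):
    """Median, mode, mean and standard deviation as listed in Table 3."""
    median = c_ly * math.exp(-mu / 3.0)
    mean = median * math.exp(sigma2 / 18.0)
    return {
        "median": median,
        "mode": median * math.exp(-sigma2 / 9.0),
        "mean": mean,
        "std": mean * math.sqrt(math.exp(sigma2 / 9.0) - 1.0),
    }

def et_distance_cdf(r, mu, sigma2, c_ly):
    """P(ET_Distance <= r), using the normality of ln(ET_Distance)."""
    m = math.log(c_ly) - mu / 3.0
    s = math.sqrt(sigma2) / 3.0
    return 0.5 * (1.0 + math.erf((math.log(r) - m) / (s * math.sqrt(2.0))))

# Hypothetical single factor, uniform on [1, 3], just to exercise the two sums
mu_one, sigma2_one = lognormal_params_from_bounds([(1.0, 3.0)])

C_LY = 28845.0   # constant C of equation (105)
MU = 7.462       # assumed value (see lead-in), not read from Table 1
SIGMA2 = 1.938   # assumed value of sigma squared (see lead-in)

summary = et_distance_summary(MU, SIGMA2, C_LY)
# Probability of the one-sigma interval around the mean used in the numerical example
p_interval = et_distance_cdf(3979.0, MU, SIGMA2, C_LY) - et_distance_cdf(1361.0, MU, SIGMA2, C_LY)

for name, value in summary.items():
    print(f"{name:>6} = {value:8.1f} ly")
print(f"P(1361 ly <= ET_Distance <= 3979 ly) = {p_interval:.3f}")
```

With these assumed inputs the ordering mode < median < mean comes out as expected for a density with a long right tail, and the interval probability lands close to the 75% quoted in the numerical example.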
- Page 52born-digital extraction
$$\frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\left(\frac{\ln\left(\frac{C^3}{m^3}\right)-\mu}{\sqrt{2}\,\sigma}\right) = \frac{1}{2} \tag{148}$$
that is
$$\mathrm{erf}\left(\frac{\ln\left(\frac{C^3}{m^3}\right)-\mu}{\sqrt{2}\,\sigma}\right) = 0. \tag{149}$$
Since from the definition (85) one obviously has erf(0)=0, (149) yields
$$\frac{\ln\left(\frac{C^3}{m^3}\right)-\mu}{\sqrt{2}\,\sigma} = 0 \tag{150}$$
whence finally
$$median = m = C\,e^{-\frac{\mu}{3}}. \tag{151}$$
This is the median of the ET_Distance distribution. In other words, this is the distance such that, with 50% probability, the actual value of ET_Distance will be lower than this median, and with 50% probability it will be higher. In conclusion, we find it useful to summarize all the equations that we derived about the random variable ET_Distance in Table 3.

NUMERICAL EXAMPLE OF THE ET_DISTANCE DISTRIBUTION

In this section we provide a numerical example of the analytic calculations carried out so far. Consider the Drake Equation values reported in Table 1. Then, the graph of the corresponding probability density function of the nearest ET_Distance, $f_{ET\_Distance}(r)$, is shown in Figure 6.

[Figure 6 plot: "DISTANCE OF NEAREST ET_CIVILIZATION". Horizontal axis: ET_Distance from Earth (light years), from 0 to 5000; vertical axis: probability density function (1/meters).]

Figure 6. This is the probability of finding the nearest ExtraTerrestrial Civilization at the distance r from Earth (in light years) if the values assumed in the Drake Equation are those shown in Table 1. The relevant probability density function $f_{ET\_Distance}(r)$ is given by equation (113). Its mode (peak abscissa) equals 1933 light years, but its mean value is higher since the curve has a high tail on the right: the mean value equals in
- Page 53born-digital extraction
fact 2670 light years. Finally, the standard deviation equals 1309 light years: THIS IS GOOD NEWS FOR SETI, inasmuch as the nearest ET Civilization might lie at just 1 sigma = 2670-1309 = 1361 light years from us.

From Figure 6, we see that the probability of finding ExtraTerrestrials is practically zero up to a distance of about 500 light years from Earth. Then it starts increasing with the increasing distance from Earth, and reaches its maximum at
$$r_{mode} \equiv r_{peak} = C\,e^{-\frac{\mu}{3}}\,e^{-\frac{\sigma^2}{9}} \approx 1933 \text{ light years}. \tag{152}$$
This is the MOST LIKELY VALUE of the distance at which we can expect to find the nearest ExtraTerrestrial civilization.

It is not, however, the mean value of the probability distribution (113) for $f_{ET\_Distance}(r)$. In fact, the probability density (113) has an infinite tail on the right, as clearly shown in Figure 6, and hence its mean value must be higher than its peak abscissa. As given by (119), its mean value is
$$r_{mean\_value} = C\,e^{-\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}} \approx 2670 \text{ light years}. \tag{153}$$
This is the MEAN (value of the) DISTANCE at which we can expect to find ExtraTerrestrials.

After having found the above two distances (1933 and 2670 light years, respectively), the next natural question that arises is: “what is the range, back and forth around the mean value of the distance, within which we can expect to find ExtraTerrestrials with the highest hopes?” The answer to this question is given by the notion of standard deviation, that we already found to be given by (123)
$$\sigma_{ET\_Distance} = C\,e^{-\frac{\mu}{3}}\,e^{\frac{\sigma^2}{18}}\sqrt{e^{\frac{\sigma^2}{9}}-1} \approx 1309 \text{ light years}. \tag{154}$$
More precisely, this is the so-called 1-sigma (distance) level. Probability theory then shows that the nearest ExtraTerrestrial civilization is expected to be located within this range, i.e.
within the two distances of (2670-1309) = 1361 light years and (2670+1309) = 3979 light years, with probability given by the integral of $f_{ET\_Distance}(r)$ taken in between these two lower and upper limits, that is:
$$\int_{1361\ \text{light years}}^{3979\ \text{light years}} f_{ET\_Distance}(r)\,dr \approx 0.75 = 75\% \tag{155}$$
In plain words: with 75% probability, the nearest ExtraTerrestrial civilization is located in between the distances of 1361 and 3979 light years from us, having assumed the input values to the Drake Equation given by Table 1. If we change those input values, then all the numbers change again.

9. THE “DATA ENRICHMENT PRINCIPLE” AS THE BEST CLT CONSEQUENCE UPON THE STATISTICAL DRAKE EQUATION (ANY NUMBER OF FACTORS ALLOWED)

As a fitting climax to all the statistical equations developed so far, let us now state our “DATA ENRICHMENT PRINCIPLE.” It simply states that “The Higher the Number of Factors in the Statistical Drake equation, The Better.” Put in this simple way, it simply looks like a new way of saying that the CLT lets the random variable Y approach the normal distribution when the number of terms in the sum (4) approaches infinity. And this is the case, indeed. However, our “Data Enrichment Principle” has more profound methodological consequences that we cannot explain now, but hope to describe more precisely in one or more coming papers.

CONCLUSIONS

We have sought to extend the classical Drake equation to let it encompass Statistics and Probability. This approach appears to pave the way to future, more profound investigations intended not only to associate “error bars” to each factor in the Drake equation, but especially to increase the number of factors themselves. In fact, this seems to be the only way to incorporate into the Drake
- Page 54born-digital extraction
equation more and more new scientific information as soon as it becomes available. In the long run, the Statistical Drake equation might just become a huge computer code, growing up in size and especially in the depth of the scientific information it contained. It would thus be Humanity’s first “Encyclopaedia Galactica.”

Unfortunately, to extend the Drake equation to Statistics, it was necessary to use a mathematical apparatus that is more sophisticated than just the simple product of seven numbers. When this author had the honour and privilege to present his results at the SETI Institute on April 11th, 2008, in front of an audience also including Professor Frank Drake, he felt he had to add these words: “My apologies, Frank, for disrupting the beautiful simplicity of your equation.”

ACKNOWLEDGEMENTS

The author is grateful to Drs. Jill Tarter, Paul Davies, Seth Shostak, Doug Vakoch, Tom Pierson, Carol Oliver, Paul Shuch and Kathryn Denning for attending his first presentation ever about these topics at the “Beyond” Center of the University of Arizona at Phoenix on February 8th, 2008. He also would like to thank Dan Werthimer and his School of SETI young experts for keeping alive the interplay between experimental and theoretical SETI. But the greatest “thanks” goes of course to the Teacher to all of us: Professor Frank D. Drake, whose equation opened a new way of thinking about the past and the future of Humans in the Galaxy.

REFERENCES

[1] http://en.wikipedia.org/wiki/Drake_equation
[2] http://en.wikipedia.org/wiki/SETI
[3] http://en.wikipedia.org/wiki/Astrobiology
[4] http://en.wikipedia.org/wiki/Frank_Drake
[5] Athanasios Papoulis and S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic Processes”, Fourth Edition, Tata McGraw-Hill, New Delhi, 2002. ISBN 0-07-048658-1.
[6] http://en.wikipedia.org/wiki/Gamma_distribution
[7] http://en.wikipedia.org/wiki/Central_limit_theorem
[8] http://en.wikipedia.org/wiki/Cumulants
[9] http://en.wikipedia.org/wiki/Median
[10] Jeffrey Bennett and Seth Shostak, “Life in the Universe”, Second Edition, Pearson - Addison-Wesley, San Francisco, 2007. ISBN 0-8053-4753-4. See in particular page 404.
- Page 55born-digital extraction
References

[1] Benford, Gregory, Jim and Dominic, “Cost Optimized Interstellar Beacons: SETI”, arXiv.org web site (22 Oct. 2008).
[2] Carl Sagan, “Cosmos”, Random House, New York, 1983. See in particular pages 298-302.
[3] Bennett, Jeffrey, and Shostak, Seth, “Life in the Universe”, Second Edition, Pearson - Addison-Wesley, San Francisco, 2007. See in particular page 404.
[4] C. Maccone, “The Statistical Drake Equation”, paper #IAC-08-A4.1.4 presented on October 1st, 2008, at the 59th International Astronautical Congress (IAC) held in Glasgow, Scotland, UK, September 29th thru October 3rd, 2008.