Concatenated page-by-page transcript. Born-digital pages came through pdf.js; scanned pages were transcribed by Claude vision OCR. Pages marked unreadable failed multiple OCR retries (heavy redaction, microfilm artifacts, or blank separators) and are kept in place for audit.
UNCLASSIFIED//~~FOR OFFICIAL USE ONLY~~
[DIA seal image]
Defense Intelligence Reference Document
Acquisition Threat Support
11 March 2010
ICOD: 1 December 2009
An Introduction to the Statistical Drake Equation
Prepared by:
(b)(3):10 USC 424
Defense Intelligence Agency
Author:
(b)(6)
Administrative Note
COPYRIGHT WARNING: Further dissemination of the photographs in this publication is not authorized.
This product is one in a series of advanced technology reports produced in FY 2009
under the Defense Intelligence Agency, (b)(3):10 USC 424 Advanced Aerospace
Weapon System Applications (AAWSA) Program. Comments or questions pertaining to
this document should be addressed to (b)(3):10 USC 424;(b)(6) , AAWSA Program
Manager, Defense Intelligence Agency, ATTN: (b)(3):10 USC 424 Bldg 6000, Washington,
DC 20340-5100.
Contents
1. Introduction
2. The Key Question: How Far are They?
3. Computing N By Virtue of the Drake Equation (1961)
4. The Drake Equation is Over-Simplified
5. The Statistical Drake Equation
6. Solving the Statistical Drake Equation By Virtue of the Central Limit Theorem (CLT) of Statistics
7. An Example Explaining the Statistical Drake Equation
8. Finding the Probability Distribution of the ET-Distance By Virtue of the Statistical Drake Equation
9. The "Data Enrichment Principle" as the Best CLT Consequence Upon the Statistical Drake Equation (Any Number of Factors Allowed)
10. Conclusions
Appendix A: Proof of Shannon's 1948 Theorem Stating That the Uniform Distribution is the "Most Uncertain" One Over a Finite Range of Values
Appendix B: Original Text of the Author's Paper #IAC-08-A4.1.4 Entitled the Statistical Drake Equation
References
An Introduction to the Statistical Drake Equation
1. Introduction
SETI (an acronym for "Search for Extraterrestrial Intelligence") is a relatively
new branch of scientific research, having begun only in 1959. Its goal is to
ascertain whether alien civilizations exist in the universe, how far from us
they exist, and possibly how much more advanced than us they may be.
As of 2009, the only physical tools we know that could help us get in touch
with aliens are the electromagnetic waves an alien civilization could emit and
we could detect. This forces us to use the largest radiotelescopes on Earth for
SETI research, because the larger our collecting area for electromagnetic
radiation, the higher our sensitivity (that is, the farther into space we can
probe). Yet, even by using the largest radiotelescopes on Earth (the 305-meter
dish at Arecibo, for instance), we cannot search for aliens beyond, say, a few
hundred light years away. This is a very small volume of space around us
within our galaxy, the Milky Way, which is about 100,000 light years in diameter.
Thus, current SETI can cover only a very tiny fraction of the galaxy, and it is
not surprising that in the past 50 years of SETI searches, NO extraterrestrial
civilization has been discovered. Quite simply, we have not reached far enough!
This demands the construction of much more powerful and radically new
radiotelescopes. Rather than big and heavy metal dishes, whose mechanical
problems hamper SETI research too much, we are now turning to "software
radiotelescopes," where a large number of small dishes (ATA = Allen
Telescope Array, and ALMA = Atacama Large Millimeter/submillimeter Array)
or even just simple dipoles (LOFAR = Low Frequency Array), using
state-of-the-art electronics and very-high-speed computing, can outperform
classical radiotelescopes in many regards. The final dream in this field is the
SKA (= Square Kilometer Array), currently being designed and expected to be
completed around 2020.
2. The Key Question: How Far are They?
But still, the key question remains: how far are they?
Or, more correctly, how far do we expect the NEAREST extraterrestrial civilization to be
from the Solar System in the galaxy?
This question was first addressed scientifically in 1961 by the same scientist
who was also the first experimental SETI radio astronomer: the American Frank
Donald Drake (born 1930). He first considered the shape and size of the galaxy where
we live: the Milky Way. This is a spiral galaxy measuring some 100,000 light
years in diameter and some 1,600 light years in thickness of the Galactic Disk at half-way
from its center. That is:
The diameter of the galaxy is about 100,000 light years (abbreviated ly), i.e., its
radius, R_Galaxy, is about 50,000 ly.
The thickness of the Galactic Disk at half-way from its center, h_Galaxy, is about 1,600 ly.
The volume of the galaxy may then be approximated as the volume of the
corresponding cylinder, i.e.,
V_Galaxy = π R²_Galaxy h_Galaxy . (1)
Now consider the sphere around us having a radius r = ET_Distance / 2. The volume of such a
sphere is
V_Our_Sphere = (4/3) π (ET_Distance / 2)³ . (2)
In the last equation, we had to divide the distance "ET_Distance" between ourselves
and the nearest ET civilization by 2 because we are now going to make the
unwarranted assumption that all ET civilizations are equally spaced from each
other in the galaxy! This is a crazy assumption, clearly, and should be replaced by
more scientifically-grounded assumptions as soon as we know more about our Galactic
Neighborhood. At the moment, however, this is the best guess that we can make, and
so we shall take it for granted, although we are aware that this is a weak point in the
reasoning.
Furthermore, let us denote by N the total number of civilizations now living in the
galaxy, including ourselves. Of course, this number N is unknown. We only know that
N ≥ 1 since one civilization does at least exist!
Having thus assumed that ET civilizations are UNIFORMLY SPACED IN THE GALAXY, we
can then write down the proportion:
V_Galaxy / N = V_Our_Sphere / 1 . (3)
That is, upon substituting both (1) and (2) into (3):
π R²_Galaxy h_Galaxy / N = (4/3) π (ET_Distance / 2)³ / 1 . (4)
The last equation contains two unknowns: N and ET_Distance, and so we don't know
which one it is better to solve for.
However, we may suppose that, by resorting to the (rather uncertain) knowledge that
we have about the Evolution of the galaxy through the last 10 billion years or so, we
might somehow compute an approximate value for N.
Then, we may solve (4) for ET_Distance, thus obtaining the (AVERAGE) DISTANCE
BETWEEN ANY PAIR OF NEIGHBORING CIVILIZATIONS IN THE GALAXY (DISTANCE
LAW):
ET_Distance(N) = ³√(6 R²_Galaxy h_Galaxy) / ³√N = C / ³√N (5)
where the positive constant C is defined by
C = ³√(6 R²_Galaxy h_Galaxy) ≈ 28,845 light years . (6)
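The algebra leading from (4) to (5) is short and may be spelled out as follows. This is only a worked restatement of the equations above, using the same symbols; no new assumption is introduced.

```latex
\begin{align*}
\frac{\pi R_{\mathrm{Galaxy}}^{2}\, h_{\mathrm{Galaxy}}}{N}
  &= \frac{4}{3}\,\pi \left(\frac{\mathrm{ET\_Distance}}{2}\right)^{3}
   = \frac{\pi\, \mathrm{ET\_Distance}^{3}}{6} \\[4pt]
\mathrm{ET\_Distance}^{3}
  &= \frac{6\, R_{\mathrm{Galaxy}}^{2}\, h_{\mathrm{Galaxy}}}{N} \\[4pt]
\mathrm{ET\_Distance}(N)
  &= \frac{\sqrt[3]{6\, R_{\mathrm{Galaxy}}^{2}\, h_{\mathrm{Galaxy}}}}{\sqrt[3]{N}}
   = \frac{C}{\sqrt[3]{N}}
\end{align*}
```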
Equations (5) and (6) are the starting point to understand the origin of the Drake
equation that we discuss in detail in Section 3 of this paper.
Let us just complete this section by pointing out three different numerical cases of the
distance law (5):
• We know that we exist, so N may not be smaller than 1, i.e., N ≥ 1. Suppose then
that we are alone in the galaxy, i.e., that N=1. Then the distance law (5) yields, as the
distance to the nearest civilization, just the constant C, i.e., 28,845 light
years. This is about the distance between ourselves and the center of the galaxy
(i.e., the Galactic Bulge). Thus, this result seems to suggest that, if we do not find
any extraterrestrial civilization around us in these outskirts of the galaxy where we
live, we should look around the Galactic Center first. And this is indeed what is
happening, i.e., many SETI searches are actually pointing the antennas towards the
Galactic Center, looking for beacons (see, for instance ref. [1]).
• Suppose next that N=1000, i.e. there are about a thousand extraterrestrial
communicating civilizations in the whole galaxy right now. Then the distance law (5)
yields an average distance of 2,885 light years. This is a distance that most
radiotelescopes on Earth may not reach for SETI searches right now: hence the need
to build larger radiotelescopes, like ALMA, LOFAR and the SKA.
• Suppose finally that N=1000000, i.e., there are a million communicating civilizations
now in the galaxy. Then the distance law (5) yields an average distance of 288 light
years. This is within the (upper) range of distances that our current radiotelescopes
may reach for SETI searches, and that justifies all SETI searches that have been
done so far in the first fifty years of SETI (1960-2010).
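The three numerical cases above can be checked with a few lines of Python. This is a minimal sketch, not part of the original paper: the function names are hypothetical, R_Galaxy = 50,000 ly is taken from the text, and h_Galaxy = 1,600 ly is the disk thickness that reproduces the quoted constant C ≈ 28,845 ly in equation (6).

```python
import math

R_GALAXY_LY = 50_000.0  # galactic radius in light years, as stated in the text
H_GALAXY_LY = 1_600.0   # disk thickness in light years (assumed; reproduces C of eq. (6))


def distance_constant(radius_ly: float = R_GALAXY_LY,
                      thickness_ly: float = H_GALAXY_LY) -> float:
    """Constant C of equation (6): the cube root of 6 * R^2 * h."""
    return (6.0 * radius_ly ** 2 * thickness_ly) ** (1.0 / 3.0)


def et_distance(n_civilizations: float) -> float:
    """Distance law (5): average distance between neighboring civilizations."""
    if n_civilizations < 1:
        raise ValueError("N must be at least 1, since we exist")
    return distance_constant() / n_civilizations ** (1.0 / 3.0)


if __name__ == "__main__":
    # The three cases discussed in the text: N = 1, one thousand, one million.
    for n in (1, 1_000, 1_000_000):
        print(f"N = {n:>9,d}  ->  {et_distance(n):10,.1f} light years")
```

Because the distance scales only as the inverse cube root of N, a thousandfold increase in N shortens the average distance by just a factor of ten.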
In conclusion, interpolating the above three special cases of N, we may say that the
distance law (5) yields the following key diagram of the average ET distance vs. the
assumed number of communicating civilizations, N, in the galaxy right now (Figure 1):
[Figure 1: graph titled "Average DISTANCE of the nearest ET civilization vs. the ASSUMED NUMBER of ET civilizations in the Galaxy". Vertical axis: average distance of the civilization nearest to us, in light years (0 to 2,000). Horizontal axis: assumed number of civilizations in the galaxy, that is, N in the Drake equation (0 to 1,000,000). The curve decreases as N increases.]
Figure 1. DISTANCE LAW; i.e., the Average Distance (plotted along the vertical axis in light years) Versus
the NUMBER of Communicating Civilizations ASSUMED to Exist in the Galaxy Right Now
3. Computing N By Virtue of the Drake Equation (1961)
In the previous section, the problem of finding how close the nearest ET civilization may
be was "solved" by reducing it to the computation of N, the total number of
extraterrestrial civilizations now existing in this galaxy. In this section we describe the
famous Drake equation, proposed in 1961 by Frank Donald Drake (born 1930) to estimate
the numerical value of N. We believe that no better introductory description of the
Drake equation exists than the one given by Carl Sagan in his 1983 book "Cosmos"
(ref. [2]), in its turn based on the famous TV series "Cosmos." So, in this section we
report Carl Sagan's description of the Drake equation unabridged.
"But is there anyone out there to talk to? With a third or a half a trillion stars in our
Milky Way galaxy alone, could ours be the only one accompanied by an inhabited
planet? How much more likely is it that technical civilizations are a cosmic
commonplace, that the galaxy is pulsing and humming with advanced societies, and,
therefore, that the nearest such culture is not so very far away – perhaps transmitting
from antennas established on a planet of a naked-eye star just next door. Perhaps
when we look up at the sky at night, near one of those faint pinpoints of light is a world
on which someone quite different from us is then glancing idly at a star we call the Sun
and entertaining, for just a moment, an outrageous speculation.
It is very hard to be sure. There may be several impediments to the evolution of a
technical civilization. Planets may be rarer than we think. Perhaps the origin of life is
not so easy as our laboratory experiments suggest. Perhaps the evolution of advanced
life forms is improbable. Or it may be that complex life forms evolve more readily, but
intelligence and technical societies require an unlikely set of coincidences – just as the
evolution of the human species depended on the demise of the dinosaurs and the ice-
age recession of the forests in whose trees our ancestors screeched and dimly
wondered. Or perhaps civilizations arise repeatedly, inexorably, on innumerable planets
in the Milky Way, but are generally unstable; so all but a tiny fraction are unable to
survive their technology and succumb to greed and ignorance, pollution and nuclear
war.
It is possible to explore this great issue further and make a crude estimate of N, the
number of advanced civilizations in the galaxy. We define an advanced civilization as
one capable of radio astronomy. This is, of course, a parochial if essential definition.
There may be countless worlds on which the inhabitants are accomplished linguists or
superb poets but indifferent radio astronomers. We will not hear from them. N can be
written as the product or multiplication of a number of factors, each a kind of filter,
every one of which must be sizable for there to be a large number of civilizations:
• Ns, the number of stars in the Milky Way galaxy.
• fp, the fraction of stars that have planetary systems.
• ne, the number of planets in a given system that are ecologically suitable for life.
• fl, the fraction of otherwise suitable planets on which life actually arises.
• fi, the fraction of inhabited planets on which an intelligent form of life evolves.
• fc, the fraction of planets inhabited by intelligent beings on which a communicative
technical civilization develops.
• fL, the fraction of planetary lifetime graced by a technical civilization.
Written out, the equation reads
N = Ns · fp · ne · fl · fi · fc · fL (7)
All of the f's are fractions, having values between 0 and 1; they will pare down the
large value of Ns.
To derive N we must estimate each of these quantities. We know a fair amount about
the early factors in the equation, the number of stars and planetary systems. We know
very little about the later factors, concerning the evolution of intelligence or the lifetime
of technical societies. In these cases our estimates will be little better than guesses. I
invite you, if you disagree with my estimates below, to make your own choices and see
what implications your alternative suggestions have for the number of advanced
civilizations in the galaxy. One of the great virtues of this equation, due to Frank Drake
of Cornell, is that it involves subjects ranging from stellar and planetary astronomy to
organic chemistry, evolutionary biology, history, politics and abnormal psychology.
Much of the Cosmos is in the span of the Drake equation.
We know Ns, the number of stars in the Milky Way galaxy, fairly well, by careful counts
of stars in a small but representative region of the sky. It is a few hundred billion; some
recent estimates place it at 4 x 10¹¹. Very few of these stars are of the massive short-
lived variety that squander their reserves of thermonuclear fuel. The great majority
have lifetimes of billions or more years in which they are shining stably, providing a
suitable energy source for the origin and evolution of life on nearby planets.
There is evidence that planets are a frequent accompaniment of star formation: in the
satellite systems of Jupiter, Saturn and Uranus, which are like miniature solar systems;
in theories of the origin of the planets; in studies of double stars; in observations of
accretion disks around stars; and in some preliminary investigations of gravitational
perturbations of nearby stars.¹ Many, perhaps even most, stars may have planets. We
take the fraction of stars that have planets, fp, as roughly equal to 1/3. Then the total
number of planetary systems in the galaxy would be Ns fp ~ 1.3 x 10¹¹ (the symbol ~
means "approximately equal to"). If each system were to have about ten planets, as
ours does, the total number of worlds in the galaxy would be more than a trillion, a vast
arena for the cosmic drama.
In our own solar system there are several bodies that may be suitable for life of some
sort: the Earth certainly, and perhaps Mars, Titan and Jupiter. Once life originates, it
tends to be very adaptable and tenacious. There must be many different environments
suitable for life in a given planetary system. But conservatively we choose ne=2. Then
the number of planets in the galaxy suitable for life becomes Ns fp ne ~ 3 x 10¹¹.
Experiments show that under the most common cosmic conditions the molecular basis
of life is readily made, the building blocks of molecules able to make copies of
themselves. We are now on less certain grounds; there may, for example, be
impediments in the evolution of the genetic code, although I think this is unlikely over
billions of years of primeval chemistry. We choose fl ~ 1/3, implying a total number of
planets in the Milky Way on which life has arisen at least once as Ns fp ne fl ~ 1 x 10¹¹,
a hundred billion inhabited worlds. That in itself is a remarkable conclusion. But we are
not yet finished.
The choices of fi and fc are more difficult. On the one hand, many individually unlikely
steps had to occur in biological evolution and human history for our present intelligence
and technology to develop. On the other hand, there must be quite different pathways
to an advanced civilization of specified capabilities. Considering the apparent difficulty
in the evolution of large organisms, represented by the Cambrian explosion, let us
choose fi x fc = 1/100, meaning that only 1 per cent of planets on which life arises
actually produce a technical civilization. This estimate represents some middle ground
among the varying scientific opinions. Some think that the equivalent of the step from
the emergence of trilobites to the domestication of fire goes like a shot in all planetary
systems; others think that, even given ten or fifteen billion years, the evolution of a
technical civilization is unlikely. This is not a subject on which we can do much
experimentation as long as our investigations are limited to a single planet. Multiplying
these factors together, we find Ns fp ne fl fi fc ~ 1 x 10⁹, a billion planets on which
technical civilizations have arisen at least once. But that is very different from saying
that there are a billion planets on which technical civilizations now exist. For this we
must also estimate fL.
¹ Carl Sagan was writing these lines in the 1970s, when no extrasolar planets had yet been discovered. The
first such discovery occurred in 1995, when Michel Mayor and Didier Queloz, working at the "Observatoire de Haute
Provence" in France, discovered the first extrasolar planet orbiting the nearby star 51 Peg. This first extrasolar
planet was hence named 51 Peg b. Many more extrasolar planets have been discovered around nearby stars since then.
As of April 2009, 347 extrasolar planets (exoplanets) are listed in the Extrasolar Planets Encyclopaedia.
What percentage of the lifetime of a planet is marked by a technical civilization? The
Earth has harbored a technical civilization characterized by radio astronomy for only a
few decades out of a lifetime of a few billion years. So far, then, for our planet fL is less
than 1/10⁸, a millionth of a percent. And it is hardly out of the question that we might
destroy ourselves tomorrow. Suppose this were a typical case, and the destruction so
complete that no other technical civilization – of the human or any other species – were
able to emerge in the five or so billion years remaining before the Sun dies. Then Ns fp
ne fl fi fc fL ~ 10, and, at a given time there would be only a tiny smattering, a handful,
a pitiful few technical civilizations in the galaxy, the steady state number maintained as
emerging societies replace those recently self-immolated. The number N might be even
as small as 1 if civilizations tend to destroy themselves soon after reaching a
technological phase; there might be no one for us to talk with but ourselves. And that
we do but poorly. Civilizations would take billions of years of tortuous evolution, and
then snuff themselves out in an instant of unforgivable neglect.
But consider the alternative, the prospect that at least some civilizations learn to live
with technology; that the contradictions posed by the vagaries of past brain evolution
are consciously resolved and do not lead to self destruction; or that, even if major
disturbances occur, they are reversed in the subsequent billions of years of biological
evolution. Such societies might live to a prosperous old age, their lifetimes measured
perhaps on geological or stellar evolutionary time scales. If 1 percent of civilizations can
survive technological adolescence, take the proper fork at this critical historical branch
point and achieve maturity, then fL ~ 1/100, N ~ 10⁷, and the number of extant
civilizations in the galaxy is in the millions. Thus, for all our concern about the possible
unreliability of our estimates of the early factors in the Drake equation, which involve
astronomy, organic chemistry and evolutionary biology, the principal uncertainty comes
to economics and politics and what, on Earth, we call human nature. It seems fairly
clear that if self-destruction is not the overwhelmingly preponderant fate of galactic
civilizations, then the sky is softly humming with messages from the stars.
These estimates are stirring. They suggest that the receipt of a message from space is,
even before we decode it, a profoundly hopeful sign. It means that someone has
learned to live with high technology; that it is possible to survive technological
adolescence. This alone, quite apart from the contents of the message, provides a
powerful justification for the search for other civilizations."
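Sagan's arithmetic in the passage quoted above can be retraced in a few lines of Python. This is a minimal sketch with a hypothetical function name; the factor values are the ones Sagan states, with fi and fc kept as their combined product of 1/100. Sagan rounds at each step, so the unrounded product comes out near 9 and 9 million rather than his rounded 10 and 10⁷.

```python
def drake_n(ns: float, fp: float, ne: float, fl: float,
            fi_fc: float, f_lifetime: float) -> float:
    """Classical Drake equation (7), with fi and fc merged into one factor."""
    return ns * fp * ne * fl * fi_fc * f_lifetime


# Factor values quoted from Sagan's text.
SAGAN_FACTORS = dict(ns=4e11, fp=1 / 3, ne=2, fl=1 / 3, fi_fc=1 / 100)

# Pessimistic case: fL = 1/10^8 (civilizations destroy themselves quickly).
n_pessimistic = drake_n(**SAGAN_FACTORS, f_lifetime=1e-8)

# Optimistic case: fL = 1/100 (1 percent survive technological adolescence).
n_optimistic = drake_n(**SAGAN_FACTORS, f_lifetime=1e-2)

if __name__ == "__main__":
    print(f"pessimistic N = {n_pessimistic:.1f}")
    print(f"optimistic  N = {n_optimistic:.3g}")
```

The two cases differ only in fL, which is why the whole spread of six orders of magnitude in N comes from that single factor.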
4. The Drake Equation is Over-Simplified
In the nearly fifty years (1961-2009) elapsed since Frank Drake proposed his equation,
a number of scientists and writers have tried to find out which numerical values of its seven
independent variables are most realistic in light of our present-day knowledge.
Thus there is a considerable amount of literature about the Drake equation nowadays,
and, as one can easily imagine, the results obtained by the various authors differ
largely from one another. In other words, the value of N that various authors obtained
by different assumptions about the astronomy, the biology and the sociology implied by
the Drake equation may range from a few tens (in the pessimist's view) to some
millions or even billions in the optimist's opinion. A lot of uncertainty is thus affecting our
knowledge of N as of 2010. In all cases, however, the final result about N has always
been a sheer number, i.e., a positive integer number ranging from 1 to millions or
billions. This is precisely the aspect of the Drake equation that this author regarded as
"too simplistic" and improved mathematically in his paper #IAC-08-A4.1.4, entitled
"The Statistical Drake Equation" and presented on October 1st, 2008, at the 59th
International Astronautical Congress (IAC) held in Glasgow, Scotland, UK, September
29th thru October 3rd, 2008. That paper is attached herewith as Appendix B. Newcomers
to SETI and to the Drake equation, however, may find that paper too difficult to be
understood mathematically at a first reading. Thus, I shall now explain the content of
that paper "by speaking easily." I thank the reader for his or her attention.
5. The Statistical Drake Equation
We start with an example.
Consider the first independent variable in the Drake equation (7), i.e., Ns, the number
of stars in the Milky Way galaxy. Astronomers tell us that there should
be about 350 million stars in the galaxy. Of course, nobody has counted (or even seen
in the photographic plates) all the stars in the galaxy! There are too many practical
difficulties preventing us from doing so: just to name one, the dust clouds that don't
allow us to see even the Galactic Bulge (i.e., the central region of the galaxy) in
visible light (although we may "see it" at radio frequencies like the famous neutral
hydrogen line at 1420 MHz). So, it doesn't make any sense to say that Ns = 350 x 10⁶,
or (even worse) that the number of stars in the galaxy is, say, 354,233,321, or
similar fanciful exact integer numbers. That is just silly and non-scientific. Much more
scientific, on the contrary, is to say that the number of stars in the galaxy is 350 million
plus or minus, say, 50 million (or whatever values the astronomers may regard as
more appropriate, since this is just an example to let the reader understand the
difficulty).
Thus, it makes sense to REPLACE each of the seven independent variables in the Drake
equation (7) by a MEAN VALUE (350 million, in the above example) PLUS OR MINUS A
CERTAIN STANDARD DEVIATION (50 million, in the above example).
By doing so, we have made a great step forward: we have abandoned the too-simplistic
equation (7) and replaced it by something more sophisticated and scientifically more
serious: the STATISTICAL Drake equation. In other words, we have transformed the
classical and simplistic Drake equation (7) into an advanced statistical tool for the
investigation of a host of facts hardly known to us in detail. In other words still:
• We replace each independent variable in (7) by a RANDOM VARIABLE, labeled
D_i (from Drake).
• We assume that the MEAN VALUE of each D_i is the same numerical value previously
attributed to the corresponding independent variable in (7).
• But now we also ADD A STANDARD DEVIATION σ_D_i on each side of the mean value,
that is provided by the knowledge gathered by scientists in each discipline
encompassed by each D_i.
Having so done, the next question is:
How can we find out the PROBABILITY DISTRIBUTION for each D_i ?
For instance, shall that be a Gaussian, or what?
This is a difficult question, for nobody knows, for instance, the probability distribution of
the number of stars in the galaxy, not to mention the probability distribution of the
other six variables in the Drake equation (7).
There is a brilliant way to get around this difficulty, though.
We start by excluding the Gaussian because each variable in the Drake equation is a
POSITIVE (or, more precisely, a non-negative) random variable, while the Gaussian
ranges over ALL REAL values, negative ones included. So, the Gaussian is out. Then, one might
consider the large class of well-studied and positive probability densities called "the
gamma distributions," but it is then unclear why one should adopt the gamma
distributions and not any other. The solution to this apparent conundrum comes from
Shannon's Information Theory and a theorem that he proved in 1948: "The probability
distribution having maximum entropy (= uncertainty) over any FINITE range of real
values is the UNIFORM distribution over that range." This is proven in Appendix A of the
present document.
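A small numerical illustration may help here. The theorem quoted above concerns continuous densities over a finite range; the sketch below uses the discrete analogue instead (a finite set of equally wide bins), which is an assumption made purely for simplicity. The function name and the two example distributions are hypothetical.

```python
import math


def shannon_entropy(probabilities: list[float]) -> float:
    """Shannon entropy, in nats, of a discrete probability distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


# Two illustrative distributions over the same four bins.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.70, 0.10, 0.10, 0.10]

if __name__ == "__main__":
    print(f"uniform: {shannon_entropy(uniform):.4f} nats")
    print(f"peaked:  {shannon_entropy(peaked):.4f} nats")
```

The uniform case attains ln 4, the largest entropy possible over four bins; any departure from uniformity lowers it, which is the sense in which the uniform distribution is the "most uncertain" one.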
So, at this point, we assume that each of the seven D_i in (7) is a UNIFORM random
variable, whose mean value and standard deviation are known by the scientists working
in the respective field (let it be astronomy, or biology, or sociology). Notice that, for
such a uniform distribution, the knowledge of the mean value μ_D_i and of the standard
deviation σ_D_i automatically determines the RANGE of each random variable in between
its lower (called a_i) and upper (called b_i) limits: in fact these limits are given by the
equations
{ a_i = μ_D_i - √3 σ_D_i
{ b_i = μ_D_i + √3 σ_D_i (8)
(the "surprising" factor √3 in the above equations comes from the definitions of mean
value and standard deviation: please see equations (12), (15) and (17) in Appendix B
for the relevant proof). So the uniform distribution of each random variable D_i is
perfectly determined by its mean value and standard deviation, and so are all its other
properties.
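Equations (8) translate directly into code. The sketch below is illustrative only (the function names are hypothetical); it computes the limits a_i and b_i from a mean and a standard deviation, and also the inverse mapping, using the standard facts that a uniform variable on [a, b] has mean (a + b)/2 and standard deviation (b - a)/(2√3).

```python
import math


def uniform_limits(mean: float, std: float) -> tuple[float, float]:
    """Equations (8): lower and upper limits of a uniform random variable."""
    half_width = math.sqrt(3.0) * std
    return mean - half_width, mean + half_width


def uniform_mean_std(a: float, b: float) -> tuple[float, float]:
    """Inverse of equations (8): mean and standard deviation from the limits."""
    return (a + b) / 2.0, (b - a) / (2.0 * math.sqrt(3.0))


if __name__ == "__main__":
    # The example used in the text for Ns: 350 million plus or minus 50 million.
    a, b = uniform_limits(350e6, 50e6)
    print(f"a = {a:,.0f}   b = {b:,.0f}")
```

Note that a standard deviation of 50 million corresponds to a half-width of about 86.6 million, so the range of the uniform variable is wider than "mean plus or minus one standard deviation".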
The next problem is the following:
OK, since we now know everything about each uniformly distributed D_i, what is the
probability distribution of N, given that N is the product (7) of all the D_i ?
In other words, not only do we want to find the analytical expression of the probability
density function of N, but we also want to relate its mean value μ_N to all mean values
μ_D_i of the D_i, and its standard deviation σ_N to all standard deviations σ_D_i of the D_i.
This is a difficult problem.
It occupied the author's mind for no less than about ten years (1997-2007).
It is actually an ANALYTICALLY UNSOLVABLE problem, in that, to the best of this
author's knowledge, it is IMPOSSIBLE to find an analytic expression for the probability
density function of any FINITE PRODUCT of uniform random variables D_i. This result is
proven (unfortunately!) in Sections 2 thru 3.3 of Appendix B.
6. Solving the Statistical Drake Equation By Virtue of the
Central Limit Theorem (CLT) of Statistics
The solution to the problem of finding the analytical expression for the probability
density function of N in the statistical Drake equation was found by this author in
September 2007. The key steps are the following:
• Take the natural logs of both sides of the statistical Drake equation (7). This
changes the product into a sum.
• The mean values and standard deviations of the logs of the random variables D_i
may all be expressed analytically in terms of the mean values and standard
deviations of the D_i.
• Recall the Central Limit Theorem (CLT) of statistics, stating that (loosely speaking) if
you have a SUM of independent random variables, each of which is ARBITRARILY
DISTRIBUTED (hence, also including uniformly distributed), then, when the number
of terms in the sum increases indefinitely (i.e. for a sum of random variables
infinitely long)... the SUM RANDOM VARIABLE TENDS TO A GAUSSIAN.
• Thus, the natural log of N tends to a Gaussian.
• Thus, N tends to the LOGNORMAL DISTRIBUTION.
• The mean value and standard deviation of this lognormal distribution of N may both
be expressed analytically in terms of the mean values and standard deviations of
the logs of the D_i already found previously.
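The steps above can be sketched numerically. The following Python fragment is an illustration under stated assumptions, not the author's own computation: the seven (a_i, b_i) ranges are hypothetical placeholders, and the function names are invented for this sketch. The exact mean and variance of ln(D_i) for a uniform D_i follow from elementary integration of ln x and (ln x)² over [a_i, b_i]; summing them gives the μ and σ of ln(N), which a Monte Carlo simulation then confirms. The check covers the mean and standard deviation only; the Gaussian shape of ln(N) is what the CLT supplies as the number of factors grows.

```python
import math
import random
import statistics

# Hypothetical (a_i, b_i) ranges for seven positive uniform factors D_i.
# These numbers are illustrative assumptions only, not values from the source.
FACTOR_RANGES = [
    (2.5e8, 4.5e8), (0.2, 0.5), (1.0, 3.0), (0.2, 0.8),
    (0.1, 0.5), (0.1, 0.5), (1e-8, 1e-6),
]


def log_uniform_moments(a: float, b: float) -> tuple[float, float]:
    """Exact mean and variance of ln(D) for D uniform on [a, b], 0 < a < b."""
    la, lb = math.log(a), math.log(b)
    mean = (b * (lb - 1.0) - a * (la - 1.0)) / (b - a)
    second = (b * (lb ** 2 - 2.0 * lb + 2.0)
              - a * (la ** 2 - 2.0 * la + 2.0)) / (b - a)
    return mean, second - mean ** 2


def lognormal_parameters(ranges) -> tuple[float, float]:
    """mu and sigma of ln(N): means and variances of independent logs add up."""
    moments = [log_uniform_moments(a, b) for a, b in ranges]
    mu = sum(m for m, _ in moments)
    sigma = math.sqrt(sum(v for _, v in moments))
    return mu, sigma


def simulate_log_n(ranges, samples: int = 20_000, seed: int = 0) -> list[float]:
    """Monte Carlo draws of ln(N) = sum of ln(D_i), with D_i uniform."""
    rng = random.Random(seed)
    return [sum(math.log(rng.uniform(a, b)) for a, b in ranges)
            for _ in range(samples)]


if __name__ == "__main__":
    mu, sigma = lognormal_parameters(FACTOR_RANGES)
    draws = simulate_log_n(FACTOR_RANGES)
    print(f"analytic : mu = {mu:.3f}, sigma = {sigma:.3f}")
    print(f"simulated: mu = {statistics.fmean(draws):.3f}, "
          f"sigma = {statistics.stdev(draws):.3f}")
```

Working with sums of logs rather than with the product itself is the design choice that makes the problem tractable: means and variances of independent terms simply add, whatever the number of factors.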
This result is fundamental.
All the relevant equations are summarized in the following Table 1. This table is actually
the same as Table 2 of the author's original paper IAC-08-A4.1.4, entitled "The
Statistical Drake Equation" and presented by him at the International Astronautical
Congress (IAC) held in Glasgow, UK, on October 1st, 2008. This original paper is
reproduced in Appendix B.
To sum up, not only is it found that N approaches the completely known lognormal
distribution for an INFINITY of factors in the statistical Drake equation (7), but the way
is paved to further applications by removing the condition that the number of terms in
the product (7) must be FINITE.
This possibility of ADDING ANY NUMBER OF FACTORS IN THE DRAKE EQUATION (7)
was not envisaged, of course, by Frank Drake back in 1961, when "summarizing" the
evolution of life in the galaxy in SEVEN simple STEPS. But today, the number of factors
in the Drake equation should already be increased: for instance, there is no mention in
the original Drake equation of the possibility that asteroidal impacts might destroy
life on Earth at any time. This is because the demise of the dinosaurs at the K/T
impact had not yet been understood by scientists in 1961; it was understood only in 1980!
In practice, the number of factors should INCREASE as much as necessary in order to
get better and better estimates of N as our scientific knowledge increases. This
is called the "Data Enrichment Principle," and this author believes it should be the next
important goal in the study of the statistical Drake equation.
Finally, a numerical example explaining how the statistical Drake equation works in
practice will be given in the next section.
Table 1. Summary of the Properties of the Lognormal Distribution That Applies
to the Random Variable N = Number of ET Communicating Civilizations in the
Galaxy
Random variable | N = number of communicating ET civilizations in galaxy
Probability distribution | Lognormal
Probability density function | f_N(n) = (1/n) · 1/(√(2π) σ) · e^(-(ln(n) - μ)² / (2σ²)) (n ≥ 0)
Mean value | ⟨N⟩ = e^μ · e^(σ²/2)
Variance | σ²_N = e^(2μ) · e^(σ²) · [e^(σ²) - 1]
Standard deviation | σ_N = e^μ · e^(σ²/2) · √(e^(σ²) - 1)
All the moments, i.e. k-th moment | ⟨N^k⟩ = e^(kμ) · e^(k² σ²/2)
Mode (= abscissa of the lognormal peak) | n_mode ≡ n_peak = e^μ · e^(-σ²)
Value of the Mode Peak | f_N(n_mode) = 1/(√(2π) σ) · e^(-μ) · e^(σ²/2)
Median (= fifty-fifty probability value for N) | median = m = e^μ
Skewness | K₃/(K₄)^(3/2) = (e^(σ²) + 2) · √[(e^(-6μ) · e^(-3σ²)) / ((e^(σ²) - 1)⁵ · (e^(3σ²) + 3e^(2σ²) + 6e^(σ²) + 6)³)]
Kurtosis | K₄/(K₂)² = e^(4σ²) + 2e^(3σ²) + 3e^(2σ²) - 6
Expression of μ in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | μ = Σ(i=1 to 7) ⟨Y_i⟩ = Σ(i=1 to 7) [b_i[ln(b_i) - 1] - a_i[ln(a_i) - 1]] / (b_i - a_i)
Expression of σ² in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | σ² = Σ(i=1 to 7) σ²_Y_i = Σ(i=1 to 7) [1 - a_i b_i [ln(b_i) - ln(a_i)]² / (b_i - a_i)²]
7. An Example Explaining the Statistical Drake Equation
To understand how things work in practice for the statistical Drake equation, please
consider the following Table 2. It is made up of three columns:
• The first column on the left lists the seven input sheer numbers.
• These numbers also become the mean values (middle column).
• Finally, the last column on the right lists the seven input standard deviations.
The bottom line is the classical Drake equation (7). We see that, for this particular set
of seven inputs, the classical Drake equation (i.e. the product of the seven numbers)
yields a total of 3500 communicating extraterrestrial civilizations existing in the galaxy
right now.
[Table 2 box:]
Ns := 350·10⁹ μNs := Ns σNs := 1·10⁹
fp := 50/100 μfp := fp σfp := 10/100
ne := 1 μne := ne σne := 1/√3
fl := 50/100 μfl := fl σfl := 10/100
fi := 20/100 μfi := fi σfi := 10/100
fc := 20/100 μfc := fc σfc := 10/100
fL := 10000/10^10 μfL := fL σfL := 1000/10^10
N := Ns·fp·ne·fl·fi·fc·fL N = 3500
Table 2. Input Values (i.e. mean values and standard deviations) for the Seven Drake Uniform Random
Variables Di. The first column on the left lists the seven input sheer numbers that also become the mean values
(middle column). Finally the last column on the right lists the seven input standard deviations. The bottom line is
the classical Drake equation (7).
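As a quick check of the table, the classical product can be recomputed, and each mean value and standard deviation can be converted into the lower and upper limits (a_i, b_i) of the corresponding uniform distribution. This is a minimal sketch: the dictionary `INPUTS` and the helper `uniform_limits` are hypothetical names, and the conversion relies only on the standard facts that a uniform variable on [a, b] has mean (a + b)/2 and standard deviation (b - a)/√12.

```python
import math

# (mean value, standard deviation) of each input, copied from Table 2.
INPUTS = {
    "Ns": (350e9, 1e9),
    "fp": (0.5, 0.1),
    "ne": (1.0, 1.0 / math.sqrt(3.0)),
    "fl": (0.5, 0.1),
    "fi": (0.2, 0.1),
    "fc": (0.2, 0.1),
    "fL": (1e4 / 1e10, 1e3 / 1e10),
}

# Classical Drake equation: the product of the seven mean values.
n_classical = math.prod(mean for mean, _ in INPUTS.values())


def uniform_limits(mean, std):
    """Limits (a, b) of the uniform distribution with the given mean and std."""
    half_width = math.sqrt(3.0) * std
    return mean - half_width, mean + half_width


limits = {name: uniform_limits(m, s) for name, (m, s) in INPUTS.items()}

print(f"classical N = {n_classical:.0f}")
for name, (a, b) in limits.items():
    print(f"{name}: a = {a:.6g}, b = {b:.6g}")
```

Note that the standard deviation 1/√3 chosen for ne corresponds to a uniform distribution running from 0 to 2.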
The statistical Drake equation, however, provides a much more articulated answer than
just the above sheer number N = 3500. In fact, a MathCad code written by this author
and capable of performing all the numerical calculations required by the statistical
Drake equation for a given set of seven input mean values plus seven input standard
deviations, yields for N the lognormal distribution (thin curve) plotted in Figure 2. We
see immediately that the peak of this thin curve (i.e. the mode) falls at about
n_mode ≡ n_peak = e^μ · e^(-σ²) ≈ 250 (this is equation (98) of Appendix B), while the median (the fifty-fifty
value splitting the lognormal density into two parts of equal area) falls
at about n_median ≡ e^μ ≈ 1740 (equation (99) of Appendix B). These seem to be smaller values than the N = 3500 provided by
the classical Drake equation, but this is a wrong impression due to a poor "intuitive"
understanding of what statistics is! In fact, neither the mode nor the median is the
"really important" value: the really important value for N is the MEAN VALUE! Now if
you look at the thin curve in Figure 2 below (i.e. the lognormal distribution arising from
the Central Limit Theorem), you see that this curve has a LONG TAIL ON THE RIGHT! In
other words, it does NOT immediately go down to nearly zero beyond the peak of the
mode. Thus, when you actually compute the mean value, you should not be too
surprised to find out that it equals ⟨N⟩ = e^μ · e^(σ²/2) ≈ 4589.559 ~ 4590 communicating
civilizations now in the galaxy. This is the important number, and it is HIGHER than the
3500 provided by the classical Drake equation. Thus, in conclusion, THE STATISTICAL
EXTENSION of the classical Drake equation INCREASES OUR HOPES to find an
extraterrestrial civilization!
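The three numbers quoted above follow directly from the lognormal formulas of Table 1 once μ and σ² are known. The sketch below simply takes μ ≈ 7.462176 and σ² ≈ 1.938725 from equations (91) and (93) of Appendix B as given; the variable names are illustrative only.

```python
import math

MU = 7.462176       # equation (91) of Appendix B
SIGMA2 = 1.938725   # equation (93) of Appendix B

mode = math.exp(MU - SIGMA2)                       # abscissa of the peak
median = math.exp(MU)                              # fifty-fifty value
mean = math.exp(MU + SIGMA2 / 2.0)                 # mean value <N>
std = mean * math.sqrt(math.exp(SIGMA2) - 1.0)     # standard deviation

print(f"mode   = {mode:.1f}")
print(f"median = {median:.1f}")
print(f"mean   = {mean:.1f}")
print(f"std    = {std:.1f}")
```

The ordering mode < median < mean is a general property of the lognormal distribution and reflects its long right tail.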
[Figure: Graph titled "PROBABILITY DENSITY FUNCTION OF N"
Y-axis: Prob. density function of N, ranging from 0 to 6·10⁻⁴
X-axis: N = Number of ET Civilizations in Galaxy, ranging from 0 to 4000
The graph shows two curves: a thick curve and a thin curve, both peaking around N=250 and having long right tails]
Figure 2. Comparing the Two Probability Density Functions of the Random Variable N Found (1)
Without Resorting to the CLT at All (thick curve) and (2) Using the CLT and the Relevant Lognormal
Approximation (thin curve).
Even more so our hopes are increased when we go on to consider the standard
deviation associated with the mean value 4590. In fact, the standard deviation is given
by equation (97) of Appendix B. This yields σ_N = e^μ · e^(σ²/2) · √(e^(σ²) - 1) ≈ 11195, and so the
actual value of N may be much higher than the 4590 provided by
the mean value alone! The "upper limit of the one-sigma confidence interval" (as
statisticians call it), i.e. the sum 4590+11195 = 15,785, yields a higher number still!
(Note: the "lower limit of the one-sigma confidence interval" is ZERO because the
difference 4590-11195 is negative, whereas the
lognormal distribution is POSITIVE (or, more correctly, non-negative)). Finally, the
reader should note that the thick curve depicted in Figure 2 is just the NUMERICAL
solution of the statistical Drake equation for a FINITE number of 7 input factors. Figure
2 actually shows that this curve "is well interpolated" by the lognormal distribution (thin
curve), i.e., by the neat analytical expression provided by the Central Limit Theorem for
an INFINITE number of factors in the Drake equation. That is, in conclusion, Figure 2
visually shows that taking 7 factors or an infinity of factors "is almost the same thing"
already for a value as small as 7.
8. Finding the Probability Distribution of the ET-Distance By
Virtue of the Statistical Drake Equation
Having solved the statistical Drake equation by finding the lognormal distribution, we
are now in a position to solve the ET-DISTANCE problem by resorting to statistics again,
rather than just to the purely deterministic Distance Law (5), as we did in Section 2.
This is "scientifically more serious" than just the purely deterministic Distance Law (5)
inasmuch as the new statistical Distance Law will yield a PROBABILITY DENSITY for the
Distance, with the relevant mean value and standard deviation. In other words, the
Distance Law (5) itself becomes a random variable whose probability distribution, mean
value and standard deviation must be computed by "replacing" into (5) the fact that N
is now known to follow the lognormal distribution. This is mathematically described in
detail in Section 7 of Appendix A.
The important new result is the PROBABILITY DENSITY FOR THE DISTANCE, the
equation of which is
f_ET_Distance(r) = (3/r) · 1/(√(2π) σ) · e^(-[ln(6 R²_Galaxy h_Galaxy / r³) - μ]² / (2σ²)) (9)
holding for r ≥ 0. This is equation (114) of Appendix B.
Starting from this equation, the MEAN VALUE OF THE random variable ET_DISTANCE is
computed as
⟨ET_Distance⟩ = C e^(-μ/3) e^(σ²/18) (10)
which is equation (119) of Appendix B, and finally the ET_DISTANCE STANDARD
DEVIATION
σ_ET_Distance = C e^(-μ/3) e^(σ²/18) √(e^(σ²/9) - 1) (11)
which is equation (123) of Appendix B. Of course, all other descriptive statistical
quantities, such as moments, cumulants, etc., can be computed starting from the
probability density (9), and the result is Table 2 hereafter, that is Table 3 of Appendix
B.
Finally, to complete this section, as well as this "introduction to the statistical Drake
equation," the numerical values that equations (10) and (11) yield for the input values
listed in Section 7 are determined. They are, respectively:
r_mean value = C e^(-μ/3) e^(σ²/18) ≈ 2,670 light years (12)
which is equation (153) of Appendix B, and
σ_ET_Distance = C e^(-μ/3) e^(σ²/18) √(e^(σ²/9) - 1) ≈ 1,309 light years (13)
which is equation (154) of Appendix B.
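Equations (12) and (13) can be reproduced with a few lines of arithmetic. This sketch takes μ and σ² from equations (91) and (93) of Appendix B and the constant C ≈ 28,845 light years from the summary table of this section; nothing else is assumed.

```python
import math

MU = 7.462176       # equation (91) of Appendix B
SIGMA2 = 1.938725   # equation (93) of Appendix B
C_LY = 28_845.0     # constant C related to the Milky Way size, in light years

# Equation (10): mean value of ET_Distance.
mean_distance = C_LY * math.exp(-MU / 3.0) * math.exp(SIGMA2 / 18.0)

# Equation (11): standard deviation of ET_Distance.
std_distance = mean_distance * math.sqrt(math.exp(SIGMA2 / 9.0) - 1.0)

print(f"mean ET_Distance = {mean_distance:.0f} light years")
print(f"std  ET_Distance = {std_distance:.0f} light years")
```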
Table 2. Summary of the Properties of the Probability Distribution That Applies
to the Random Variable ET_Distance Yielding the (average) Distance Between
Any Two Neighboring Communicating Civilizations in the Galaxy
Random variable | ET_Distance between any two neighboring ET civilizations in galaxy, assuming they are UNIFORMLY distributed throughout the whole galaxy volume.
Probability distribution | Unnamed
Probability density function | f_ET_Distance(r) = (3/r) · 1/(√(2π) σ) · e^(-[ln(6 R²_Galaxy h_Galaxy / r³) - μ]² / (2σ²))
Numerical constant C related to the Milky Way size | C = ∛(6 R²_Galaxy h_Galaxy) ≈ 28,845 light years
Mean value | ⟨ET_Distance⟩ = C e^(-μ/3) e^(σ²/18)
Variance | σ²_ET_Distance = C² e^(-2μ/3) e^(σ²/9) (e^(σ²/9) - 1)
Standard deviation | σ_ET_Distance = C e^(-μ/3) e^(σ²/18) √(e^(σ²/9) - 1)
All the moments, i.e. k-th moment | ⟨ET_Distance^k⟩ = C^k e^(-kμ/3) e^(k² σ²/18)
Mode (= abscissa of the peak) | r_mode ≡ r_peak = C e^(-μ/3) e^(-σ²/9)
Value of the Mode Peak | f_ET_Distance(r_mode) = 3/(C √(2π) σ) · e^(μ/3) · e^(σ²/18)
Median (= fifty-fifty probability value for ET_Distance) | median = m = C e^(-μ/3)
Skewness | K₃/(K₄)^(3/2) = e^(μ) (e^(σ²/2) - 3e^(5σ²/18) + 2e^(σ²/6)) / [C³ (e^(8σ²/9) - 4e^(5σ²/9) - 3e^(4σ²/9) + 12e^(σ²/3) - 6e^(2σ²/9))^(3/2)]
Kurtosis | K₄/(K₂)² = e^(4σ²/9) + 2e^(σ²/3) + 3e^(2σ²/9) - 6
Expression of μ in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | μ = Σ(i=1 to 7) ⟨Y_i⟩ = Σ(i=1 to 7) [b_i[ln(b_i) - 1] - a_i[ln(a_i) - 1]] / (b_i - a_i)
Expression of σ² in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | σ² = Σ(i=1 to 7) σ²_Y_i = Σ(i=1 to 7) [1 - a_i b_i [ln(b_i) - ln(a_i)]² / (b_i - a_i)²]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
[page unreadable; original scan available via the document viewer above]
In practice, however, here we shall confine
ourselves to the computation of the first four
cumulants only, because only they are required to
find the skewness and kurtosis of the distribution.
The first four cumulants in terms of the first
four moments read:
K_1 = μ_1
K_2 = μ_2 - K_1^2
K_3 = μ_3 - 3K_1 K_2 - K_1^3
K_4 = μ_4 - 4K_1 K_3 - 3K_2^2 - 6K_2 K_1^2 - K_1^4. (67)
These equations yield, respectively:
K_1 = e^μ e^(σ^2/2). (68)
K_2 = e^(2μ) e^(σ^2)(e^(σ^2) - 1). (69)
K_3 = e^(3μ) e^(3σ^2/2)(e^(σ^2) - 1)^2(e^(σ^2) + 2). (70)
K_4 = e^(4μ+2σ^2)(e^(σ^2) - 1)^3(e^(3σ^2) + 3e^(2σ^2) + 6e^(σ^2) + 6) (71)
From these we derive the skewness
K_3 / (K_4)^(3/2) =
= (e^(σ^2) + 2) √( e^(-6μ) e^(-3σ^2) / ((e^(σ^2) - 1)^5 (e^(3σ^2) + 3e^(2σ^2) + 6e^(σ^2) + 6)^3) ). (72)
and the kurtosis
K_4 / (K_2)^2 = e^(4σ^2) + 2e^(3σ^2) + 3e^(2σ^2) - 6. (73)
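The relations (67) lend themselves to a direct numerical check. The sketch below builds the first four lognormal moments from the k-th moment formula ⟨N^k⟩ = e^(kμ) e^(k²σ²/2), applies (67), and compares the results with factorised closed forms. The values of μ and σ² are used only as test data, and the closed form for K_3 is the one obtained by substituting the moments into the third line of (67).

```python
import math

MU, SIGMA2 = 7.462176, 1.938725   # test data only (equations (91) and (93))


def moment(k):
    """k-th moment of the lognormal distribution: exp(k*mu + k^2*sigma^2/2)."""
    return math.exp(k * MU + k * k * SIGMA2 / 2.0)


m1, m2, m3, m4 = (moment(k) for k in (1, 2, 3, 4))

# Equation (67): cumulants in terms of moments.
k1 = m1
k2 = m2 - k1**2
k3 = m3 - 3 * k1 * k2 - k1**3
k4 = m4 - 4 * k1 * k3 - 3 * k2**2 - 6 * k2 * k1**2 - k1**4

x = math.exp(SIGMA2)
closed = {
    "K1": math.exp(MU + SIGMA2 / 2.0),
    "K2": math.exp(2 * MU + SIGMA2) * (x - 1),
    "K3": math.exp(3 * MU + 1.5 * SIGMA2) * (x - 1) ** 2 * (x + 2),
    "K4": math.exp(4 * MU + 2 * SIGMA2) * (x - 1) ** 3 * (x**3 + 3 * x**2 + 6 * x + 6),
}
computed = {"K1": k1, "K2": k2, "K3": k3, "K4": k4}

for name in closed:
    print(name, math.isclose(computed[name], closed[name], rel_tol=1e-9))

# Kurtosis, equation (73).
kurtosis = k4 / k2**2
print("kurtosis", math.isclose(kurtosis, x**4 + 2 * x**3 + 3 * x**2 - 6, rel_tol=1e-9))
```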
Finally, we want to find the mode of the
lognormal probability density function, i.e. the
abscissa of its peak. To do so, we must first
compute the derivative of the probability density
function f_N(n) of equation (56), and then set it
equal to zero. This derivative is actually the
derivative of the ratio of two functions of n, as it
plainly appears from (57). Thus, let us set for a
moment
E(n) = (ln[n] - μ)^2 / (2σ^2) (74)
where "E" stands for "exponent." Upon
differentiating this, one gets
E'(n) = 1/(2σ^2) · 2(ln[n] - μ) · 1/n. (75)
But the lognormal probability density function (56),
by virtue of (74), now reads
f_N(n) = 1/(√(2π) σ) · e^(-E(n)) / n (76)
So that its derivative is
df_N(n)/dn = 1/(√(2π) σ) · (-e^(-E(n)) E'(n) · n - 1 · e^(-E(n))) / n^2
= 1/(√(2π) σ) · (-e^(-E(n)) [E'(n) · n + 1]) / n^2. (77)
Setting this derivative equal to zero means setting
E'(n) · n + 1 = 0 (78)
That is, upon replacing (75),
1/σ^2 · (ln[n] - μ) + 1 = 0. (79)
Rearranging, this becomes
ln[n] - μ + σ^2 = 0 (80)
and finally
n_mode ≡ n_peak = e^μ e^(-σ^2) (81)
This is the most likely number of ExtraTerrestrial
Civilizations in the Galaxy.
How likely? To find the value of the probability
density function f_N(n) corresponding to this
value of the mode, we must obviously replace (81)
into (56). After a few rearrangements, one then
gets
f_N(n_mode) = 1/(√(2π) σ) · e^(-μ) · e^(σ^2/2). (82)
This is "how likely" the most likely number of
ExtraTerrestrial Civilizations in the Galaxy is, i.e.
it is the peak height in the lognormal probability
density function f_N(n).
Next to the mode, the median m (ref. [9]) is one
more statistical number used to characterize any
probability distribution. It is defined as the
independent variable abscissa m such that a
realization of the random variable will take up a
value lower than m with 50% probability or a value
higher than m with 50% probability again. In other
words, the median m splits up our probability
density in exactly two equally probable parts. Since
the probability of occurrence of the random event
equals the area under its density curve (i.e. the
definite integral under its density curve) then the
median m (of the lognormal distribution, in this
case) is defined as the integral upper limit m:
∫_0^m f_N(n) dn = ∫_0^m 1/n · 1/(√(2π) σ) · e^(-(ln(n)-μ)^2/(2σ^2)) dn = 1/2. (83)
In order to find m, we may not differentiate (83) with
respect to m, since the "precise" factor ½ on the
right would then disappear into a zero. On the
contrary, we may try to perform the obvious
substitution
z^2 = (ln(n) - μ)^2 / (2σ^2) z ≥ 0 (84)
into the integral (83) to reduce it to the following
integral defining the error function erf(z)
erf(x) = 2/√π ∫_0^x e^(-z^2) dz (85)
Then, after a few reductions that we skip for the sake
of brevity, the full equation (83) is turned into
1/2 + 1/2 · erf( (ln(m) - μ) / (√2 σ) ) = 1/2 (86)
that is
erf( (ln(m) - μ) / (√2 σ) ) = 0 (87)
Since from the definition (85) one obviously has
erf(0)=0, (87) becomes
(ln(m) - μ) / (√2 σ) = 0 (88)
whence finally
median = m = e^μ. (89)
This is the median of the lognormal distribution of
N. In other words, this is the number of
ExtraTerrestrial civilizations in the Galaxy such
that, with 50% probability the actual value of N will
be lower than this median, and with 50% probability
it will be higher.
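Both results, the mode (81) and the median (89), can be confirmed numerically. The following sketch uses the numeric values of μ and σ quoted later in this section as test data, locates the peak of the density by a brute-force search over integer values of n, and evaluates the cumulative distribution with the error function from the Python standard library. The function names `pdf` and `cdf` are illustrative.

```python
import math

MU, SIGMA = 7.462176, 1.392381   # numeric values (91) and (94), test data only


def pdf(n):
    """Lognormal probability density function f_N(n) of equation (56)."""
    z = (math.log(n) - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (n * SIGMA * math.sqrt(2.0 * math.pi))


def cdf(n):
    """Probability that N is lower than n, written with the error function."""
    return 0.5 + 0.5 * math.erf((math.log(n) - MU) / (math.sqrt(2.0) * SIGMA))


# Mode: the integer n in 1..5000 at which the density is largest.
peak = max(range(1, 5001), key=pdf)
print("peak found by search:", peak)
print("e^mu * e^(-sigma^2) :", round(math.exp(MU - SIGMA**2), 2))

# Median: the cumulative probability at n = e^mu should be one half.
print("cdf at e^mu:", cdf(math.exp(MU)))
```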
In conclusion, we find it useful to summarize all the
equations that we derived about the random variable
N in the following Table 2.
Random variable | N = number of communicating ET civilizations in Galaxy
Probability distribution | Lognormal
Probability density function | f_N(n) = 1/n · 1/(√(2π) σ) · e^(-(ln(n)-μ)^2/(2σ^2)) (n ≥ 0)
Mean value | ⟨N⟩ = e^μ e^(σ^2/2)
Variance | σ_N^2 = e^(2μ) e^(σ^2)(e^(σ^2) - 1)
Standard deviation | σ_N = e^μ e^(σ^2/2) √(e^(σ^2) - 1)
All the moments, i.e. k-th moment | ⟨N^k⟩ = e^(kμ) e^(k^2σ^2/2)
Mode (= abscissa of the lognormal peak) | n_mode ≡ n_peak = e^μ e^(-σ^2)
Value of the Mode Peak | f_N(n_mode) = 1/(√(2π) σ) · e^(-μ) · e^(σ^2/2)
Median (= fifty-fifty probability value for N) | median = m = e^μ
Skewness | K_3 / (K_4)^(3/2) = (e^(σ^2) + 2) √( e^(-6μ) e^(-3σ^2) / ((e^(σ^2) - 1)^5 (e^(3σ^2) + 3e^(2σ^2) + 6e^(σ^2) + 6)^3) )
Kurtosis | K_4 / (K_2)^2 = e^(4σ^2) + 2e^(3σ^2) + 3e^(2σ^2) - 6
Expression of μ in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | μ = Σ_{i=1}^{7} ⟨Y_i⟩ = Σ_{i=1}^{7} (b_i[ln(b_i) - 1] - a_i[ln(a_i) - 1]) / (b_i - a_i)
Expression of σ^2 in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | σ^2 = Σ_{i=1}^{7} σ_{Y_i}^2 = Σ_{i=1}^{7} ( 1 - a_i b_i [ln(b_i) - ln(a_i)]^2 / (b_i - a_i)^2 )
Table 2. Summary of the properties of the lognormal distribution that applies to the random variable N = number of
ET communicating civilizations in the Galaxy.
We want to complete this section about the
lognormal probability density function (56) by
finding out its numeric values for the inputs to the
Statistical Drake equation (3) listed in Table 1.
According to the CLT, the mean value μ to be
inserted into the lognormal density (56) is given
(according to the second equation (48)) by the sum of
all the mean values ⟨Y_i⟩, that is, by virtue of (31), by:
μ = Σ_{i=1}^{7} ⟨Y_i⟩ = Σ_{i=1}^{7} (b_i[ln(b_i) - 1] - a_i[ln(a_i) - 1]) / (b_i - a_i). (90)
Upon replacing the 14 a_i and b_i listed in Table 1
into (90), the following numeric mean value μ is
found
μ ≈ 7.462176 (91)
Similarly, to get the numeric variance σ^2 one
must resort to the last of equations (48) and to (33):
σ^2 = Σ_{i=1}^{7} σ_{Y_i}^2 = Σ_{i=1}^{7} ( 1 - a_i b_i [ln(b_i) - ln(a_i)]^2 / (b_i - a_i)^2 ) (92)
yielding the following numeric variance σ^2 to be
inserted into the lognormal pdf (56)
σ^2 ≈ 1.938725 (93)
whence the numeric standard deviation σ
σ ≈ 1.392381. (94)
Upon replacing these two numeric values (91)
and (94) into the lognormal pdf (56), the latter is
perfectly determined. It is plotted in Figure 4
hereafter as the thin curve.
In other words, Figure 4 shows the lognormal
distribution for the number N of ExtraTerrestrial
Civilizations in the Galaxy derived from the Central
Limit Theorem as applied to the Drake equation
(with the input data listed in Table 1).
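The numeric values (91), (93) and (94) can be recomputed from equations (90) and (92). The sketch below is not the author's original code. It assumes that the seven inputs of Table 1 are uniform variables with the (mean, standard deviation) pairs listed in `MEAN_STD`, and it recovers the limits from a_i = mean - √3·std and b_i = mean + √3·std. For a lower limit a_i = 0 the terms a_i[ln(a_i) - 1] and a_i b_i [ln(b_i) - ln(a_i)]² are replaced by their limit value, zero.

```python
import math

SQRT3 = math.sqrt(3.0)

# Assumed (mean, standard deviation) pairs for Ns, fp, ne, fl, fi, fc, fL.
MEAN_STD = [
    (350e9, 1e9), (0.5, 0.1), (1.0, 1.0 / SQRT3), (0.5, 0.1),
    (0.2, 0.1), (0.2, 0.1), (1e-6, 1e-7),
]


def limits(mean, std):
    """Lower and upper limits of the uniform distribution (clamped at zero)."""
    return max(0.0, mean - SQRT3 * std), mean + SQRT3 * std


def xlnx_term(x):
    """x*(ln(x) - 1), continued by its limit 0 at x = 0."""
    return x * (math.log(x) - 1.0) if x > 0.0 else 0.0


def mean_of_log(a, b):
    """Mean value <Y_i> of Y_i = ln(D_i), the summand of equation (90)."""
    return (xlnx_term(b) - xlnx_term(a)) / (b - a)


def variance_of_log(a, b):
    """Variance of Y_i = ln(D_i), the summand of equation (92)."""
    if a <= 0.0:
        return 1.0                      # limit of the formula for a -> 0
    return 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2


pairs = [limits(m, s) for m, s in MEAN_STD]
mu = sum(mean_of_log(a, b) for a, b in pairs)
sigma2 = sum(variance_of_log(a, b) for a, b in pairs)

print(f"mu      = {mu:.6f}")
print(f"sigma^2 = {sigma2:.6f}")
print(f"sigma   = {math.sqrt(sigma2):.6f}")
```

Most of the variance comes from ne, whose lower limit is zero, and from fi and fc, whose lower limits are close to zero.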
We now like to point out the most important
statistical properties of this lognormal pdf:
1) Mean Value of N. This is given by equation (60)
with μ and σ given by (91) and (94), respectively:
⟨N⟩ = e^μ e^(σ^2/2) ≈ 4589.559 (95)
In other words, there are 4590 ET Civilizations in
the Galaxy according to the Central Limit Theorem of
Statistics with the inputs of Table 1. This number
4590 is HIGHER than the 3500 foreseen by the
classical Drake equation working with sheer
numbers only, rather than with probability
distributions. Thus equation (95) is GOOD
NEWS FOR SETI, since it shows that the expected
number of ETs is HIGHER with an adequate
statistical treatment than just with the too simple
Drake sheer numbers of (1).
2) Variance of N. The variance of the lognormal
distribution is given by (62) and turns out to be a
huge number:
σ_N^2 = e^(2μ) e^(σ^2)(e^(σ^2) - 1) ≈ 125328623. (96)
3) Standard deviation of N. The standard deviation
of the lognormal distribution is given by (63) and
turns out to be:
σ_N = e^μ e^(σ^2/2) √(e^(σ^2) - 1) = 11195 (97)
Again, this is GOOD NEWS FOR SETI. In fact,
such a high standard deviation means that N may
range from very low values (zero, theoretically, and
one since Humanity exists) up to tens of thousands
(4590+11195=15785 is (95)+(97)).
4) Mode of N. The mode (= peak abscissa) of the
lognormal distribution of N is given by (81), and has
a surprisingly low numeric value:
n_mode ≡ n_peak = e^μ e^(-σ^2) ≈ 250 (98)
This is well shown in Figure 4: the mode peak is very
pronounced and close to the origin, but the right tail
is high, and this means that the mean value of the
distribution is much higher than the mode:
4590>>250.
5) Median of N. The median (= fifty-fifty abscissa,
splitting the pdf in two exactly equi-probable parts)
of the lognormal distribution of N is given by (89),
and has the numeric value:
n_median ≡ e^μ ≈ 1740 (99)
In words, assuming the input values listed in Table 1,
we have exactly a 50% probability that the actual
value of N is lower than 1740, and 50% that it is
higher than 1740.
7. COMPARING THE CLT RESULTS
WITH THE NON-CLT RESULTS
The time is now ripe to compare the CLT-
based results about the lognormal distribution of N,
just described in Section 5, against the Non-CLT-
based results obtained numerically in Section 3.3.
To do so in a simple, visual way, let us plot on
the same diagram two curves:
1) The numeric curves appearing in Figure 2
and obtained after laborious Fourier
transform calculations in the complex
domain, and
2) The lognormal distribution (56) with
numeric μ and σ given by (91) and (94)
respectively.
We see that the two curves are virtually coincident
for values of N larger than 1500. This is a
consequence of the law of large numbers, of which
the CLT is just one of the many facets.
The same happens for the natural log of N, i.e. the
random variable Y of (5), which is plotted in Figure 5
both in its normal curve version (thin curve) and in
its numeric version, obtained via Fourier transforms
and already shown in Figure 2.
The conclusion is simple: from now on we shall
discard forever the numeric calculations and we'll
stick only to the equations derived by virtue of the
CLT, i.e. to the lognormal (56) and its
consequences.
PROBABILITY DENSITY FUNCTION OF N
[Figure 4. Graph showing probability density function of N, with y-axis labeled "Prob. density function of N" ranging from 1·10^-4 to 6·10^-4, and x-axis labeled "N = Number of ET Civilizations in Galaxy" ranging from 0 to 4000. Two curves are shown: a thick curve and a thin curve that are nearly coincident for N > 1500.]
Figure 4. Comparing the two probability density functions of the random variable N found:
1) At the end of Section 3.3. in a purely numeric way and without resorting to the CLT at all (thick curve) and
2) Analytically by using the CLT and the relevant lognormal approximation (thin curve).
PROBABILITY DENSITY FUNCTION OF Y=ln(N)
[Figure 5. Graph showing probability density function of Y=ln(N), with y-axis labeled "Probability density function of Y" ranging from 0 to 0.5, and x-axis labeled "Independent variable Y = ln(N)" ranging from 0 to 12. Two curves are shown: a thick curve and a thin Gaussian curve.]
Figure 5. Comparing the two probability density functions of the random variable Y=ln(N) found:
1) At the end of Section 3.3. in a purely numeric way and without resorting to the CLT at all (thick curve) and
2) Analytically by using the CLT and the relevant normal (Gaussian) approximation (thin Gaussian curve).
8. DISTANCE OF THE NEAREST
EXTRATERRESTRIAL CIVILIZATION
AS A PROBABILITY DISTRIBUTION
As an application of the Statistical Drake
Equation developed in the previous sections of this
paper, we now want to consider the problem of
estimating the distance of the ExtraTerrestrial
Civilization nearest to us in the Galaxy. In all
Astrobiology textbooks (see, for instance, ref. [10])
and in several web sites, the solution to this
problem is reported with only slight differences in
the mathematical proofs among the various authors.
In the first of the coming two sections (section 7.1)
we derive the expression for this "ET_Distance"
(as we like to denote it) in the classical, non-
probabilistic way: in other words, this is the
classical, deterministic derivation. In the second
section (7.2) we provide the probabilistic
derivation, arising from our Statistical Drake
Equation, of the corresponding probability density
function f_ET_Distance(r) : here r is the distance
between us and the nearest ET civilization
assumed as the independent variable of its own
probability density function. The ensuing sections
provide more mathematical details about this
f_ET_Distance(r) such as its mean value, variance,
standard deviation, all central moments, mode,
median, cumulants, skewness and kurtosis.
CLASSICAL, NON-PROBABILISTIC
DERIVATION OF THE DISTANCE OF THE
NEAREST ET CIVILIZATION
Consider the Galactic Disk and assume that:
1) The diameter of the Galaxy is (about) 100,000
light years, (abbreviated ly) i.e. its radius,
R_Galaxy, is about 50,000 ly.
2) The thickness of the Galactic Disk at half-way
from its center, h_Galaxy, is about 1,600 ly.
Then
3) The volume of the Galaxy may be
approximated as the volume of the
corresponding cylinder, i.e.
V_Galaxy = π R^2_Galaxy h (100)
4) Now we consider the sphere around us having a
radius equal to half of ET_Distance. The volume of such a sphere is
V_Our_Sphere = 4/3 π (ET_Distance / 2)^3 (101)
In the last equation, we had to divide the distance
"ET_Distance" between ourselves and the nearest
ET Civilization by 2 because we are now going to
make the unwarranted assumption that all ET
Civilizations are equally spaced from each other in
the Galaxy! This is a crazy assumption, clearly,
and should be replaced by more scientifically-
grounded assumptions as soon as we know more
about our Galactic Neighbourhood. At the moment,
however, this is the best guess that we can make,
and so we shall take it for granted, although we are
aware that this is a weak point in the reasoning.
Having thus assumed that ET Civilizations
are UNIFORMLY SPACED IN THE GALAXY,
we can write down this proportion:
V_Galaxy / N = V_Our_Sphere / 1. (102)
That is, upon replacing both (100) and (101) into
(102):
π R^2_Galaxy h / N = (4/3 π (ET_Distance / 2)^3) / 1. (103)
The only unknown in the last equation is
ET_Distance, and so we may solve for it, thus
getting the:
(AVERAGE) DISTANCE BETWEEN ANY PAIR
OF NEIGHBOURING CIVILIZATIONS IN
THE GALAXY
ET_Distance = ∛(6 R^2_Galaxy h) / ∛N = C / ∛N (104)
where the positive constant C is defined by
C = ∛(6 R^2_Galaxy h_Galaxy) ≈ 28845 light years (105)
Equations (104) and (105) are the starting point for
our first application of the Statistical Drake
equation, that we discuss in detail in the coming
sections of this paper.
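Equations (104) and (105) are easy to evaluate. The sketch below assumes R_Galaxy = 50,000 ly and h_Galaxy = 1,600 ly, the thickness that reproduces C ≈ 28,845 light years in (105), and then evaluates the distance law for N = 3500, the value given by the classical Drake equation with the example inputs. The function names are hypothetical.

```python
R_GALAXY_LY = 50_000.0   # radius of the Galaxy in light years
H_GALAXY_LY = 1_600.0    # disk thickness in light years (reproduces C in (105))


def distance_constant(radius_ly, thickness_ly):
    """Constant C of equation (105): cube root of 6 * R^2 * h."""
    return (6.0 * radius_ly**2 * thickness_ly) ** (1.0 / 3.0)


def et_distance(n_civilizations, c_ly):
    """Equation (104): average distance between neighbouring civilizations."""
    return c_ly / n_civilizations ** (1.0 / 3.0)


C_LY = distance_constant(R_GALAXY_LY, H_GALAXY_LY)
d_3500 = et_distance(3500.0, C_LY)

print(f"C = {C_LY:.0f} light years")
print(f"ET_Distance for N = 3500: {d_3500:.0f} light years")
```

Because the distance scales as the inverse cube root of N, even a tenfold change in N moves the distance by only a factor of about 2.15.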
PROBABILISTIC DERIVATION OF THE
PROBABILITY DENSITY FUNCTION FOR
ET_DISTANCE
The probability density function (pdf) yielding
the distance of the ET Civilization nearest to us in
the Galaxy and presented in this section, was
discovered by this author on September 5th, 2007.
He did not disclose it to other scientists until the
SETI meeting run by the famous mathematical
physicist and popular science author, Paul Davies,
at the "Beyond" Center of Arizona State
University, near Phoenix, on February 5-8, 2008.
This meeting was also attended by SETI Institute
experts Jill Tarter, Seth Shostak, Doug Vakoch,
Tom Pierson and others. During this author's talk,
Paul Davies suggested calling the new probability
density function that yields the ET_Distance, and
is derived in this section, "the Maccone
distribution."
Let us go back to equation (104). Since N is
now a random variable (obeying the lognormal
distribution), it follows that the ET_Distance must
be a random variable as well. Hence it must have
some unknown probability density function that
we denote by
f_ET_Distance(r) (106)
where r is the new independent variable of such a
probability distribution (it is denoted by r to
remind the reader that it expresses the three-
dimensional radial distance separating us from the
nearest ET civilization in a full spherical symmetry
of the space around us).
The question then is: what is the unknown
probability distribution (106) of the ET_Distance?
We can answer this question upon making the two
formal substitutions
{ N → x
{ ET_distance → y (107)
into the transformation law (8) for random
variables. As a consequence, (104) takes the form
y = g(x) = C / ∛x = C · x^(-1/3). (108)
In order to find the unknown probability density
f_ET_Distance(r), we now apply the rule (9) to
(108). First, notice that (108), when inverted to
yield the various roots x_i(y), yields a single real
root only
x_1(y) = C^3 / y^3. (109)
Then, the summation in (9) reduces to one term
only.
Second, differentiating (108) one finds
g'(x) = -C/3 · x^(-4/3). (110)
Thus, the relevant absolute value reads
|g'(x)| = |-C/3 · x^(-4/3)| = C/3 · x^(-4/3). (111)
Upon replacing (111) into (9), we then find
|g'(x_1)| = C/3 · x_1^(-4/3) = C/3 [C^3/y^3]^(-4/3) = C/3 [C/y]^(-4) = y^4 / (3C^3).
...(112)
This is the denominator of (9). The numerator
simply is the lognormal probability density
function (56) where the old independent variable x
must now be re-written in terms of the new
independent variable y by virtue of (109). By
doing so, we finally arrive at the new probability
density function f_Y(y)
f_Y(y) = 3C^3 / y^4 · 1/(C^3/y^3) · 1/(√(2π) σ) · e^(-(ln[C^3/y^3] - μ)^2 / (2σ^2)).
Rearranging and replacing y by r, the final form
is:
f_ET_distance(r) = 3/r · 1/(√(2π) σ) · e^(-(ln[C^3/r^3] - μ)^2 / (2σ^2)). (113)
Now, just replace C in (113) by virtue of (105).
Then:
We have discovered the probability density
function yielding the probability of finding the
nearest ExtraTerrestrial Civilization in the
Galaxy in the spherical shell between the
distances r and r+dr from Earth:
f_ET_Distance(r) = 3/r · 1/(√(2π) σ) · e^(-(ln[6 R^2_Galaxy h_Galaxy / r^3] - μ)^2 / (2σ^2)) (114)
holding for r ≥ 0.
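The density just derived can be cross-checked by simulation. The sketch below draws lognormal values of N with the standard-library generator `lognormvariate`, maps each one to a distance through (104), and compares the fraction of samples falling between two arbitrarily chosen distances (1,000 and 3,000 light years) with the integral of the density (113) over the same interval. The numeric μ, σ and C are taken from elsewhere in the paper, while the interval, seed and sample size are illustrative choices.

```python
import math
import random

MU, SIGMA = 7.462176, 1.392381   # numeric values (91) and (94)
C_LY = 28_845.0                  # constant C of equation (105), light years


def pdf_distance(r):
    """Equation (113): probability density of ET_Distance = C / N**(1/3)."""
    z = math.log(C_LY**3 / r**3) - MU
    norm = math.sqrt(2.0 * math.pi) * SIGMA
    return 3.0 / r / norm * math.exp(-z * z / (2.0 * SIGMA**2))


def integrate(f, lo, hi, steps=2000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps))
    return total * h


rng = random.Random(5)           # fixed seed so the run is repeatable
samples = [
    C_LY / rng.lognormvariate(MU, SIGMA) ** (1.0 / 3.0)
    for _ in range(100_000)
]

lo, hi = 1000.0, 3000.0
mc_fraction = sum(lo <= r <= hi for r in samples) / len(samples)
pdf_fraction = integrate(pdf_distance, lo, hi)

print(f"Monte Carlo fraction in [{lo:.0f}, {hi:.0f}] ly: {mc_fraction:.3f}")
print(f"Integral of the density over the same range: {pdf_fraction:.3f}")
```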
STATISTICAL PROPERTIES OF THIS
DISTRIBUTION
We now want to study this probability
distribution in detail. Our next questions are:
1) What is its mean value?
2) What are its variance and standard
deviation?
3) What are its moments to any higher order?
4) What are its cumulants?
5) What are its skewness and kurtosis?
6) What are the coordinates of its peak, i.e.
the mode (peak abscissa) and its ordinate?
7) What is its median?
The first three points in the list are all covered
by the following theorem: all the moments of (113)
are given by (here k is the generic and non-
negative integer exponent, i.e. k = 0,1,2,3,... ≥ 0)
⟨ET_Distance^k⟩ = ∫_0^∞ r^k · f_ET_Distance(r) dr
= ∫_0^∞ r^k · 3/r · 1/(√(2π) σ) · e^(-(ln[C^3/r^3] - μ)^2 / (2σ^2)) dr
= C^k e^(-kμ/3) e^(k^2σ^2/18). (115)
To prove this result, one first transforms the above
integral by virtue of the substitution
ln[C^3/r^3] = z. (116)
The new integral in z is then seen to reduce to
the known Gaussian integral (53) and, after several
reductions that we skip for the sake of brevity,
(115) follows from (53). In other words, we have
proven that
⟨ET_Distance^k⟩ = C^k e^(-kμ/3) e^(k^2σ^2/18). (117)
Upon setting k = 0 into (117), the
normalization condition for f_ET_Distance(r) follows
∫_0^∞ f_ET_Distance(r) dr = 1. (118)
Upon setting k = 1 into (117), the important
mean value of the random variable ET_Distance
is found
⟨ET_Distance⟩ = C e^(-μ/3) e^(σ^2/18). (119)
Upon setting k = 2 into (117), the mean value of
the square of the random variable ET_Distance is
found
⟨ET_Distance^2⟩ = C^2 e^(-2μ/3) e^(2σ^2/9). (120)
The variance of ET_Distance now follows from
the last two formulae with a few reductions:
σ^2_ET_Distance = ⟨ET_Distance^2⟩ - ⟨ET_Distance⟩^2
= C^2 e^(-2μ/3) e^(σ^2/9)(e^(σ^2/9) - 1). (121)
So, the variance of ET_Distance is
σ^2_ET_Distance = C^2 e^(-2μ/3) e^(σ^2/9)(e^(σ^2/9) - 1). (122)
The square root of this is the important
standard deviation of the ET_Distance random
variable
σ_ET_Distance = C e^(-μ/3) e^(σ^2/18) √(e^(σ^2/9) - 1). (123)
The third moment is obtained upon setting
k = 3 into (117)
⟨ET_Distance^3⟩ = C^3 e^(-μ) e^(σ^2/2). (124)
Finally, upon setting k = 4 into (117), the fourth
moment of ET_Distance is found
⟨ET_Distance^4⟩ = C^4 e^(-4μ/3) e^(8σ^2/9). (125)
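As a cross-check of the moment formula, the sketch below (an addition for illustration, not the author's code) evaluates the closed form with the sign convention of (115) and compares it with a direct numerical integration of r^k · f_ET_Distance(r). The integration uses the change of variable r = e^t, a choice made here for numerical convenience rather than the substitution (116). C is the value quoted in Table 3; MU and SIGMA are assumed illustrative values and the function names are hypothetical.

```python
import math

C_LY = 28845.0   # constant C in light years, as quoted in Table 3
MU = 7.46        # ASSUMED illustrative value of mu
SIGMA = 1.39     # ASSUMED illustrative value of sigma

def moment_closed_form(k, c=C_LY, mu=MU, sigma=SIGMA):
    """k-th raw moment per (115): C^k * e^(-k mu/3) * e^(k^2 sigma^2/18)."""
    return c**k * math.exp(-k * mu / 3.0) * math.exp(k * k * sigma**2 / 18.0)

def moment_numeric(k, c=C_LY, mu=MU, sigma=SIGMA, steps=20_000):
    """Integrate r^k * f(r) dr with r = e^t, using the midpoint rule in t = ln r."""
    center = math.log(c) - mu / 3.0      # value of ln r where ln(C^3/r^3) - mu vanishes
    half_width = 4.0 * sigma             # 12 standard deviations of ln r (std = sigma/3)
    a = center - half_width
    h = 2.0 * half_width / steps
    norm = 3.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        z = 3.0 * (math.log(c) - t) - mu   # ln(C^3/r^3) - mu at r = e^t
        # r^k * (3/r) * (...) * dr becomes e^(k t) * 3 * (...) * dt, because dr = r dt
        total += math.exp(k * t) * norm * math.exp(-z * z / (2.0 * sigma**2))
    return total * h

mean = moment_closed_form(1)
variance = moment_closed_form(2) - mean**2
std_dev = math.sqrt(variance)
for k in range(5):
    print(k, moment_closed_form(k), moment_numeric(k))
print("mean, standard deviation (light years):", mean, std_dev)
```

The k = 0 row reproduces the normalization condition (118), and the last line reproduces the route from (119) and (120) to the standard deviation (123).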
Our next goal is to find the cumulants of the
ET_Distance. In principle, we could compute all
the cumulants K_i from the generic i-th moment
μ_i by virtue of the recursion formula (see ref. [8])
K_i = μ_i - Σ_{k=1}^{i-1} (i-1 choose k-1) K_k μ_{i-k}. (126)
In practice, however, here we shall confine
ourselves to the computation of the first four
cumulants because they only are required to find
the skewness and kurtosis of the distribution (113).
Then, the first four cumulants in terms of the first
four moments read:
{
K_1 = μ_1
K_2 = μ_2 - K_1^2
K_3 = μ_3 - 3K_1 K_2 - K_1^3 (127)
K_4 = μ_4 - 4K_1 K_3 - 3K_2^2 - 6K_2 K_1^2 - K_1^4.
These equations yield, respectively:
K_1 = C e^(-μ/3) e^(σ^2/18). (128)
K_2 = C^2 e^(-2μ/3) e^(σ^2/9)(e^(σ^2/9) - 1). (129)
K_3 = C^3 e^(-μ)(e^(σ^2/2) - 3e^(5σ^2/18) + 2e^(σ^2/6)). (130)
K_4 = C^4 e^(-4μ/3)(e^(8σ^2/9) - 4e^(5σ^2/9) - 3e^(4σ^2/9) + 12e^(σ^2/3) - 6e^(2σ^2/9)). (131)
From these we derive the skewness
K_3 / (K_2)^(3/2) = (e^(σ^2/2) - 3e^(5σ^2/18) + 2e^(σ^2/6)) / (e^(σ^2/9)(e^(σ^2/9) - 1))^(3/2)
= (e^(σ^2/9) + 2) √(e^(σ^2/9) - 1) (132)
and the kurtosis
K_4 / (K_2)^2 = e^(4σ^2/9) + 2e^(σ^2/3) + 3e^(2σ^2/9) - 6. (133)
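The recursion (126) is easy to mechanize. The following sketch, added here as an illustration under stated assumptions, builds the first four raw moments from (115), applies (126) to obtain K_1 through K_4, and forms the ratios K_3/(K_2)^(3/2) and K_4/(K_2)^2. For comparison it also evaluates the standard lognormal closed forms in w = e^(σ^2/9); the kurtosis closed form is (133), while the skewness closed form (w + 2)√(w - 1) is supplied by us as a cross-check. MU and SIGMA are assumed illustrative values and the function names are hypothetical.

```python
import math

def raw_moment(k, c, mu, sigma):
    """k-th raw moment of ET_Distance per (115)."""
    return c**k * math.exp(-k * mu / 3.0 + k * k * sigma**2 / 18.0)

def cumulants_from_moments(moments):
    """Apply recursion (126). moments[i] is the i-th raw moment, with moments[0] = 1.
    Returns the list [K_1, ..., K_n]."""
    n = len(moments) - 1
    kappa = [0.0] * (n + 1)              # kappa[0] is unused padding
    for i in range(1, n + 1):
        acc = moments[i]
        for k in range(1, i):
            acc -= math.comb(i - 1, k - 1) * kappa[k] * moments[i - k]
        kappa[i] = acc
    return kappa[1:]

C_LY, MU, SIGMA = 28845.0, 7.46, 1.39    # C from Table 3; MU and SIGMA are ASSUMED
moments = [raw_moment(k, C_LY, MU, SIGMA) for k in range(5)]
k1, k2, k3, k4 = cumulants_from_moments(moments)

skewness = k3 / k2**1.5
kurtosis = k4 / k2**2

# Closed forms in w = e^(sigma^2/9) used as a cross-check
w = math.exp(SIGMA**2 / 9.0)
skewness_closed = (w + 2.0) * math.sqrt(w - 1.0)
kurtosis_closed = w**4 + 2.0 * w**3 + 3.0 * w**2 - 6.0   # equation (133)

print("skewness:", skewness, skewness_closed)
print("kurtosis:", kurtosis, kurtosis_closed)
```

Both ratios are dimensionless, so C and μ drop out of them; only σ matters.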
Next we want to find the mode of this
distribution, i.e. the abscissa of its peak. To do so,
we must first compute the derivative of the
probability density function f_ET_Distance(r) of (113),
and then set it equal to zero. This derivative is
actually the derivative of the ratio of two functions
of r, as plainly appears from (113). Thus, let us
set for a moment
E(r) = (ln[C^3/r^3] - μ)^2 / (2σ^2). (134)
where "E" stands for "exponent." Upon
differentiating,
one gets
E'(r) = 1/(2σ^2) · 2(ln[C^3/r^3] - μ) · C^3 · (-3) · r^(-4) / (C^3/r^3)
= 1/σ^2 · (ln[C^3/r^3] - μ) · (-3) · 1/r. (135)
But the probability density function (113) now
reads
f_ET_Distance(r) = 3/(√(2π) σ) · e^(-E(r)) / r (136)
So that its derivative is
df_ET_Distance(r)/dr = 3/(√(2π) σ) · (-e^(-E(r)) E'(r) · r - 1 · e^(-E(r))) / r^2
= 3/(√(2π) σ) · (-e^(-E(r)) [E'(r) · r + 1]) / r^2. (137)
Setting this derivative equal to zero means setting
E'(r) · r + 1 = 0 (138)
That is, upon replacing (135) into (138), we get
1/σ^2 · (ln[C^3/r^3] - μ) · (-3) · 1/r · r + 1 = 0 (139)
Rearranging, this becomes
-3(ln[C^3/r^3] - μ) + σ^2 = 0 (140)
that is
-3ln[C^3/r^3] + 3μ + σ^2 = 0 (141)
whence
ln[C/r] = μ/3 + σ^2/9 (142)
and finally
r_mode ≡ r_peak = C e^(-μ/3) e^(-σ^2/9). (143)
This is the most likely ET_Distance from Earth.
How likely?
To find the value of the probability density
function f_ET_Distance(r) corresponding to this value
of the mode, we must replace (143) into (113).
After a few rearrangements, which we skip for the
sake of brevity, one gets
Peak Value of f_ET_Distance(r) ≡ f_ET_Distance(r_mode) = 3/(C√(2π) σ) · e^(μ/3) · e^(σ^2/18). (144)
This is the peak height in the pdf f_ET_Distance(r).
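The mode (143) and the peak height (144) can be verified by brute force. The sketch below is an illustrative check added here, not the author's procedure: it evaluates the density (113) on a 1-light-year grid, picks the grid point with the largest density, and compares it with the closed forms. C is the value quoted in Table 3; MU and SIGMA are assumed illustrative values, and the grid range is an arbitrary choice.

```python
import math

C_LY, MU, SIGMA = 28845.0, 7.46, 1.39    # C from Table 3; MU and SIGMA are ASSUMED

def pdf_et_distance(r):
    """Density (113) for r > 0."""
    z = 3.0 * math.log(C_LY / r) - MU    # equals ln(C^3/r^3) - mu
    return 3.0 / r / (math.sqrt(2.0 * math.pi) * SIGMA) * math.exp(-z * z / (2.0 * SIGMA**2))

# Closed forms (143) and (144)
r_mode = C_LY * math.exp(-MU / 3.0) * math.exp(-SIGMA**2 / 9.0)
peak = (3.0 / (C_LY * math.sqrt(2.0 * math.pi) * SIGMA)
        * math.exp(MU / 3.0) * math.exp(SIGMA**2 / 18.0))

# Brute-force search on a 1-light-year grid from 100 to 10000 light years
grid = [float(r) for r in range(100, 10001)]
r_best = max(grid, key=pdf_et_distance)

print("mode from (143):", r_mode, "grid maximum:", r_best)
print("peak from (144):", peak, "pdf at the mode:", pdf_et_distance(r_mode))
```

Because the density has a single peak, the grid maximum should land within one grid step of the value given by (143).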
Next to the mode, the median m (ref. [9]) is one
more statistical number used to characterize any
probability distribution. It is defined as the
independent variable abscissa m such that a
realization of the random variable will take up a
value lower than m with 50% probability or a value
higher than m with 50% probability again. In other
words, the median m splits up our probability
density in exactly two equally probable parts. Since
the probability of occurrence of the random event
equals the area under its density curve (i.e. the
definite integral under its density curve) then the
median m (of the lognormal distribution, in this
case) is defined as the integral upper limit m:
∫_0^m f_ET_Distance(r) dr = 1/2. (145)
Upon replacing (113), this becomes
∫_0^m 3/r · 1/(√(2π) σ) · e^(-(ln[C^3/r^3] - μ)^2 / (2σ^2)) dr = 1/2. (146)
In order to find m, we may not differentiate (146)
with respect to m, since the "precise" factor ½ on
the right would then disappear into a zero. On the
contrary, we may try to perform the obvious
substitution
z^2 = (ln[C^3/r^3] - μ)^2 / (2σ^2) z ≥ 0 (147)
into the integral (146) to reduce it to the integral
(85) defining the error function erf(z). Then,
after a few reductions that we leave to the reader as
an exercise, the full equation (145), defining the
median, is turned into the corresponding equation
involving the error function erf(x) as defined by (85):
Random variable | ET_Distance between any two neighboring ET Civilizations in the Galaxy, assuming they are UNIFORMLY distributed throughout the whole Galaxy volume.
Probability distribution | Unnamed (Paul Davies suggested "Maccone distribution")
Probability density function | f_ET_Distance(r) = 3/r · 1/(√(2π) σ) · e^(-(ln[6 R^2_Galaxy h_Galaxy / r^3] - μ)^2 / (2σ^2))
(Defining the positive numeric constant C) | C = ∛(6 R^2_Galaxy h_Galaxy) ≈ 28845 light years
Mean value | ⟨ET_Distance⟩ = C e^(-μ/3) e^(σ^2/18)
Variance | σ^2_ET_Distance = C^2 e^(-2μ/3) e^(σ^2/9)(e^(σ^2/9) - 1)
Standard deviation | σ_ET_Distance = C e^(-μ/3) e^(σ^2/18) √(e^(σ^2/9) - 1)
All the moments, i.e. k-th moment | ⟨ET_Distance^k⟩ = C^k e^(-kμ/3) e^(k^2σ^2/18)
Mode (= abscissa of the probability density function peak) | r_mode ≡ r_peak = C e^(-μ/3) e^(-σ^2/9)
Value of the Mode Peak | Peak Value of f_ET_Distance(r) ≡ f_ET_Distance(r_mode) = 3/(C√(2π) σ) · e^(μ/3) · e^(σ^2/18)
Median (= fifty-fifty probability value for ET_Distance) | median = m = C e^(-μ/3)
Skewness | K_3 / (K_2)^(3/2) = (e^(σ^2/2) - 3e^(5σ^2/18) + 2e^(σ^2/6)) / (e^(σ^2/9)(e^(σ^2/9) - 1))^(3/2) = (e^(σ^2/9) + 2) √(e^(σ^2/9) - 1)
Kurtosis | K_4 / (K_2)^2 = e^(4σ^2/9) + 2e^(σ^2/3) + 3e^(2σ^2/9) - 6
Expression of μ in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | μ = Σ_{i=1}^{7} ⟨Y_i⟩ = Σ_{i=1}^{7} (b_i[ln(b_i) - 1] - a_i[ln(a_i) - 1]) / (b_i - a_i)
Expression of σ^2 in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i | σ^2 = Σ_{i=1}^{7} σ_{Y_i}^2 = Σ_{i=1}^{7} (1 - a_i b_i [ln(b_i) - ln(a_i)]^2 / (b_i - a_i)^2)
Table 3. Summary of the properties of the probability distribution that applies to the random variable ET_Distance
yielding the (average) distance between any two neighboring communicating civilizations in the Galaxy.
1/2 + erf( (ln[C^3/m^3] - μ) / (√2 σ) ) = 1/2 (148)
that is
erf( (ln[C^3/m^3] - μ) / (√2 σ) ) = 0 (149)
Since from the definition (85) one obviously has
erf(0)=0, (149) yields
(ln[C^3/m^3] - μ) / (√2 σ) = 0 (150)
whence finally
median = m = C e^(-μ/3). (151)
This is the median of the distribution (113) of
ET_Distance. In other words, this is the distance
from Earth such that, with 50% probability, the
nearest ExtraTerrestrial civilization lies closer than
this median, and with 50% probability it lies
farther away.
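The median condition (145) can also be solved numerically. The sketch below is an illustration added here: it writes the cumulative distribution of ET_Distance in the standard form 1/2 - (1/2)·erf(...), which is our own normalization of the integral of (113), finds the point where it equals 1/2 by bisection, and compares the result with the root of (150), namely ln[C^3/m^3] = μ. C is the value quoted in Table 3; MU and SIGMA are assumed illustrative values and the function names are hypothetical.

```python
import math

C_LY, MU, SIGMA = 28845.0, 7.46, 1.39    # C from Table 3; MU and SIGMA are ASSUMED

def cdf_et_distance(r):
    """P(ET_Distance <= r), obtained by integrating (113) from 0 to r."""
    z = (3.0 * math.log(C_LY / r) - MU) / (math.sqrt(2.0) * SIGMA)
    return 0.5 - 0.5 * math.erf(z)

def bisect_median(lo=1.0, hi=1.0e6, tol=1.0e-9):
    """Find m with cdf(m) = 1/2 by bisection (the CDF increases with r)."""
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if cdf_et_distance(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

median_numeric = bisect_median()
median_closed = C_LY * math.exp(-MU / 3.0)   # root of (150): ln(C^3/m^3) = mu
print("median by bisection:", median_numeric, "closed form:", median_closed)
```

Whatever prefactor multiplies the error function, the condition reduces to erf(...) = 0 as in (149), so the bisection result should match the closed form.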
In conclusion, we find it useful to summarize all the
equations that we derived about the random variable
ET_Distance in Table 3.
NUMERICAL EXAMPLE OF THE
ET_DISTANCE DISTRIBUTION
In this section we provide a numerical
example of the analytic calculations carried out so
far.
Consider the Drake Equation values reported
in Table 1. Then, the graph of the corresponding
probability density function of the nearest
ET_Distance, f_ET_Distance(r), is shown in Figure 6.
DISTANCE OF NEAREST ET_CIVILIZATION
[Figure 6. Graph showing probability density function of ET_Distance, with y-axis labeled "Probability density function (1/meters)" ranging from 0 to 5.63·10^-20, and x-axis labeled "ET_Distance from Earth (light years)" ranging from 0 to 5000. The curve peaks around 2000 light years.]
Figure 6. This is the probability of finding the nearest ExtraTerrestrial Civilization at the distance r from
Earth (in light years) if the values assumed in the Drake Equation are those shown in Table 1. The relevant
probability density function f_ET_Distance(r) is given by equation (113). Its mode (peak abscissa) equals 1933
light years, but its mean value is higher since the curve has a high tail on the right: the mean value equals in
fact 2670 light years. Finally, the standard deviation equals 1309 light years: THIS IS GOOD NEWS FOR
SETI, inasmuch as the nearest ET Civilization might lie at just 1 sigma = 2670-1309 = 1361 light years
from us.
From Figure 6, we see that the probability of
finding ExtraTerrestrials is practically zero up to a
distance of about 500 light years from Earth. Then
it starts increasing with the increasing distance
from Earth, and reaches its maximum at
r_mode ≡ r_peak = C e^(-μ/3) e^(-σ^2/9) ≈ 1933 light years. (152)
This is the MOST LIKELY VALUE of the
distance at which we can expect to find the
nearest ExtraTerrestrial civilization.
It is not, however, the mean value of the
probability distribution (113) for f_ET_Distance(r). In
fact, the probability density (113) has an infinite
tail on the right, as clearly shown in Figure 6, and
hence its mean value must be higher than its peak
value. As given by (119), its mean value is
r_mean_value = C e^(-μ/3) e^(σ^2/18) ≈ 2670 light years. (153)
This is the MEAN (value of the) DISTANCE
at which we can expect to find ExtraTerrestrials.
After having found the above two distances (1933
and 2670 light years, respectively), the next natural
question that arises is: "what is the range, back and
forth around the mean value of the distance, within
which we can expect to find ExtraTerrestrials with
the highest hopes?" The answer to this question
is given by the notion of standard deviation, which
we already found to be given by (123)
σ_ET_Distance = C e^(-μ/3) e^(σ^2/18) √(e^(σ^2/9) - 1) ≈ 1309 light years. (154)
More precisely, this is the so-called 1-sigma
(distance) level. Probability theory then shows that
the nearest ExtraTerrestrial civilization is expected
to be located within this range, i.e. within the two
distances of (2670-1309) = 1361 light years and
(2670+1309) = 3979 light years, with probability
given by the integral of f_ET_Distance(r) taken in
between these two lower and upper limits, that is:
∫_{1361 light years}^{3979 light years} f_ET_Distance(r) dr ≈ 0.75 = 75%. (155)
In plain words: with 75% probability, the nearest
ExtraTerrestrial civilization is located in between
the distances of 1361 and 3979 light years from us,
having assumed the input values for the Drake
Equation given by Table 1. If we change those
input values, then all the numbers change again.
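The quoted figures can be cross-checked against one another without access to Table 1. The sketch below is our own back-calculation, not the author's procedure: it recovers σ^2 and μ from the quoted mode and mean using mean/mode = e^(σ^2/6), which follows from (128) and (143), then recomputes the standard deviation from (123) and the probability in (155). The inferred μ and σ are therefore assumptions derived from the rounded numbers in the text, and the variable names are hypothetical.

```python
import math

C_LY = 28845.0        # constant C quoted in Table 3 (light years)
MODE_LY = 1933.0      # mode quoted in (152)
MEAN_LY = 2670.0      # mean quoted in (153)

# Back-solve sigma^2 and mu: mean/mode = e^(sigma^2/18) / e^(-sigma^2/9) = e^(sigma^2/6)
sigma2 = 6.0 * math.log(MEAN_LY / MODE_LY)
median = MEAN_LY * math.exp(-sigma2 / 18.0)     # equals C e^(-mu/3)
mu = 3.0 * math.log(C_LY / median)

# Standard deviation from (123), rewritten as mean * sqrt(e^(sigma^2/9) - 1)
std_dev = MEAN_LY * math.sqrt(math.exp(sigma2 / 9.0) - 1.0)

def cdf(r):
    """P(ET_Distance <= r): ln(ET_Distance) is normal, mean ln(median), variance sigma^2/9."""
    s = math.sqrt(sigma2) / 3.0
    return 0.5 * (1.0 + math.erf((math.log(r) - math.log(median)) / (s * math.sqrt(2.0))))

prob_1sigma = cdf(3979.0) - cdf(1361.0)          # integration limits quoted in (155)
print(f"inferred mu ~ {mu:.3f}, inferred sigma ~ {math.sqrt(sigma2):.3f}")
print(f"standard deviation ~ {std_dev:.0f} light years")
print(f"P(1361 <= r <= 3979) ~ {prob_1sigma:.3f}")
```

Changing MODE_LY and MEAN_LY shows directly how the 1-sigma range and its probability move when the Drake Equation inputs change.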
9. THE "DATA ENRICHMENT
PRINCIPLE" AS THE BEST CLT
CONSEQUENCE UPON THE
STATISTICAL DRAKE EQUATION
(ANY NUMBER OF FACTORS
ALLOWED)
As a fitting climax to all the statistical
equations developed so far, let us now state our
"DATA ENRICHMENT PRINCIPLE." It simply states that
"The Higher the Number of Factors in the
Statistical Drake equation, The Better."
Put in this simple way, it simply looks like a
new way of saying that the CLT lets the random
variable Y approach the normal distribution when
the number of terms in the sum (4) approaches
infinity. And this is the case, indeed. However, our
"Data Enrichment Principle" has more profound
methodological consequences that we cannot
explain now, but hope to describe more precisely
in one or more coming papers.
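The CLT statement above can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the author's code: it sums the logarithms of independent uniform factors and prints the sample skewness of the sum for 1, 7, and 50 factors, which should shrink toward the zero skewness of a normal distribution. For simplicity every factor uses the same hypothetical bounds a = 1 and b = 10 (the Table 1 bounds are not used), and the two helper functions encode the per-factor mean and variance expressions listed in Table 3.

```python
import math
import random

def log_uniform_mean(a, b):
    """Mean of ln(D) for D uniform on [a, b], per the Table 3 expression for mu."""
    return (b * (math.log(b) - 1.0) - a * (math.log(a) - 1.0)) / (b - a)

def log_uniform_var(a, b):
    """Variance of ln(D) for D uniform on [a, b], per the Table 3 expression for sigma^2."""
    return 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2

def sample_skewness(values):
    """Third central moment divided by the 3/2 power of the second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def simulate_sum(n_factors, a, b, n_samples, rng):
    """Samples of Y = sum of ln(D_i), with D_i independent and uniform on [a, b]."""
    return [sum(math.log(rng.uniform(a, b)) for _ in range(n_factors))
            for _ in range(n_samples)]

rng = random.Random(1)    # fixed seed for reproducibility
A, B = 1.0, 10.0          # HYPOTHETICAL bounds, the same for every factor
skew_by_n = {}
for n_factors in (1, 7, 50):
    ys = simulate_sum(n_factors, A, B, 20_000, rng)
    skew_by_n[n_factors] = sample_skewness(ys)
    print(n_factors, "factors: sample skewness", skew_by_n[n_factors])
print("per-factor mean and variance of ln(D):", log_uniform_mean(A, B), log_uniform_var(A, B))
```

For identically distributed factors the skewness of the sum falls off as one over the square root of the number of factors, which is one concrete sense in which more factors bring Y closer to the normal law.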
CONCLUSIONS
We have sought to extend the classical Drake
equation to let it encompass Statistics and
Probability.
This approach appears to pave the way to
future, more profound investigations intended not
only to associate "error bars" to each factor in the
Drake equation, but especially to increase the
number of factors themselves. In fact, this seems to
be the only way to incorporate into the Drake
equation more and more new scientific information
as soon as it becomes available. In the long run,
the Statistical Drake equation might just become a
huge computer code, growing in size and
especially in the depth of the scientific information
it contains. It would thus be Humanity's first
"Encyclopaedia Galactica."
Unfortunately, to extend the Drake equation to
Statistics, it was necessary to use a mathematical
apparatus that is more sophisticated than just the
simple product of seven numbers.
When this author had the honour and privilege
to present his results at the SETI Institute on April
11^th, 2008, in front of an audience also including
Professor Frank Drake, he felt he had to add these
words: "My apologies, Frank, for disrupting the
beautiful simplicity of your equation."
ACKNOWLEDGEMENTS
The author is grateful to Drs. Jill Tarter, Paul
Davies, Seth Shostak, Doug Vakoch, Tom Pierson,
Carol Oliver, Paul Shuch and Kathryn Denning for
attending his first presentation ever about these
topics at the "Beyond" Center of the University of
Arizona at Phoenix on February 8^th, 2008. He also
would like to thank Dan Werthimer and his School
of SETI young experts for keeping alive the
interplay between experimental and theoretical
SETI. But the greatest "thanks" goes of course to
the Teacher to all of us: Professor Frank D. Drake,
whose equation opened a new way of thinking
about the past and the future of Humans in the
Galaxy.
REFERENCES
[1] http://en.wikipedia.org/wiki/Drake_equation
[2] http://en.wikipedia.org/wiki/SETI
[3] http://en.wikipedia.org/wiki/Astrobiology
[4] http://en.wikipedia.org/wiki/Frank_Drake
[5] Athanasios Papoulis and S. Unnikrishna Pillai,
"Probability, Random Variables and Stochastic
Processes", Fourth Edition, Tata McGraw-Hill,
New Delhi, 2002. ISBN 0-07-048658-1.
[6] http://en.wikipedia.org/wiki/Gamma_distribution
[7] http://en.wikipedia.org/wiki/Central_limit_theorem
[8] http://en.wikipedia.org/wiki/Cumulants
[9] http://en.wikipedia.org/wiki/Median
[10] Jeffrey Bennett and Seth Shostak, "Life in the
Universe", Second Edition, Pearson - Addison-Wesley,
San Francisco, 2007. ISBN 0-8053-4753-4.
See in particular page 404.
References
[1] Benford, Gregory, Jim and Dominic, "Cost Optimized Interstellar Beacons: SETI",
arXiv.org web site (22 Oct. 2008).
[2] Carl Sagan, "Cosmos", Random House, New York, 1983. See in particular the pages
298-302.
[3] Bennett, Jeffrey, and Shostak, Seth, "Life in the Universe", second edition, Pearson –
Addison Wesley, San Francisco, 2007. See in particular page 404.
[4] C. Maccone, "The Statistical Drake Equation", paper #IAC-08-A4.1.4 presented on
October 1^st, 2008, at the 59^th International Astronautical Congress (IAC) held in
Glasgow, Scotland, UK, September 29^th thru October 3^rd, 2008.