In his 1970 essay Contingency and Natural Law, Wolfhart Pannenberg presented one of his characteristic themes, the concept of a

The thrust of this concept is to harmonize the apparent contingency of the Creation, flowing forward from the Beginning, with the hidden persistent intervention of the Creator, flowing backward from the End. In his later 1981 essay Theological Questions to Scientists, ² Pannenberg pushes his argument forward by making explicit his ambitious agenda of a unified understanding of God's Creation, including both the making of theology and the making of science. He asks scientists a number of specific questions, all crucial to his enterprise, among which I select the second:

As a mathematician working mainly at the interface between stochastics, statistics, commutative algebra, and differential geometry, I find Pannenberg's statement especially inspiring. In fact, keywords like continuity and irreversibility are technical terms in mathematics, and the very concept of a continuity from the end has many counterparts in current research. Theological work on "how God acts" ³ has taken up various themes from physics, including quantum physics, Relativity Theory, and cosmology. ⁴ Pannenberg himself sought analogies to a variety of basic concepts in physics such as inertia, the laws of thermodynamics, and field theory.

Here I pursue another of Pannenberg's suggestions by considering the contribution of the mathematical theory of random processes and of statistical physics. ⁵ Of note, the terms "random" and "statistics" do not imply any ontological indeterminism; instead, they simply refer to the theory of counting frequencies of occurrence. This formalism fits well with a Humean theory of causality and other, more general, theories of probabilistic causation. ⁶ The idea of inverting time to assess probabilistic causality is shared by several applied models, such as Markov chains, stochastic control, Kingman's coalescent, conditional independence, and Bayesian networks. Like others working within mainstream statistics, I feel that probabilistic causation is not a fully satisfying concept. Nonetheless, these models do have something to reveal about Time, as they are not ad hoc models but general formalisms whose structure is ultimately algebraic and hence logical.

This paper is a development, in the direction suggested by the focus of this Conference, of my research presented to the Research Group "Le parole della Scienza" (The Words of Science) directed by Professor Giandomenico Boffi. It is part of the research program "Scienza e fede sull'Interpretazione del Reale" (SEFIR) of the ISSR Ecclesia Mater, Pontificia Università Lateranense (Vatican City). ⁷

Probability

Probability is the mathematics of the contingent. Each contingent event is seen in the perspective of what could have been by considering it as a special instance among a universe of possibilities. In a sense, all events in a given universe share the same ontological status, that is, what has actually happened is no more real than what could have happened.

In the mathematical model, the universal set is qualified by giving a full description, which reduces the uncertainty considerably. A special (experimental) realization of this setting is known from the ancient practice of casting lots or dice. The corresponding Hebrew and Greek words appear in the Bible many times: for example, "lots" appears in the New Revised Standard Version (NRSV) 48 times. The casting of lots is used to decide among equivalent possibilities, the implied concept being the affirmation of God's justice. In Acts 1. 15-26, the selection of Matthias to take the place of Judas follows a clear scheme. First, the universe of all possible candidates is defined: "one of the men who have accompanied us during all the time the Lord Jesus went in and out among us"; second, the decision is left to the Lord: "And they cast the lots for them".

The basic scheme has further evolved in modern times by adding to the universe of possibilities a numerical evaluation of the probability of each event. At first the probabilities were all equal. Then it was observed that the probabilities of complex events could be unequal; for example, with two dice the probability of throwing a total of 6 is smaller than the probability of throwing a total of 7. The next step was to associate probability elicitation with the state of information. The probability that the top card in a well-shuffled deck has a given rank (say, an ace) is 1 in 13, but for a tricky dealer who knows the bottom card the probability is not the same. Today we distinguish between a theory of probability, where a single evaluation of the probabilities is considered, and a theory of statistics, where many such evaluations are considered and compared as statistical hypotheses.
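The two-dice comparison can be checked by direct counting. The following is a minimal sketch, assuming two fair six-sided dice whose 36 ordered outcomes are equally likely; the function name prob_total is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# Universe: the 36 ordered outcomes of two six-sided dice,
# assumed equally likely (fair dice).
universe = list(product(range(1, 7), repeat=2))

def prob_total(total):
    """Probability of a given total: favourable cases over all cases."""
    favourable = sum(1 for a, b in universe if a + b == total)
    return Fraction(favourable, len(universe))

p6 = prob_total(6)  # cases (1,5), (2,4), (3,3), (4,2), (5,1)
p7 = prob_total(7)  # cases (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
print(p6, p7, p6 < p7)
```

The elementary events are equally likely, yet the complex events "total 6" and "total 7" are not, because they collect different numbers of elementary cases.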

Formally, a statistical model consists of the choice of a universe U of possible elementary events and a set of probability functions P, each of which maps a subset of cases A, called an event, to its probability P(A), which is a number between 0 and 1, extremes included. Importantly, the successful construction of a statistical model is based on two distinct modeling efforts. The essential part of the statistical model is the a priori delimitation of the set of all possibilities, i.e., the universe of all possible elementary events U. This basic step is easily overlooked in everyday life: before guessing my chances of winning in a lottery game, I must know how many numbers there are in the urn. Absent such information, I have to make several, possibly subjective, assumptions or go one step up in the complexity of my model and think of a set of possible urns. In the second step, one wants to apply to each event one or more rules to compute the probability: this is, in fact, a generalization of the operation of counting in the form of a relative evaluation of the part against the total.

In the current practical use of probability in science there are at least three possible approaches that can be clearly illustrated with the following thought experiment. ⁸ I show a coin to a friend and ask him the probability that the coin will come up heads. Then, I toss the coin, but I quickly cover the result with my hand, so that it is hidden from both of us, and I ask him again to guess the probability. Finally, I take a look at the result without letting my friend see it. Again, what is the probability? In the first case, the probability was 1 in 2 for both of us: this probability is actually a statement on the statistical properties of a mechanical system. It is the probability that statistical physics uses. In the second case, the event is actually realized, but neither of us knows the outcome. This is the probability based on the existence of hidden variables. In the final case, the probability is 1 in 2 for my friend, but is either 0 or 1 for me. This is the subjective probability of Game Theory and economics. Correspondingly, there are at least three competing philosophical schools trying to assert one interpretation over the others. As a mathematician, I do not care to choose from among them. I am happy enough with knowledge that I can apply to so many different situations.

The approach to probability and chance I have just described is modern and is usually called the axiomatic approach. In this approach, the probabilistic framework is not contrasted with the deterministic one, but the latter is thought of as a special case of the former. Deterministic models are included, as limit cases, in the statistical model by considering a probability function that assigns probability 1 to a single elementary event and probability 0 to all the others.

Independence and Conditional Independence

Mathematicians are trained to first consider the simplest, but still non-trivial, example conceivable without destroying the problem itself. In fact, mathematics consists of highly reliable statements about very simple conceptual objects. Because this type of knowledge can be successfully applied to very big complex objects, mathematicians may be inclined to embrace Platonic philosophy (where reality is grounded on perfect Truth we can imperfectly know through its reflections in our minds) or spiritualistic philosophy (both the necessary and the contingent spring from the Truth).

In the simplest non-trivial example, a mathematician has a reliable starting point for his research program, in that the very identification of what is simple-but-non-trivial is the first description of the object under study. If we want to discuss Time, we need at least three times - past, present, and future - and at least one event, together with its negation, to represent the state of the world at each of these three times. This leaves us with a set of possible cases of the form abc, each element of the triple being binary. For instance, 110 means that something that was true in the past and is still true now will be false in the future; the event "true now" is represented by the set of the four cases that have 1 in the middle position: 010, 011, 110, 111.

In the first half of the past century, several authors identified an important concept to deal with our simple-but-non-trivial world. Various terms are used: sufficiency (Ronald Fisher), conditional independence, chain property (Andrey Markov). Here I use the term conditional independence, ⁹ which builds on the notion of statistical independence.

In statistical models, the process of learning from experience is represented by the computation of conditional probability: the probability P(A) is updated to be P(A|B) = P(AB)/P(B) after B is observed. Two events, A and B, are independent if the observation of B does not change the probability of A: P(A) = P(A|B). But contrary to what the defining phrase seems to suggest, independence is a property of the probability function and not of the events themselves.

Importantly, this definition applies to deterministic situations: if either A or B is a sure event (P(A) = 1 or P(B) = 1, respectively), then the events A and B are independent.

It is also important to understand that this relation is symmetric, i.e., P(A|B) = P(A) if, and only if, P(B|A) = P(B) because they are both equivalent to P(AB) = P(A)P(B). ¹⁰ [By way of example, we can determine whether "healing" is independent of "praying" by checking (backward) the relative frequencies of praying ill people in the two classes "healed" and "not healed".]
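The properties of independence stated above can be verified on a tiny example. This is a minimal sketch, assuming a hypothetical universe of two coin tosses and two hypothetical probability functions on it (P_fair and P_glued are names chosen here for illustration); exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

# Hypothetical universe: the four outcomes of two coin tosses.
universe = list(product("HT", repeat=2))

# Two probability functions on the same universe (both are assumptions).
P_fair = {w: Fraction(1, 4) for w in universe}            # two fair, unrelated tosses
P_glued = {w: Fraction(1, 2) if w[0] == w[1] else Fraction(0)
           for w in universe}                             # second toss copies the first

def prob(P, event):
    """P(A): sum of the probabilities of the elementary events in A."""
    return sum(P[w] for w in event)

def cond(P, a, b):
    """P(A|B) = P(AB)/P(B), assuming P(B) > 0."""
    return prob(P, a & b) / prob(P, b)

def independent(P, a, b):
    """Symmetric product form: P(AB) = P(A)P(B)."""
    return prob(P, a & b) == prob(P, a) * prob(P, b)

A = {w for w in universe if w[0] == "H"}   # first toss is heads
B = {w for w in universe if w[1] == "H"}   # second toss is heads
SURE = set(universe)                       # the sure event

# Symmetry: both conditional statements hold together under P_fair.
print(cond(P_fair, A, B) == prob(P_fair, A), cond(P_fair, B, A) == prob(P_fair, B))
# Same events, different probability function: independence is lost.
print(independent(P_fair, A, B), independent(P_glued, A, B))
# A sure event is independent of any event.
print(independent(P_glued, A, SURE))
```

The events A and B are the same sets in both cases; only the probability function changes, which is the sense in which independence is a property of P rather than of the events.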

We can say that, in our small world, past and future are conditionally independent, given the present, if for every triple A, B, C, with A an event of the past, B an event of the present, and C an event of the future, the following equivalent statements hold:

While all statements 1-4 are logically equivalent, each of them reveals a specific facet of the topic.

If we interpret the past as "what has already been decided" and the future as "what is not yet realized or known", the notion of conditional independence provides a definition of present. ¹¹

Incorrectly, present time is sometimes conceived of as a point in the representation of time as a real line. In contrast to this naïve definition, here we consider the present to be the relevant observable that separates past and future in the sense of conditional independence. Ancient physics was wrong in postulating that the present state of a freely moving particle is its current position. Modern mechanics correctly formulated the definition of present state, which entails knowing a particle's current position and current velocity. If I know its position and velocity, I do not need any further information about its past behavior to predict its future trajectory. The present should not be conceived of as something as thin as a point but rather as something thick enough to be a complete summary of the past for the purpose of predicting the future under a given model.
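The separation of past and future by the present can be checked numerically in the small world of binary triples abc. The sketch below assumes a hypothetical two-state chain: the initial distribution p0 and the transition table T are made-up numbers, and the joint probability of abc is built as p0[a] T[a][b] T[b][c]. It tests the product form P(AC|B) = P(A|B)P(C|B), one standard formulation of conditional independence, and then compares a forward conditional probability with a backward one.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers: distribution of the past state and transition table.
p0 = {0: F(9, 10), 1: F(1, 10)}
T = {0: {0: F(4, 5), 1: F(1, 5)},
     1: {0: F(1, 2), 1: F(1, 2)}}

# Joint probability of each triple (past a, present b, future c).
joint = {(a, b, c): p0[a] * T[a][b] * T[b][c]
         for a, b, c in product((0, 1), repeat=3)}

def prob(pred):
    """Probability of the set of triples satisfying the predicate."""
    return sum(p for w, p in joint.items() if pred(w))

def past_future_independent_given_present():
    """Check P(AC|B) = P(A|B)P(C|B), written as P(ABC)P(B) = P(AB)P(BC)."""
    for a, b, c in product((0, 1), repeat=3):
        p_b = prob(lambda w: w[1] == b)
        p_ab = prob(lambda w: w[0] == a and w[1] == b)
        p_bc = prob(lambda w: w[1] == b and w[2] == c)
        if joint[(a, b, c)] * p_b != p_ab * p_bc:
            return False
    return True

p_now = prob(lambda w: w[1] == 1)
forward = prob(lambda w: w[1] == 1 and w[2] == 1) / p_now   # P(future=1 | present=1)
backward = prob(lambda w: w[0] == 1 and w[1] == 1) / p_now  # P(past=1 | present=1)

print(past_future_independent_given_present(), forward, backward)
```

With these numbers the forward and backward conditional probabilities are both well defined under the same joint model, yet they are not equal: the two directions are mutually consistent without being symmetric.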

Family tree versus Pedigree

The Gospel of Luke 3. 23-38 presents a backward genealogy of Jesus's male ancestors, while the Gospel of Matthew 1. 2-16 gives a subset of the forward family tree of Jesus. A complete forward genealogy is a highly complex structure, even if one were to go only a few generations back in time, whereas a backward pedigree is simpler, as every individual has one mother and one father. Recognizing the difficulty of his enterprise, Matthew feels forced to list five women: Tamar, Rahab, Ruth, "the wife of Uriah", and Mary.

A mathematical model of genealogy would probably take as elementary events the individuals, each described by the full backward pedigree. As a mathematical object, a pedigree is a graph of a special type called a tree. ¹² Individuals are connected to each other by common ancestors. An individual who is an ancestor of another individual will have his binary tree as a sub-tree of the latter's. The number of ancestors of each individual grows exponentially (the number of ancestors in the n-th backward generation is 2^n), while the total number of living individuals decreases the further back in the past we look. Common ancestors therefore abound in this model: here a theorem states that we are all brothers and sisters.

Given an individual at any time in the past, and given knowledge of the full structure, a forward family tree is uniquely defined. Such a model is certainly correct because it corresponds to the basic structure of humanity as we know it, even allowing for contemporary assisted reproductive technologies. However, as Pannenberg points out, because every scientific model is contingent on some assumed initial condition, we cannot project our model into an infinite past.

Our model of each individual seen as a backward binary tree can be decorated by adding to each individual a particular characteristic of that individual, e.g., sex, phenotype, genotype, name, surname, social rank and so forth. A statistical model is a probability on the full universe of (decorated) binary pedigrees. It will have an intractable mathematical complexity. Viable models are derived by assuming some sort of invariance or exchangeability and/or some form of conditional independence and by proving that a unique global probability model exists which is compatible with the given assumptions. For example, we could assume that at every generation the matching is random, that is to say, every female-male couple has the same probability of bearing a child. This, in turn, opens the way to computing the probability of future events. One example of a future event is the extinction of the descendant line of a given group of individuals.

It is remarkable that one of the first accounts of what we now call stochastics was a study by Sir Francis Galton (a cousin of Charles Darwin!) and Reverend Henry William Watson entitled On the Probability of Extinction of Families (1874). Other applications of stochastics came later: statistical physics developed by Ludwig E. Boltzmann (1844-1906), control technology ("rocket science"), mathematical finance (Fischer Black and Myron Scholes, 1973).
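The extinction question can be made concrete in the classical branching-process setting that grew out of this problem. The following is a minimal sketch under a simplifying assumption that differs from the two-parent pedigree model: each individual independently leaves k descendants with probability p_k (as for a surname carried along one line). In that setting the extinction probability of the line of a single ancestor is the smallest non-negative solution of q = g(q), where g(s) is the sum of p_k s^k, and it can be approached by iterating g from 0. The offspring distributions below are made-up numbers.

```python
def extinction_probability(offspring_probs, iterations=1000):
    """Iterate q <- g(q) from q = 0, where g(s) = sum_k p_k * s**k.

    offspring_probs[k] is the (hypothetical) probability of leaving k descendants.
    After n steps q is the probability of extinction within n generations.
    """
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q ** k for k, p in enumerate(offspring_probs))
    return q

# Mean offspring number 1.25: the line survives with positive probability.
growing = [0.25, 0.25, 0.5]
# Mean offspring number 0.75: the line dies out almost surely.
shrinking = [0.5, 0.25, 0.25]

print(extinction_probability(growing), extinction_probability(shrinking))
```

For the first distribution the fixed-point equation reduces to 2q^2 - 3q + 1 = 0, whose smallest root is 1/2; for the second the only root in the unit interval is 1.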

As was illustrated in the toy model above, almost all genetic research is based on the construction of backward models, which are tractable, whereas the forward model is not. A notable example is coalescent theory. Quoting Wikipedia, "an evolutionary lineage is a sequence of species that form a line of descent, each new species being the direct result of speciation from an immediate ancestral species". This description is a forward description. Kingman's coalescent process, however, is a backward process where the key process is not speciation but rather the coalescence of two species into a common ancestor. ¹³
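The tractability of the backward description shows up in how short the basic computations are. The sketch below assumes the standard form of Kingman's coalescent, in which time runs backward in coalescent units and, while k lineages remain, the waiting time to the next coalescence has mean 1 divided by the number of pairs, k(k-1)/2. The function name expected_time_to_common_ancestor is introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def expected_time_to_common_ancestor(n):
    """Expected backward time until n sampled lineages share one ancestor.

    Assumes the standard Kingman coalescent: with k lineages left, the mean
    waiting time to the next merger is 1 / C(k, 2), in coalescent time units.
    """
    return sum(Fraction(1, comb(k, 2)) for k in range(2, n + 1))

for n in (2, 10, 1000):
    print(n, expected_time_to_common_ancestor(n))
```

The sum telescopes to 2(1 - 1/n), so the expected time stays below 2 units however large the sample: looking backward, lineages merge quickly, whereas a forward description would have to follow every branching.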

The complexity of time evolution depends on the direction of time. I now turn to illustrate the tools used in stochastics to deal with the flow of time.

Time

The simplest statistical model involving time flow concerns random times. It is one of the first examples we teach in a probability course. A random time is a random variable T whose values are positive real numbers, e.g., failure times. Some devices fail because of ageing. I am an example. Other devices fail because of unpredictable circumstances, e.g., a glass. A glass fails (breaks) because it slips out of my hand and not because it grows older and dies. In other words, it is "used as new". This is modeled by saying that the conditional probability, now at time t, of surviving up to time (s+t), given that it has not yet failed, is equal to the unconditional probability of surviving up to time s.

From such qualitative assumptions based on conditional probabilities computed at a generic time, we can derive a closed-form formula for the probability of survival at time t. What is remarkable in this model is that time is treated in a special way and that the argument is not based on a specific time, but on all times.
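The closed form in question can be spelled out. Writing S(t) = P(T > t) for the survival probability, the "used as new" assumption reads S(s+t)/S(t) = S(s) for all s and t, i.e., S(s+t) = S(s)S(t), and the non-increasing solutions with S(0) = 1 are the exponentials S(t) = exp(-rate * t) for some rate >= 0. The sketch below checks this numerically; the value of RATE and the ageing model used for contrast are assumptions chosen for illustration.

```python
import math

RATE = 0.5  # hypothetical failure rate, per unit of time

def survival(t, rate=RATE):
    """Exponential survival probability S(t) = P(T > t) = exp(-rate * t)."""
    return math.exp(-rate * t)

def conditional_survival(s, t, surv=survival):
    """P(T > s + t | T > t) = S(s + t) / S(t)."""
    return surv(s + t) / surv(t)

def ageing_survival(t, rate=RATE):
    """Contrast case (an assumption): a device whose failure risk grows with age."""
    return math.exp(-(rate * t) ** 2)

# The glass: having survived 3 time units changes nothing.
print(conditional_survival(2.0, 3.0), survival(2.0))
# The ageing device: having survived 3 time units makes the next 2 less likely.
print(conditional_survival(2.0, 3.0, ageing_survival), ageing_survival(2.0))
```

The first pair of numbers agrees up to rounding for any choice of s and t, which is the sense in which the argument rests on all times rather than on a specific one.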

The issue of flowing time in stochastics is framed using a special construction. We assume an infinite interval of times I and, for each time t in I, we define a set F_t of events that belong to the past or present of time t. Likewise, we define a set F^t of events that belong to the present or future of time t. The full model consists of:

Here I need to point out that there is an extra structure, the family F_t for t in I, constraining the flow of time.

Each given probability P will have, for each time t, sets of sufficient present events, able to separate the past of time t from its future. Many problems can be discussed under this scheme, in particular, time reversal, sufficiency, stationarity, and behavior in the infinite future and in the infinite past. ¹⁴

Time arrow

The mathematical formalism associated with Kolmogorov and Doob, which I briefly described above, has been quite successful in dealing with a number of specific applications, such as mathematical finance and statistics of time series. It assumes the specific structure of time to be continuous, one dimensional, and ordered from past to future. As time flows, the set of information available to a specific (ideal) observer increases in an unending process. Such a structure of time is embedded into the basic axiomatics of the calculus of probability by endowing the Kolmogorov axiomatics - consisting of a universe U, a set of events F, and a probability function P - with the further structure F_t to represent the evolution of information.

The formalism in itself does not really explain anything about the nature or the origin of time. It shows only that any coherent notion of time must deal with the correlated notion of information, and precisely with the fact that the flow of time incessantly transforms a potentiality into a fact: "wait and see".

Information itself has its own mathematical model, the Boltzmann-Gibbs-Shannon entropy. Entropy is a quantity defined on the set of probability functions and has no counterpart in deterministic models. As a mathematical model it is quite simple, being nothing other than the mean value of the logarithm of the frequencies, with the sign changed. But in the hierarchy of mathematical models it ranks before probability and stochastics. Information Theory based on entropy underpins many of the spectacular technological achievements obtained in communication through data compression. ¹⁵
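The definition of entropy as a mean value of logarithms is short enough to write down. This is a minimal sketch, assuming a finite universe and base-2 logarithms (so that entropy is measured in bits); the example distributions are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: minus the mean value of log2(p) under p.

    Outcomes of probability 0 contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
four_equal_cases = [0.25, 0.25, 0.25, 0.25]
deterministic = [1.0, 0.0]  # probability 1 on a single elementary event

for name, dist in [("fair coin", fair_coin), ("biased coin", biased_coin),
                   ("four equal cases", four_equal_cases),
                   ("deterministic", deterministic)]:
    print(name, entropy(dist))
```

The degenerate distribution that represents a deterministic model has zero entropy, while among distributions on a fixed finite universe the uniform one has the largest.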

Because the statistical models of relevant physical systems present a monotone evolution of entropy and because the notion of entropy seems to capture at least some features of the relation between information and flow of time, Reichenbach suggested (see note 11) that an explanation for the direction of time, the "time arrow", is the direction of increasing entropy. While this idea has its obvious merits, having become part of today's popular culture where the end of the universe is coupled with the state of maximal global entropy, it must be admitted that it does not really explain very much. When a quantity increases steadily in time, we can always take this quantity as a measure of time, in the same way that we measure the passing of years by measuring the height of a child. In physics, the arrow of time is still considered an open problem, but Reichenbach's proposal apparently does not attract much attention. ¹⁶

Conclusion

The aim of my paper is to take ideas and languages from the mathematical theory of stochastic processes and place them in the wider perspective suggested by W. Pannenberg.

First, the theory of stochastic processes in the form usually associated with the name of Andrey Kolmogorov (1903-1987) has specific tools to deal with time evolution, information, causality, and inversion of the arrow of time.

Second, the probabilistic framework does not contrast with a deterministic view, the deterministic case being thought of as a limit case (borderline case) of the probabilistic case, in the same way classical mechanics is a limit case of both relativistic and quantum mechanics.

Third, the notion of causation implied by probabilistic theory, i.e., probabilistic causation, is actually a notion of mutual correlation, so that a true causality argument needs to be developed in the direction of a theory of experimentation or a theory of intervention. ¹⁷

Fourth, conditional independence between past and future, given the present state, provides a consistent conceptual framework within which to discuss the meaning of present time and the consistency of forward causation together with backward causation. In particular, both causations are meaningful and mutually consistent but not symmetric, i.e., the respective conditional probabilities need not be equal. This specific point is the main contribution I offer in this paper.

Fifth, a notion of causation not dependent on a notion of time flow, i.e., dropping the first clause of Hume's definition, is studied in network structures, e.g., social networks. In this case, the universe of possible elementary events is not the set of all possible time trajectories of the system under study, but rather the set of all possible networks. ¹⁸

Sixth, a full theory of stochastic processes, when the relation between time and space is that given by Special Relativity, is still lacking, and little research is being done in this direction. The theory in the quantum case presents peculiar problems, which are outside my expertise, but it seems to be an area of active development. ¹⁹

As a bottom line, I must say that none of the mathematical models I have described answers in a fully satisfactory way the second of Pannenberg's Theological Questions to Scientists, in that those mathematical models fall short in dealing convincingly with infinite time both backward and forward. This topic being outside the scope of this paper, I wish to mention just two keywords I feel should be considered to this end, namely, stationarity and ergodicity. However, the fact remains that the mathematical theory of stochastic processes takes Pannenberg's suggestion of studying nature as history very seriously.

Notes

¹ I quote from W. C. Linss' translation, which appeared in W. Pannenberg, Toward a Theology of Nature, ed. Ted Peters, Westminster/John Knox Press 1993:106. The quotation is an excerpt from a fully argued case, see especially pp. 81-86 and 105-108.
² W. Pannenberg, Theological Questions to Scientists, Zygon (1981) 15(1):65-76, reprinted in Toward a Theology of Nature 15-28.
³ I refer to the recent account by Nicholas Saunders, Divine Action and Modern Science, Cambridge University Press 2002.
⁴ From the lengthy bibliography, I cite here only Robert John Russell, Time in Eternity: Pannenberg, Physics, and Eschatology in Creative Mutual Interaction, University of Notre Dame Press 2012, because of the emphasis it gives to Pannenberg’s ideas on time and because of the explicit encouragement the author gives to the contribution of Christian scientists by their involvement in what he terms Creative Mutual Interaction (CMI).
⁵ As discussed by Lawrence Sklar, Physics and Chance: Philosophical Issues in the Foundations of Statistical Mechanics, Cambridge University Press 1993. See also a new book by Sergio Chibbaro, Lamberto Rondoni, and Angelo Vulpiani, Reductionism, Emergence and Levels of Reality. The Importance of Being Borderline, Springer 2014.
⁶ David Hume, An Enquiry Concerning Human Understanding, 1748, Sec. VI-VII. Patrick Suppes, A Probabilistic Theory of Causality, North-Holland 1970.
⁷ G. Pistone: Caso: parola della scienza? in Nuova Civiltà delle Macchine XXVII 2009: 71-80; Predire dall'inizio o spiegare dalla fine? ivi 1/2012:43-50; Complessità, statistica e società ivi 4/2012:49-58. Sergio Rondinara ed., Dio come Spirito e le scienze della natura: in dialogo con Wolfhart Pannenberg, SEFIR/Città Nuova 2008.
⁸ I learned this example from Professor David Spiegelhalter (Cambridge).
⁹ Professor A. Philip Dawid formalized conditional independence and emphasized its importance in statistical research in a seminal paper discussed by the Royal Statistical Society: A. P. Dawid (1979). Conditional Independence in Statistical Theory. J. R. Stat. Soc. B, 41, 1-31.
¹⁰ Here and in the following I leave aside the marginal technical issue of zero probability events.
¹¹ A similar notion was called common cause principle by Hans Reichenbach in his posthumous The Direction of Time (Maria Reichenbach ed.), University of California Press 1956.
¹² A graph is a mathematical structure G = (V,F), where V is a finite set and F is a set of couples in V. A graph can be drawn as a set of points connected by arrows. A tree is a graph without cycles. Cf. the Jesse tree in Isaiah 11:1.
¹³ A concise history is given by Sir John F. C. Kingman himself in "Origins of the Coalescent: 1974-1982", Genetics 156(2000):1461–1463.
¹⁴ This is a very rough summary of the modern theory of stochastic processes as formulated by Andrey N. Kolmogorov in Grundbegriffe der Wahrscheinlichkeitsrechnung (1933) and by Joseph L. Doob in Stochastic Processes (1953).
¹⁵ A standard introductory textbook is T. M. Cover and J. A. Thomas, Elements of Information Theory, 1991.
¹⁶ See, for example, the conference proceedings edited by Sergio Albeverio and Philippe Blanchard, Direction of Time, Springer 2014.
¹⁷ Is smoking a cause of lung cancer? An experiment consists of the selection of two groups, smokers and non-smokers, to compare the incidence of lung cancer in the two groups. The theory of statistical experiments was developed by Ronald Fisher in his groundbreaking treatise Statistical Methods for Research Workers (1925 first edition). It is the standard approach of medical research. This is contrasted with an intervention that would impose cessation of smoking in a group of both smokers and non-smokers and then compare the outcome with the uncontrolled group. The difference is the treatment of possible hidden variables. The latter approach was suggested by Judea Pearl in Causality: Models, Reasoning, and Inference, Cambridge University Press 2000. A general discussion of the philosophical notion of experiment is given by Marco Buzzoni in his Esperimento e realismo scientifico. Saggio su David Goodwin, Loffredo Napoli 2001.
¹⁸ See Steffen L. Lauritzen, Graphical Models, Clarendon Press, Oxford, 1996, and Rick Durrett, Random Graph Dynamics, Cambridge University Press 2006.
¹⁹ I quote only, from among many others, the (highly technical) account by Luigi Accardi, Alberto Frigerio and John T. Lewis, "Quantum Stochastic Processes", Publ. RIMS, Kyoto University 18 (1982) 97-133.

Creation from the End: Mathematical Analogies