Correlated Structures in Random Data

GCP Background

Laboratory Research

The history of controlled laboratory research on interactions of human consciousness with physical random systems tracks the development of microelectronics and computers. The first large database experiments were conducted by Helmut Schmidt, at Boeing Laboratories, in the late '60s and early '70s. The number of experiments and investigators grew over the next decade, and in 1979, Robert Jahn, at Princeton University, established the Princeton Engineering Anomalies Research (PEAR) laboratory to focus on an engineering approach to the question whether sensitive electronic devices including random components might be affected by special states of consciousness, including strong emotions and directed intention. I joined the PEAR group in 1980.

REG Experiments

At the PEAR lab, the primary experiment used a custom designed Random Event Generator (REG or RNG) incorporating a well-developed commercial source of electronic white noise. This bench-top experiment provided control over parameters such as the speed and size of the samples drawn from the random sequence of bits. For example, it might be set to collect a 200-bit sample at a rate of 1000 bits per second, and to register a trial each second consisting of the sum of the 200 bits. The equipment displayed the current output trial value and a running mean as feedback to the operator. The experiment used a tripolar protocol, with instructions to maintain an intention to achieve either a high or a low mean, or to let the machine generate baseline data. Over more than a decade this basic experiment yielded an enormous database, with a bottom line indicating a small but significant effect of human intention on random data sequences. A paper describing 12 years of research (pdf) at PEAR, using several different mind-machine interaction experiments, is available.

FieldREG

My job at PEAR was to coordinate the research, focusing on experimental design and analysis. Attention was given immediately to computerizing the REG experiment for security and ease of data processing, and to allow greater flexibility in experimental design. An early proposal was to record a continuously running random data stream, and to use that as a target for intention with a variety of timing and assignment schemes. Such a system was finally developed in the early 1990's, when John Bradish built the first of a series of truly portable REG devices, and York Dobyns wrote software to record and index a continuous datastream of 200-bit trials, one per second, hour after hour and day after day. The "continuous REG" was used as a direct focus for some experiments, with intentions identified in the index, but we also could mark and later analyse data collected while something else was going on in the room -- another experiment, or perhaps a small, intense meeting or group discussion.

Given portable REG devices and newly available laptop computers, we were inspired to take the experiment into the field, running a modified version of the continuous software called FieldREG. The name was a double entendre, since the purpose of the experiment was to monitor something that might be regarded as a consciousness field. The FieldREG experiment did not have an intention, and indeed could be used to gather data in situations with little or no direct interest or attention from people. We looked for situations that might produce a "group consciousness" because people would be engaged in a common focus, resulting in a kind of coherence or resonance of thoughts and emotions. For contrast, we identified other, mundane situations we could predict would not bring people to a shared focus. A long series of FieldREG experiments produced striking, statistically significant results. As in the laboratory, the effects are small, but they have implications of substantial importance to studies of human consciousness, assuming the results represent what we believe they do.

Prototype Global Tests

Other investigators, including Dean Radin and Dick Bierman, began doing similar field experiments looking at a broad array of situations, and we set up collaborations. For example, Dean asked some colleagues to collect data during the O. J. Simpson trial, which was expected to garner attention from huge numbers of people. The combined data from several REGs showed an impressive departure from expectation at the time the verdict was announced. Other tests looked at data taken during the Oscars, with segregation of the data into periods of strong and weak interest. Again the difference was significant.

In December 1996 I met by chance two people who were organizing a global "Gaiamind Meditation". This meeting coincided with the developing idea of attempting to register some indication of a global consciousness, making a kind of FieldREG group consciousness experiment on a large scale. The coincidence led me to arrange a collaboration with colleagues who could record REG data that might show evidence of a "consciousness field" during the Gaiamind event. The composite of data from 14 independent REG systems showed a significant effect.

This work was a prelude for our attempt to register effects of the world-wide expression of compassion at Princess Diana's funeral in September of 1997, which, coincidentally, was followed exactly a week later by the memorial ceremonies for Mother Teresa. These were prototypical "global events" for the Global Consciousness Project, in that they were the focus of a great deal of attention, and at least in the case of Princess Diana, also occasions for an unusually widespread feeling of shared compassion.


Establishing the EGG Project

In November 1997, at a meeting of professional researchers in parapsychology and psychophysiology, the various component ideas for what ultimately became the Global Consciousness Project coalesced into a practical form. The technology was becoming available to create an Internet-based array of continuously recording REG nodes placed around the world. This would metaphorically resemble the placement of electrodes on a human head for Electroencephalogram or EEG recordings, though of course the data would not be fluctuating voltages, but randomly varying numbers. The resemblance led Greg Nelson to suggest the network could be envisioned as an "Electrogaiagram", and we began to call it the EGG Project. We later adopted the formal name "Global Consciousness Project" but continue to use an efficient terminology based on the EGG acronym and associations.

Hardware

Three kinds of random sources are used in the project. All were developed for use in research, and all are high-quality sources that produce random data meeting stringent criteria. The data are difficult to distinguish from theoretical expectation in calibration runs, although as real, physical devices, they cannot be perfectly random. All use a quantum-level process, either thermal noise or electron tunneling, as the fundamental source of random fluctuation.

Software

The original software architecture for the project was designed by Greg Nelson, and refined by John Walker. It was well-considered, and has served with little modification since the beginning of the project. The primary operational software consists of two parts. At each of the host sites around the world an REG (or RNG) device is attached to a computer running the "eggsh" or "egg.exe" software (for Linux and Windows, respectively). The software collects one trial consisting of 200 bits each second, and stores the sum of the bits as the raw data. The indexed sequence of trials is recorded in a daily file on the host computer. The computer is connected to the Internet, and sends a packet of data at regular intervals to a server located in Princeton, NJ, running a program called the "basket", which writes the data as it arrives from each egg into a permanent archive. The software is open source and available for inspection.
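The trial format described above can be illustrated with a minimal sketch. This is not the project's eggsh or egg.exe code: a pseudorandom generator stands in for the hardware REG, and the name `record_trial` and the seed are assumptions made only for illustration.

```python
import random

BITS_PER_TRIAL = 200  # one trial per second, 200 bits each (from the text)

def record_trial(rng):
    """Return one trial: the sum of 200 bits drawn from rng.

    rng is a pseudorandom stand-in for the hardware REG (illustration only).
    """
    return sum(rng.getrandbits(1) for _ in range(BITS_PER_TRIAL))

rng = random.Random(12345)  # arbitrary seed, assumed for reproducibility
trials = [record_trial(rng) for _ in range(3600)]  # one simulated hour
mean = sum(trials) / len(trials)
print(f"mean of {len(trials)} trials: {mean:.2f} (theoretical expectation 100)")
```

Each stored value is an integer between 0 and 200, so a day of data from one egg is a sequence of 86,400 such trial sums.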

Host Sites

When one of the qualified hardware random sources is combined with the project software running on an Internet-connected computer, we call the resulting unit an "Egg", hosted by a volunteer contributor. Host computers also run a program that synchronizes their clocks to network timeservers, to keep the independent data sequences synchronized to the second. The early egg hosts were colleagues in Europe and the US. As word of the project spread, people from other parts of the world volunteered to host an egg, and we gradually built a fairly broad geographic coverage. Approximately 40 countries are represented, on most continents, and in most timezones with substantial populations.

Data Archive

At the heart of the research project is the archival database. The raw data are stored in a binary format with header information to identify the specific source and timing for every trial. A web-based data extract form invokes scripts to decode the archive and present the specified data for inspection or analysis in a readable format. A completely normalized and standardized version of the data can be made available for well-defined research and analysis projects.

Website

The development and major features of the project are presented in the GCP website, which is split into two tracks. One documents the rigorous scientific work we do to ensure the quality of the data and the analyses designed to identify and assess any anomalous structure that may appear in the data. The other branch presents a complementary, aesthetic approach to the project, fostering the subjective and interpretive perspectives that we believe are also valuable in efforts to study the subtle aspects of consciousness interacting with the physical world. In addition to the descriptions, the website presents primary analyses and summaries, as well as access to the data.

Support

The project has been supported from the beginning by generous contributions of time and expertise as well as money to defray expenses. A long list of people are responsible but I would like especially to note the help in various forms from Greg Nelson, John Walker, Dean Radin, Paul Bethke, Richard Adams, Peter Bancel, and Rick Berger. The full list is much longer, and includes the egg hosts as well.


The GCP Experiment

The GCP recorded its first data on August 4, 1998. Beginning with a few random sources, the network grew to about 10 instruments by the beginning of 1999, and to 28 by 2000. It has continued to grow, stabilizing at roughly 60 to 65 eggs by 2004.

The early experiment simply asked whether the network was affected when powerful events caused large numbers of people to pay attention to the same thing. This experiment was based on a hypothesis registry specifying a priori for each event a period of time and an analysis method to examine the data for changes in statistical measures. Various other modes of analysis including attempts to find general correlations of GCP statistics with other longitudinal variables have been considered, and continue to be developed.

Purpose

In the most general sense, the purpose of the project was and is to create and document a consistent database of parallel streams of random numbers generated by high-quality physical sources. The goal is to determine whether statistics from these data show any detectable correlations with independent long-term physical or sociological variables. In the original experimental design we asked the more limited question whether there is a detectable correlation of deviations from randomness with the occurrence of major events in the world.

Hypothesis

The formal hypothesis of the original event-based experiment is very broad. It posits that engaging global events will correlate with deviations in the data. The identification of global events and the times at which they occur are specified case by case, as are the recipes for calculating the variance deviations. This latitude of choice makes the original experiment complicated to analyse, but by standardizing the results, we can obtain a composite outcome. This constitutes a general test of the broadly defined formal hypothesis.

Analytical Recipes


The formal events are fully specified in a hypothesis registry. Over the years, several different analysis recipes were invoked, though most analyses specify either the "network variance" (the squared Stouffer Z) or the "device variance" method. Each recipe stipulates how the event statistic is calculated, by first specifying a block statistic within the blocked examination period and then a method for combining these to give an event statistic. Note that the test statistic is a single value representing the deviation from expectation for the whole period specified in the registry. The results table has links to details of the analyses, typically including a "cumulative deviation" graph tracing the history of the second-by-second deviations during the event, leading to the terminal value which is the test statistic. The following table shows the precise algorithms for the basic statistics used in the analyses.
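As a rough illustration of the two statistics named above, the following sketch computes them for a single second of data. It is not the registry's precise recipe: it assumes theoretical trial parameters (mean 100, variance 50) for the per-egg Z-scores, uses the description of device variance as the sum of squared per-egg Z-scores each second, and leaves out how per-second values are combined into an event statistic. The function names and the sample values are hypothetical.

```python
import math

def egg_z(trialsum, mean=100.0, var=50.0):
    """Z-score of one 200-bit trial sum under theoretical expectation."""
    return (trialsum - mean) / math.sqrt(var)

def network_variance(trialsums):
    """Squared Stouffer Z across all eggs for one second."""
    zs = [egg_z(t) for t in trialsums]
    stouffer = sum(zs) / math.sqrt(len(zs))
    return stouffer ** 2

def device_variance(trialsums):
    """Sum of squared per-egg Z-scores for one second."""
    return sum(egg_z(t) ** 2 for t in trialsums)

# Hypothetical trial sums from four eggs in the same second
second = [103, 97, 110, 100]
print(network_variance(second))  # sensitive to eggs deviating together
print(device_variance(second))   # sensitive to the size of individual deviations
```

The network variance grows when the eggs deviate in the same direction at the same time, while the device variance grows when individual eggs deviate strongly in either direction.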


Control Data

It is possible to generate various kinds of controls, including matched analysis with a time offset in the actual database, or matched analysis using a pseudorandom clone database. However, the most general control analysis is achieved by comparisons with the empirical distributions of the test statistics. These provide a rigorous control background and confirm the analytical results for the formal series of hypothesis tests.

Compound Result

Over the six years since the inception of the project, 170 replications of the basic hypothesis test have been accumulated. The composite result is a statistically significant departure from expectation of 4 standard deviations. The combined result from these analyses thus gives support for the formal hypothesis, and this encourages a deeper look, beginning with a thorough re-analysis of the original findings, and proceeding to extensive analysis using other methods.
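One common way to form such a composite from standardized event results is a Stouffer Z: the sum of the per-event Z-scores divided by the square root of their number. The sketch below assumes that method; the formal series may combine results differently, and the example Z-scores are made up for illustration.

```python
import math

def composite_z(event_zscores):
    """Stouffer composite: sum of Z-scores divided by sqrt of their count."""
    return sum(event_zscores) / math.sqrt(len(event_zscores))

# Made-up per-event Z-scores, for illustration only
example = [0.5, -0.2, 1.1, 0.3]
print(f"composite Z = {composite_z(example):.2f}")
```

Under this assumption, a composite of 4 standard deviations over 170 events would correspond to a mean per-event Z of about 0.31, which is consistent with small effects in individual events accumulating into a significant total.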


Sharpening the Focus

The focus of our effort turns now to a more comprehensive program of rigorous analyses and incisive questions intended to characterize the data more fully and to facilitate the identification of any non-random structure. We begin with thorough documentation of the analytical and methodological background for the main result, to provide a solid basis for new hypotheses and experiments. The goal is to increase both the depth and breadth of our assessments, to develop sound interpretations, and ultimately to elucidate the meaning of the original findings.

Critical Assessments

A variety of analyses have been undertaken to establish the quality of the data and characterize the output of individual devices and the network as a whole. The first stage is a careful search for any data that are problematic because of equipment failure or other mishap. Such data are removed. With all bad data removed, each individual REG or RNG can be characterized to provide empirical estimates for statistical parameters. These are used to convert the database into a normalized, completely reliable data resource to facilitate rigorous analysis. The intent is to lay the basis for an assessment of the multi-year database with sophisticated statistical and mathematical techniques. We then can use a range of statistical tools to look for small, but reliable changes from expected random distributions that may be correlated with natural or human-generated variables.

Acceptable Events

A major effort was made to identify the "formal" events that could be accepted according to rigorous criteria. This resulted in a set of 170 usable events over the first 6 years of the project. A total of 13 events that were originally in the formal series were excluded because they were partially redundant or overlapped others, or were not unambiguously defined in the original narrative hypotheses.

Real Devices vs Theory

Ideally, the trials recorded from the REGs are distributed as binomial [200, 0.5] (mean 100, variance 50). Although all are high-quality random sources, these real-life devices do not achieve perfect theoretical performance. A logical XOR of the raw bit-stream with a fixed pattern of bits with exactly 0.5 probability compensates for mean biases of the REGs.
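The effect of the XOR step can be seen in a small simulation. This is a sketch under stated assumptions: the bias of 0.52 is deliberately exaggerated, the alternating mask `[0, 1]` is simply one pattern with exactly half ones and is not necessarily the pattern used by the devices, and `xor_with_mask` is a hypothetical helper.

```python
import random

def xor_with_mask(bits, mask):
    """XOR each bit with a repeating fixed mask."""
    return [b ^ mask[i % len(mask)] for i, b in enumerate(bits)]

rng = random.Random(7)   # arbitrary seed for reproducibility
P_ONE = 0.52             # exaggerated bias toward 1, assumed for illustration
raw = [1 if rng.random() < P_ONE else 0 for _ in range(200_000)]

mask = [0, 1]            # balanced pattern: exactly half the bits are inverted
corrected = xor_with_mask(raw, mask)

raw_mean = sum(raw) / len(raw)
corrected_mean = sum(corrected) / len(corrected)
print(f"raw mean {raw_mean:.4f}, after XOR {corrected_mean:.4f}")
```

Because half of the bits are inverted, a source that produces ones with probability p yields ones with probability 0.5*p + 0.5*(1 - p) = 0.5 after the XOR, whatever the value of p.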

Normalized and Standardized Data

After XOR'ing, the mean is guaranteed over the long run to fit theoretical expectation. The trial variances remain biased, however. The biases are small (about 1 part in 10,000) and generally stable on long timescales. We treat them as real, albeit tiny, biases that need to be corrected by normalization for rigorous analysis. They are corrected by converting the trialsums for each individual egg to standard normal variables (z-scores), based on the empirical standard deviations.
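A minimal sketch of this standardization follows. It assumes the theoretical mean of 100 (justified by the XOR step) and estimates the standard deviation around that mean from the egg's own data; the function names are hypothetical, and in practice the estimate would rest on a long record from each egg rather than the four made-up values shown.

```python
import math

def empirical_sd(trialsums, mean=100.0):
    """Standard deviation of trial sums around the theoretical mean."""
    return math.sqrt(sum((t - mean) ** 2 for t in trialsums) / len(trialsums))

def standardize(trialsums, mean=100.0):
    """Convert one egg's trial sums to z-scores using its empirical SD."""
    sd = empirical_sd(trialsums, mean)
    return [(t - mean) / sd for t in trialsums]

# Made-up trial sums from a single egg, for illustration only
sample = [93, 100, 107, 100]
zs = standardize(sample)
print([round(z, 3) for z in zs])
```

After this conversion the z-scores of each egg have unit variance by construction, so small device-specific variance biases no longer propagate into network statistics.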


Re-Analysis of the Event-based Experiment

The normalized and standardized data resource allows us to do a rigorous re-analysis of the event-based experiment. This was the primary analysis approach for the first few years of the project, and it generated sufficient evidence of anomalous correlations to justify deeper analysis and more general correlation strategies. In this approach, "global events" are identified, and for each one a hypothesis that specifies a time period and an analysis recipe is registered. The analytical results are combined into a cumulative, or aggregate, assessment of the hypothesis of correlated departures from expectation.


New Analyses: Extensions and Explorations

The background of careful preparation for rigorous analysis can be envisioned as a conversion of the GCP database to a "data resource" which can be examined with power and flexibility. As we proceed, new materials will be added to this page. The following excursions are examples of what can now be done with some facility. Some provide deeper understanding of previous work, others give new perspectives and insights. We have developed a number of questions that are capable of informing us deeply about the nature and quality of the evidence. As we proceed, we expect to have many cases that, in Peter's term, "will require a lot of mulling," but can learn much from the ability to visualize the data in different ways.

Long Trends and Correlations

The rigorously normalized and standardized data resource can be used for a wide variety of completely general analyses that are not constrained to the event-based protocols. For example, we ask whether there is any significant large scale structure with questions addressing long trends and correlations in the full, six-year database.

Presidential Poll Example

If there are long trends, we can in principle expect to find correlations with independent external measures. Among the possibilities are sociological data. For example, we compare GCP network variance with perceptions of presidential performance measured in polls that ask the question: "Do you approve or disapprove of the way the President is handling his job?"

Splitting the Data

A strong test of the hypothesis that there is structure in the data can be made by determining if the same trends and patterns are found in independent subsets. We look at this by splitting the data into alternate seconds.
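The split itself is simple to express. The sketch below assumes the data are a list of per-second values for the network and uses a running sum as the cumulative deviation trace; both helper names and the sample values are hypothetical.

```python
from itertools import accumulate

def split_alternate_seconds(series):
    """Return (even-indexed seconds, odd-indexed seconds)."""
    return series[0::2], series[1::2]

def cumulative_deviation(zscores):
    """Running sum of deviations, as plotted in a cumulative deviation graph."""
    return list(accumulate(zscores))

# Made-up per-second z-scores, for illustration only
zs = [0.4, -1.0, 0.3, 0.9, -0.2, 0.1]
even, odd = split_alternate_seconds(zs)
print(cumulative_deviation(even))
print(cumulative_deviation(odd))
```

The two subsets share no trials, so a trend that appears in both cumulative traces cannot be attributed to the same individual data points.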

Sliding the Event Time

Here we look at the aggregate event Z-score when the event examination periods are shifted uniformly in time. The question is what happens to the evidence for anomalous deviation associated with an event when the event periods are slid over the database in half-hour steps and the aggregate Z is recalculated at each step.
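The sliding procedure can be sketched as follows. This is an illustration under several assumptions, not the analysis actually used: the data are a single simulated series of per-second network Z-scores, each event statistic is a Stouffer Z over its window, the aggregate is a Stouffer Z across events, and the two event windows are hypothetical.

```python
import math
import random

STEP = 1800  # half an hour, in seconds

def window_z(series, start, length):
    """Stouffer Z of the per-second values inside one event window."""
    window = series[start:start + length]
    return sum(window) / math.sqrt(len(window))

def aggregate_z(series, events, offset):
    """Aggregate Z across events after shifting every window by offset seconds."""
    zs = [window_z(series, start + offset, length) for start, length in events]
    return sum(zs) / math.sqrt(len(zs))

rng = random.Random(1)  # arbitrary seed; simulated data contain no real effect
series = [rng.gauss(0.0, 1.0) for _ in range(24 * 3600)]  # one simulated day
events = [(20000, 3600), (50000, 7200)]  # hypothetical (start second, length)

for k in range(-4, 5):
    offset = k * STEP
    print(f"offset {offset:+6d} s: aggregate Z = {aggregate_z(series, events, offset):+.3f}")
```

If an effect is tied to the registered event times, the aggregate Z should be largest near zero offset and fall toward chance levels as the windows slide away.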

Impulse Events, Quakes, Distance

Examination of "Impulse" events shows structure and includes indications that the response may begin a little early. Here we look at events defined for the formal hypothesis test series, and compare them with a new database of arguably similar events, namely big earthquakes. They show remarkably similar trends, even though there are differences related to the choice of statistics. This page also includes some modeling to assess the effect of geographic distance on pair correlations.

Earthquakes, Population, Locality

Large earthquakes occur with sufficient frequency to allow assessment of the relationship of GCP effects to the distribution of the Eggs. They provide an opportunity to consider the question whether the effects are nonlocal in the strong sense, and whether the proximity of people affected by the earthquake is an essential contributor to the effects.

Assessing the Effect of Blocking

The earliest analysis method was a hand calculation using 15-minute blocking of the data. In this method the composite Z for each egg is computed for each time block in the event. The sum of the resulting Z² values is a chi-square with degrees of freedom equal to the number of blocks times the number of eggs. The early procedure was replaced for most events by a "standard analysis" using the raw data with no blocking. But an obvious question was what effect the various blocking levels might have on the outcome. One form of the question is, "What is the optimum blocking level?" Here we begin to look at this question in a rigorous and comprehensive way.
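The blocked calculation can be sketched in a few lines. The sketch assumes each egg's data are already per-second z-scores, takes the composite Z of a block to be a Stouffer Z over its seconds, and drops any incomplete trailing block; the function name and the tiny example are hypothetical. For the original method the block length would be 900 seconds.

```python
import math

def blocked_chisquare(z_by_egg, block_len):
    """Sum of squared block Z-scores over all eggs and blocks.

    z_by_egg: one list of per-second z-scores for each egg.
    Returns (chi-square value, degrees of freedom = blocks x eggs).
    Incomplete trailing blocks are dropped (an assumption of this sketch).
    """
    chisq = 0.0
    df = 0
    for zs in z_by_egg:
        for start in range(0, len(zs) - block_len + 1, block_len):
            block = zs[start:start + block_len]
            z_block = sum(block) / math.sqrt(block_len)  # composite Z for the block
            chisq += z_block ** 2
            df += 1
    return chisq, df

# Tiny made-up example: two eggs, four seconds each, blocks of two seconds
chisq, df = blocked_chisquare([[1, 1, 1, 1], [0, 0, 2, 2]], block_len=2)
print(chisq, df)
```

Changing `block_len` and recomputing is a direct way to explore how the blocking level affects the outcome for a given event.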

New Year Celebrations

One of the most interesting recurring events that we have examined is the New Year transition. We have made a hypothesis each year since 1998/1999 that the period around midnight on New Year's Eve will show structure in the network variance -- the squared Stouffer Z across eggs. Beginning in 1999/2000 we also examined the device variance -- the sum of squared Z-scores per egg each second. These analyses accommodate the moving locus of the New Year celebrations by doing a signal average across time zones. The data resource allows a much more facile exploration of the question whether the New Year Variance Analysis shows structure.

Analysis of Periodic Variation

Fourier analysis gives us a general answer to the question whether there is any indication of periodic structure in the data. We wish to know, for example, if there is any diurnal variation suggesting differences corresponding to time of day, or if there are any longer term effects associated with the day or the week, etc.
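As an illustration of the idea, the sketch below computes a power spectrum with a direct discrete Fourier transform and locates the strongest period. The input is a synthetic hourly series with a built-in 24-hour cycle, not GCP data, and the function name is hypothetical; a real analysis would use a fast Fourier transform on a much longer series.

```python
import cmath
import math

def power_spectrum(series):
    """Return (frequency index k, power) pairs from a direct DFT.

    The mean is removed first so the zero-frequency term does not dominate.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        powers.append((k, abs(coeff) ** 2 / n))
    return powers

# Synthetic example: 10 days of hourly values with a pure diurnal cycle
hours = 240
series = [math.sin(2 * math.pi * t / 24) for t in range(hours)]

peak_k, peak_power = max(power_spectrum(series), key=lambda kp: kp[1])
peak_period = hours / peak_k
print(f"strongest period: {peak_period:.1f} hours")
```

A diurnal variation in the data would show up as a peak at a period of 24 hours, and a weekly effect as a peak at 168 hours, standing above the flat spectrum expected from random data.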

