Correlations in Continuous Parallel Random Sequences

This page is a working document and is not to be quoted or referenced

Last Update 30 Jan 2005


Introduction

The Global Consciousness Project (GCP) maintains a worldwide network of random event generators from which synchronized, continuous sequences of data are collected in a central repository. The network has grown to include more than 60 sites around the world, each generating and reporting data every second. The map below shows the global network as of mid-2004. The development and major features of the project are presented on the GCP website, which is split into two tracks. One documents the rigorous scientific work we do to ensure the quality of the data, and the analyses designed to identify and assess any anomalous structure that may appear in the data. The other presents a complementary, aesthetic approach to the project, fostering the subjective and interpretive perspectives that we believe are also needed in efforts to study the subtle aspects of consciousness interacting with the physical world.

The Global Consciousness Project network of REG devices (eggs). The array covers most of the world, with greater concentration in Europe and the USA. Host sites are indicated by bright spots.

A Scientific Focus

The GCP Hypothesis

The formal hypothesis of the original event-based experiment is very broad. It posits that engaging global events will correlate with deviations in the data. The global events, and the times at which they occur, are identified case by case, as are the recipes for calculating the variance deviations. This latitude of choice is one reason why the experiment is complicated to analyze. Later we will compare the recipes and examine them separately, but by standardizing their results we can combine them into a composite result. This constitutes a general test of the broadly defined formal hypothesis. The combined result supports the hypothesis, and this encourages a deeper look.
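Standardizing here means putting every event's result on a common z scale, whatever recipe produced it. The sketch below is a minimal illustration under stated assumptions: each recipe is assumed to yield a chi-square statistic with known degrees of freedom, the events are assumed independent, and the Wilson-Hilferty cube-root transformation is used for the chi-square-to-z step (one standard choice, not necessarily the project's). The composite is the Stouffer Z, the sum of the event z-scores divided by the square root of their number. The event values are invented.

```python
import math

def chisq_to_z(chisq: float, df: int) -> float:
    """Map a chi-square statistic with df degrees of freedom to an
    approximately standard normal z (Wilson-Hilferty transformation)."""
    k = 2.0 / (9.0 * df)
    return ((chisq / df) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)

def stouffer(zs):
    """Composite (Stouffer) Z of independent z-scores: sum(z) / sqrt(N)."""
    return sum(zs) / math.sqrt(len(zs))

# Hypothetical event results from recipes with very different durations,
# each given as (chi-square, degrees of freedom).
events = [(3712.0, 3600), (14650.0, 14400), (890.0, 900)]
event_zs = [chisq_to_z(c, df) for c, df in events]
print([round(z, 2) for z in event_zs])
print(round(stouffer(event_zs), 2))
```

Because every event ends up as a z-score, events of one hour and of several hours contribute on equal terms to the composite.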

Experiment and Analysis

We begin with an overview of the GCP as an experiment designed to test a formal hypothesis. A brief description of the original experiment is supplemented with links to detailed specifications for research protocols and technical equipment. Although the network has grown larger over the years, the data acquisition system has been stable, resulting in a large corpus of data recorded in a consistent format. The archival database has been examined mainly by way of event-based probes designed to test the original hypothesis. The focus of our effort turns now to a more comprehensive program of rigorous analyses and incisive questions intended to characterize the data more fully, and to facilitate the identification of any non-random structure.

The first stage is a careful search for any data that are problematic because of equipment failure or other mishaps. Such data are removed. Each individual REG or RNG is then characterized to provide empirical estimates of its statistical parameters. These estimates are used to convert the database into a normalized, thoroughly vetted data resource that supports rigorous analysis. The intent is to lay the basis for an assessment of the multi-year database with sophisticated statistical and mathematical techniques.
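A minimal sketch of the screening and characterization steps, assuming trial sums stored one per second per device. The exclusion rules (values outside the possible range 0 to 200, and runs of identical values longer than a hypothetical MAX_CONSTANT_RUN seconds) are illustrative assumptions, not the GCP's actual criteria; `flag_bad` and `empirical_params` are hypothetical names.

```python
import statistics

TRIAL_MIN, TRIAL_MAX = 0, 200   # a trial is a sum of 200 bits
MAX_CONSTANT_RUN = 10           # hypothetical threshold for a "stuck" device

def flag_bad(trials, max_run=MAX_CONSTANT_RUN):
    """Return one boolean per trial, True where the trial should be excluded:
    values outside the possible range, or part of a run of identical values
    longer than max_run seconds."""
    bad = [not (TRIAL_MIN <= t <= TRIAL_MAX) for t in trials]
    run_start = 0
    for i in range(1, len(trials) + 1):
        # A run ends when the value changes or the data end.
        if i == len(trials) or trials[i] != trials[run_start]:
            if i - run_start > max_run:
                for j in range(run_start, i):
                    bad[j] = True
            run_start = i
    return bad

def empirical_params(trials, bad):
    """Empirical mean and sample variance of the trials that passed screening."""
    good = [t for t, b in zip(trials, bad) if not b]
    return statistics.mean(good), statistics.variance(good)

trials = [100] * 12 + [99, 101, 250, 97, 104]
bad = flag_bad(trials)
print(bad)
print(empirical_params(trials, bad))
```

The empirical mean and variance of each device can then replace the theoretical values when the trials are standardized.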

Using the resulting standardized database, we can proceed to a rigorous re-analysis of the original experimental series. This will be augmented by a broad range of analyses intended to increase our understanding of the outcomes in the basic series of formally specified events. The significant results of the originally specified analyses will be extended by an array of independent analytical perspectives that provide deeper insight into the apparent structure in the nominally random data. This is a work in progress, but at the time of this writing (January 2005) we have in hand substantial evidence of generalized, long-term anomalous structure that is in itself remarkable. We have made some progress toward understanding it by looking for correlates that point to the source of the departures from expectation. Much remains to be done, but we already have evidence, for example, of correlation with some external, independent "social indicators" such as trends in polling data. Given the rigorously vetted database, we can proceed with a variety of well-defined correlational studies of this nature, as well as more flexible data-mining explorations.

To the extent that they reflect real phenomena not accommodated in theoretical models of the physical world, the GCP data can have a role in expanding our conceptions. One of our goals for the current analytical effort is to make the work transparent for other scientists. There are many important questions to address, and we will profit from independent perspectives on what we are doing. Ideally, we hope to interest other scientists in helping with the considerable task of understanding and modeling the effects we see in the data. The project is designed to be completely open for inspection and collaborative examination of the data and results.

The Original Experiment

The GCP began recording data on August 4, 1998, and over the intervening period a series of tests has been made of the experimental hypothesis that the data may show structure correlated with major events in the world. The original design for the experiment was based on a hypothesis registry specifying a priori, for each event, a period of time and an analysis method to examine the data for changes in statistical measures.

After six years, during which we have accumulated 170 replications of the basic hypothesis test, the composite result is a statistically significant departure from expectation of 4 standard deviations. This has encouraged us to undertake an extended analysis of the project's methods and results. The first goal is to document the analytical and methodological background of the main result, as a solid basis from which we can frame hypotheses and experiments that will help interpret these measurements and elucidate their meaning.
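The size of the composite result can be put in perspective with a little arithmetic. This sketch assumes a one-tailed test of the 4-sigma composite and, for the second figure, that the composite is a Stouffer Z over the 170 events; both are assumptions about how the figure was computed.

```python
import math

def one_tailed_p(z):
    """Probability that a standard normal variate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

composite_z = 4.0   # composite deviation quoted in the text
n_events = 170      # number of formal replications quoted in the text

print(f"one-tailed p = {one_tailed_p(composite_z):.2e}")
# Assumption: if the composite is a Stouffer Z, sum(z_i) / sqrt(N), then the
# average per-event z implied by it is composite_z / sqrt(N).
print(f"mean z per event = {composite_z / math.sqrt(n_events):.3f}")
```

Run as written, it prints a one-tailed p of 3.17e-05 and an average per-event z of 0.307. Under these assumptions the per-event effect is small compared with the unit standard deviation of each event's z-score, which is why it becomes visible only in the aggregate.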

A variety of analyses have been undertaken, first to establish the quality of the data and to characterize the output of individual devices and of the network as a whole. This information is used to normalize and standardize the data, and to identify and remove any bad data. We can then use a range of statistical tools to look for small but reliable changes from expected random distributions that may be correlated with natural or human-generated variables. Beyond the simple event-based analysis, there are many potentially useful techniques for data assessment. For example, time series analysis may identify diurnal or other cyclic patterns, or correlations with natural variables such as electromagnetic or geomagnetic field fluctuations. The most intriguing questions may be in the sociological or psychological domain. We can ask, for example, whether economic indicators or calendrical rhythms exhibit any correlations with structure in the random data.
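Correlation with an external variable can be sketched with a Pearson coefficient and the Fisher transformation to a z-score. Both series below are simulated, and the daily resolution and the choice of indicator are hypothetical. The Fisher z assumes independent observations; real indicator series are often autocorrelated, which would inflate its apparent significance, so this is a starting point rather than a complete test.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r, n):
    """Approximate z-score for r under the null of no correlation
    (Fisher transformation); assumes independent observations."""
    return math.atanh(r) * math.sqrt(n - 3)

# Hypothetical daily series: a network deviation statistic and an external
# index (for example a geomagnetic or economic indicator). Both are simulated.
rng = random.Random(7)
deviation = [rng.gauss(0, 1) for _ in range(365)]
indicator = [rng.gauss(0, 1) for _ in range(365)]
r = pearson_r(deviation, indicator)
print(round(r, 3), round(fisher_z(r, len(deviation)), 2))
```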

In order to move forward, it will be useful, even essential, to have help from other people with scientific and technical backgrounds. The GCP results are complex, however, and some familiarization is required before one can approach the data intelligently. Thus, a further goal of this analysis is to reduce the effort needed to understand and contribute to the project by presenting a technical introduction to the data and the experimental results. We would like to arrive at a point where various specific hypotheses can be proposed and tested against the database with both flexibility and rigor. Because we know little or nothing about mechanisms or models that might explain the GCP result, it is necessary to take small analytical steps designed to inform a focused search for correlations and patterns.

We begin by looking more closely at the result in hand. Careful study of the event-based experiment will provide guidance for a broader examination of the database. The intent is to build a thorough, multi-dimensional picture of the formal test series, applying several statistical measures and examining the effects of factors such as data blocking, spatial relationships, and the duration of the event. With this material in hand we can proceed to test the data for correlations with independent parameters representing physical, societal, and environmental factors. In this process we will come to understand the original event-based result much better, and we will also be able to decide which statistics and tests are most useful for subsequent work.

The analysis of the event-based experiment is presented briefly below. It should give a good sense of what has been accomplished so far, where further work is needed, and where other scientists might be interested in contributing expertise and insight. Key elements of the project are summarized first. Further details can be found elsewhere on the GCP website.


Normalizing the Data

The data are produced by three different makes of electronic random event generators (REG or RNG): Pear, Mindsong and Orion. Every trial is the sum of 200 bits, and the trial sums are collected once a second at each host site. The Pear and Mindsong devices produce about 12 trials per second and the Orions about 39 (thus 95% of the source data is not collected). REGs have been added to the network over time, so the data represent an evolving set of REGs and geographical nodes. The network started with 4 REGs and currently has about 65 in operation. The changing size and distribution of the array contributes to the complexity of some analyses.
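If the bits are fair and independent, each 200-bit trial sum is binomially distributed with mean 200 × 0.5 = 100 and variance 200 × 0.5 × 0.5 = 50, so a trial x can be standardized as z = (x - 100)/sqrt(50). The sketch below applies that, with the theoretical values as defaults that can be replaced by a device's empirical estimates, and checks it against a simulated ideal device. The simulation is an assumption, not a model of any particular Pear, Mindsong, or Orion unit.

```python
import math
import random

N_BITS = 200
MEAN = N_BITS * 0.5          # expected trial sum for fair, independent bits: 100
VAR = N_BITS * 0.5 * 0.5     # binomial variance: 50

def trial_z(trial_sum, mean=MEAN, var=VAR):
    """Standardize one trial sum; pass empirical mean/var to use device estimates."""
    return (trial_sum - mean) / math.sqrt(var)

# Simulated ideal device: each trial is the number of 1s among 200 random bits.
rng = random.Random(1)
trials = [bin(rng.getrandbits(N_BITS)).count("1") for _ in range(10_000)]
zs = [trial_z(t) for t in trials]
m = sum(zs) / len(zs)
v = sum((z - m) ** 2 for z in zs) / (len(zs) - 1)
print(round(m, 3), round(v, 3))   # close to 0 and 1 for an ideal device
```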


Re-Analysis of the Event-based Experiment

The normalized and standardized data resource allows us to undertake a rigorous re-analysis of the event-based experiment. This was the primary analysis approach for the first few years of the project, and it generated sufficient evidence of anomalous correlations to justify deeper analysis and more general correlation strategies. In this approach, "global events" are identified and a hypothesis that specifies a time period and an analysis recipe is registered. The analytical results are combined into a cumulative, or aggregate, assessment of the hypothesis of correlated departures from expectation.
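The cumulative aggregate can be sketched as a running composite: after each new formal event, the Stouffer Z of all event z-scores so far. This assumes each event has already been reduced to an independent standardized z-score; the sequence below is invented.

```python
import math
from itertools import accumulate

def running_composite(event_zs):
    """Composite (Stouffer) Z after each successive event:
    cumulative sum of z-scores divided by the square root of the count."""
    return [s / math.sqrt(i) for i, s in enumerate(accumulate(event_zs), start=1)]

# Hypothetical sequence of formal event results (standardized z-scores).
event_zs = [0.8, -0.3, 1.1, 0.4, -0.9, 1.6]
print([round(z, 2) for z in running_composite(event_zs)])
```

Under the null the running composite wanders around zero with unit variance at every step; a persistent small effect per event makes it grow roughly in proportion to the square root of the number of events.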


New Analyses: Extensions and Explorations

The careful preparation for rigorous analysis can be envisioned as a conversion of the GCP database into a "data resource" that can be examined with power and flexibility. As we proceed, new materials will be added to this page. The following excursions are examples of what can now be done with some facility. Some provide deeper understanding of previous work; others give new perspectives and insights. We have developed a number of questions that can tell us a great deal about the nature and quality of the evidence. We expect many cases that, in Peter's term, "will require a lot of mulling," but we can learn much from the ability to visualize the data in different ways.

Sliding the Event Time

Here we look at the aggregate event Z-score when the event examination periods are shifted uniformly in time. The question is what happens to the evidence for anomalous deviation associated with the events when the event periods are slid across the database in half-hour steps and the aggregate Z is recalculated at each step.
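A minimal sketch of the sliding procedure, under stated assumptions: the database is reduced to a hypothetical per-second network Z (the Stouffer Z across eggs), each event's statistic is the sum of its squared values (a chi-square with one degree of freedom per second) converted to z by the normal approximation, and events whose shifted window falls off the ends of the data are skipped. The data are simulated null values and the event windows are hypothetical.

```python
import math
import random

STEP = 1800  # half-hour step, in seconds

def event_z(netz, start, end):
    """Chi-square of squared per-second network Z over [start, end), df = number
    of seconds, converted to z by the normal approximation (chisq - df)/sqrt(2 df)."""
    seg = netz[start:end]
    df = len(seg)
    chisq = sum(z * z for z in seg)
    return (chisq - df) / math.sqrt(2 * df)

def aggregate_z(netz, events, shift):
    """Stouffer Z over all events whose shifted window lies inside the data."""
    zs = [event_z(netz, s + shift, e + shift)
          for s, e in events
          if s + shift >= 0 and e + shift <= len(netz)]
    return sum(zs) / math.sqrt(len(zs)) if zs else float("nan")

def slide(netz, events, max_steps):
    """Aggregate Z for shifts of -max_steps..+max_steps half-hour steps, keyed by hours."""
    return {k * STEP / 3600: aggregate_z(netz, events, k * STEP)
            for k in range(-max_steps, max_steps + 1)}

# Null simulation: one day of per-second network Z and two hypothetical one-hour events.
rng = random.Random(0)
netz = [rng.gauss(0, 1) for _ in range(86_400)]
events = [(20_000, 23_600), (50_000, 53_600)]
for hours, z in slide(netz, events, 4).items():
    print(f"{hours:+.1f} h  Z = {z:+.2f}")
```

With real data, a peak in the aggregate Z at zero shift that decays over neighboring steps would be consistent with a deviation tied to the specified event periods; on null data such as this the values simply fluctuate around zero.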

New Year Celebrations

One of the most interesting recurring events that we have examined is the New Year transition. Each year since 1998/1999 we have made the hypothesis that the period around midnight on New Year's Eve will show structure in the network variance (the squared Stouffer Z across eggs). Beginning in 1999/2000 we also examined the device variance (the sum of the squared per-egg Z-scores each second). These analyses accommodate the moving locus of the New Year celebrations by doing a signal average across time zones. The data resource makes it much easier to explore whether the New Year Variance Analysis shows structure.
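The two statistics and the time-zone alignment can be sketched as follows. The per-second definitions follow the text: netvar is the squared Stouffer Z across reporting eggs, and devvar is the sum of the eggs' squared z-scores. The epoch average is an assumption about how a signal average across time zones might be implemented (each window is centered on one zone's local midnight, given as an index into a per-second series); the project's exact window lengths and weighting are not specified here.

```python
import math

def netvar_devvar(zs_one_second):
    """Network and device variance for one second.
    zs_one_second: per-egg trial z-scores, None for eggs not reporting.
    netvar = (Stouffer Z across eggs)^2, chi-square with 1 df under the null;
    devvar = sum of squared z-scores, chi-square with n df under the null."""
    zs = [z for z in zs_one_second if z is not None]
    n = len(zs)
    stouffer = sum(zs) / math.sqrt(n)
    return stouffer ** 2, sum(z * z for z in zs), n

def epoch_average(series, markers, half_width):
    """Average the windows series[m - half_width : m + half_width + 1]
    over all marker indices m (e.g. each time zone's local midnight)."""
    width = 2 * half_width + 1
    return [sum(series[m - half_width + k] for m in markers) / len(markers)
            for k in range(width)]

print(netvar_devvar([1.0, 1.0, None, 1.0, 1.0]))
print(epoch_average([0, 1, 2, 3, 4, 5, 6], [2, 4], 1))
```

These calls print (4.0, 4.0, 4) and [2.0, 3.0, 4.0].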

Long Trends and Correlations

The rigorously normalized and standardized data resource can be used for a wide variety of completely general analyses that are not constrained to the event-based protocols. For example, we can ask whether there is any significant large-scale structure by examining long trends and correlations in the full six-year database.
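One simple long-trend probe is a cumulative deviation trace: the running sum of netvar - 1, which has zero expectation under the null because netvar is a chi-square with one degree of freedom. The sketch assumes a per-second netvar series and independent seconds; the series below is simulated null data, and the 1.96 envelope is pointwise, so a trace that touches it somewhere along a long record is not by itself a 5% result.

```python
import math
import random
from itertools import accumulate

def cumulative_deviation(netvar):
    """Running sum of (netvar - 1); netvar is chi-square with 1 df, mean 1, under the null."""
    return list(accumulate(v - 1.0 for v in netvar))

def envelope(n, z_crit=1.96):
    """Pointwise null envelope: the variance of a chi-square(1) variate is 2, so the
    sum of n independent terms has standard deviation sqrt(2 n)."""
    return z_crit * math.sqrt(2.0 * n)

def trend_z(netvar):
    """Whole-series test: final cumulative deviation divided by its null standard deviation."""
    return cumulative_deviation(netvar)[-1] / math.sqrt(2.0 * len(netvar))

rng = random.Random(3)
netvar = [rng.gauss(0, 1) ** 2 for _ in range(100_000)]   # simulated null data
print(round(trend_z(netvar), 2), round(envelope(len(netvar)), 1))
```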

Analysis of Periodic Variation

Fourier analysis gives us a general answer to the question of whether there is any periodic structure in the data. We wish to know, for example, whether there is any diurnal variation suggesting differences corresponding to time of day, or any longer-term effects associated with the day of the week, and so on.
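A periodogram is one way to carry out the Fourier check. This sketch assumes an hourly series of some deviation statistic (here synthetic, with an artificial 24-hour cycle injected far larger than anything suggested for the real data) and uses NumPy's real FFT. A diurnal effect would show up as excess power at 1/24 cycles per hour, a weekly one at 1/168.

```python
import numpy as np

def periodogram(x, dt_hours=1.0):
    """Return (frequencies in cycles per hour, power) for a real series
    sampled every dt_hours, with the mean removed and the zero frequency dropped."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=dt_hours)
    return freqs[1:], power[1:]

# Synthetic example: 60 days of hourly values with an artificial 24-hour cycle.
rng = np.random.default_rng(0)
t = np.arange(24 * 60)
x = 0.5 * np.sin(2 * np.pi * t / 24) + rng.normal(size=t.size)
freqs, power = periodogram(x)
peak_period = 1.0 / freqs[np.argmax(power)]
print(f"strongest period: {peak_period:.1f} h")
```

For white noise with unit variance the power values scatter around 1, so a real periodic component must stand out against that floor; with many frequencies examined, the largest noise peak should be judged against the maximum of many such values rather than against a single one.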

Assessing the Effect of Blocking

The earliest analysis method was a hand calculation using 15-minute blocking of the data. In this method the composite Z for each egg is computed for each time block in the event. The sum of the resulting Z² values is a chi-square with degrees of freedom equal to the number of blocks times the number of eggs. For most events the early procedure was replaced by a "standard analysis" using the raw data with no blocking. An obvious question, however, is what effect the various blocking levels might have on the outcome. One form of the question is, "What is the optimum blocking level?" Here we begin to look at this question in a rigorous and comprehensive way.
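A minimal sketch of the blocking recipe as described: within each block the composite Z of an egg is the sum of its per-second z-scores divided by the square root of the block length, and the squared composites are summed over blocks and eggs. Dropping an incomplete final block and converting the chi-square with the Wilson-Hilferty approximation are assumptions. A block length of 900 seconds gives the original 15-minute blocking; a block length of 1 uses every second. The data are simulated.

```python
import math
import random

def blocked_chisq(egg_series, block_len):
    """egg_series: per-egg lists of per-second trial z-scores for one event.
    Each complete block of block_len seconds gives one composite Z per egg;
    returns (sum of Z^2, number of egg-blocks), i.e. a chi-square and its df."""
    chisq, df = 0.0, 0
    for zs in egg_series:
        for b in range(len(zs) // block_len):        # trailing partial block dropped
            block = zs[b * block_len:(b + 1) * block_len]
            composite = sum(block) / math.sqrt(block_len)
            chisq += composite * composite
            df += 1
    return chisq, df

def chisq_to_z(chisq, df):
    """Wilson-Hilferty approximation to the z-score of a chi-square statistic."""
    k = 2.0 / (9.0 * df)
    return ((chisq / df) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)

# Null simulation: 10 eggs, one hour of data, compared at several blocking levels.
rng = random.Random(5)
eggs = [[rng.gauss(0, 1) for _ in range(3600)] for _ in range(10)]
for block_len in (1, 60, 900):                       # raw, 1-minute, 15-minute
    chisq, df = blocked_chisq(eggs, block_len)
    print(f"block {block_len:>4} s: chisq = {chisq:9.1f}, df = {df:5d}, "
          f"z = {chisq_to_z(chisq, df):+.2f}")
```

Coarser blocking reduces the degrees of freedom, so the same underlying data can yield noticeably different z-scores at different blocking levels; that sensitivity is exactly what the optimum-blocking question addresses.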


E-mail questions, comments, or suggestions to Roger Nelson, Project Director.
