Introduction

The Global Consciousness Project has been running continuously since August 4, 1998. After six years of testing the global consciousness hypothesis, a highly significant result of 4 standard deviations has been obtained. This success has encouraged us to undertake an extended analysis of the project's results and methods. Our goals are to place the main result on a sounder analytical and methodological footing, and to guide the project towards framing hypotheses and experiments that can help to elucidate the meaning behind these measurements.

In order to move forward, it will be useful, even essential, to have help from people with scientific and technical backgrounds. However, the GCP results are complex, and some familiarization is required before one can approach the data intelligently. Thus, a further goal of this analysis is to reduce the effort needed to understand and contribute to the project by presenting a technical introduction to the data and the experimental results.

We would like to arrive at a point where a number of specific hypotheses can be proposed and tested against the database. Because we know nothing(!) about mechanisms or models that might explain the GCP result, it seems premature to mine the database outright for patterns or correlations that might point to anomalous results. Instead, we begin by looking more closely at the result in hand. By studying the event-based experiment we hope to provide guidance for a subsequent, thorough look at the database. That will set the ground for testing the data for correlations with independent parameters based on societal or environmental factors. But first, we need to decide which statistics and tests will be best to work with.

The analysis of the event-based experiment is presented briefly below. It should give you a good sense of what has been accomplished so far, where further work is needed, and where you might (should you be inclined...) contribute. Key elements of the project are summarized first. These points are probably familiar; further details can be found elsewhere on the GCP site.

Figure: The cumulative deviation (cumdev) of event Z-scores for 170 formally accepted events through September 7, 2004. Standard normal trial scores were used for all event calculations. The cumdev terminal value has a Z-score of Z = 4.02 (pval = 0.00003).

The GCP hypothesis

The formal hypothesis of the event-based experiment is very broad. It posits that engaging global events will correlate with variance deviations in the data. The identification of global events and the times at which they occur are specified case by case, as are the recipes for calculating the variance deviations. This latitude of choice is one reason why the experiment is complicated to analyse.

The GCP data

Overview

A few statistics on the database are listed in the table below. (Note: the raw data files simply list times and values of reg output. This doesn't constitute a database in the strict sense of the term. We use "database" in a looser sense, to refer to all the GCP data through Sept. 8, 2004.)

    Total days                          2224.6
    Total trials                        7.64E+9
    Total non-null trials               6.85E+9
    Total nulls                         7.93E+8
    Online reg-days                     84772
    Average regs online/day             38.1
    Current regs online                 60-65
    Total events                        183
    Total accepted events               170
    Z-score of accepted global events   4.02
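To make the headline statistic concrete, the sketch below shows how the cumdev curve in the figure and the composite Z = 4.02 could be computed from per-event Z-scores. This is a minimal sketch in Python, not the project's code: the event_z array is hypothetical stand-in data, and it assumes the terminal value is normalized as a Stouffer Z over the 170 accepted events.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical stand-in for the per-event scores; in the real analysis
    # each of the 170 accepted events is reduced to a single standard
    # normal Z-score according to its pre-specified recipe.
    event_z = np.random.default_rng(1).normal(size=170)

    # Cumulative deviation (cumdev): the running sum of the event Z-scores.
    cumdev = np.cumsum(event_z)

    # Composite Z for the experiment: the cumdev terminal value normalized
    # by sqrt(N), i.e. a Stouffer Z across events.
    composite_z = cumdev[-1] / np.sqrt(len(event_z))

    # One-tailed p-value under the null hypothesis.
    pval = norm.sf(composite_z)
    print(f"Z = {composite_z:.2f}, p = {pval:.5f}")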
Regs

The data are produced by three different makes of electronic regs: Pear, Mindsong and Orion. All data trials are XOR'd sums of 200 bits. The trialsums are collected once a second at each host. The Pear and Mindsong regs produce about 12 trials per second and the Orions about 39, so that roughly 95% of the source data is not collected. Regs are added to the network over time, and the data represent an evolving set of regs and geographical nodes. The network started with 4 regs and currently has about 65 in operation. This contributes to the complexity of some analyses.

Figure: Number of reporting regs in the GCP network by device type.

XOR

Ideally, trials distribute like binomial[200, 0.5] (mean 100, variance 50), but this is not the case for these real-life devices. The XOR compensates for the mean biases of the regs, and the XOR'd data have very good, stable means. Note, however, that XORing against a fixed, balanced bitmask corrects only the mean: if the raw bits were generated with a shifted bit probability p, the XOR'd trialsums would have mean N/2 but variance Np(1-p), slightly below the ideal N/4. A simple bit-probability shift thus survives the XOR as a small variance deficit (see the first sketch at the end of this section). The Pear and Orion regs use a "01" bitmask and the Mindsong uses a 560-bit mask (the Mindsong mask is the string of all possible 8-bit combinations of 4 "0"s and 4 "1"s).

Variance bias

Even after XOR'ing, the trial variances are biased. The biases are small (about 1 part in 10,000) and generally stable on long timescales. They are corrected by standardizing the trialsums to standard normal variables (z-scores). Mindsong regs tend to have positive biases, which gives a net positive variance bias to the data. Since the GCP hypothesis explicitly looks for a positive variance deviation, these are important, albeit small, corrections. The variance biases, in particular the positive ones, tell us that the raw, unXOR'd trials cannot be modeled as a simple binomial with shifted bit probability, since a bit-probability shift can only produce a variance deficit after the XOR.

The sensitivity of analyses to variance bias depends on the statistic calculated. Two typical calculations are the variance of the individual trials and the variance of the trial z-scores summed across regs at 1-second intervals. That is, Var(z) and Var(Z), where z are the reg trial z-scores and Z = Sum(z), the sum running over all regs for each second. We sometimes refer to these as the device and network variances, respectively. (Note: when using standardized trial z-scores, there is little difference between variances calculated with respect to theoretical or sample means; theoretical and sample variances will be distinguished where necessary.)

Out-of-bound trials

The regs occasionally produce improbable trial values, usually associated with intermittent hardware problems. These trials are removed before analysis: all trialsums that deviate by 45 or more from the theoretical mean of 100 are removed and replaced by nulls (see the second sketch at the end of this section, and the data preparation page for more details).

Rotten Eggs

Once out-of-bound trials are removed, the mean and variance of each reg are checked for stability. Sections of reg data that do not pass the stability criteria are masked and excluded from analysis. Data from these "rotten eggs" are usually very obvious. Nevertheless, there are cases where excluding data is a judgment call. The current criteria impose a limit that will, on average, exclude 0.02% of valid data (an hour or two of data per year). See the data preparation page.

Bad network days

On a few days, the network produced faulty or incomplete data. These occurred during the first weeks after the GCP began operation and during a hacker attack in August 2001. The days August 11, 25, 31 and September 6 have less than 86400 seconds of data; these days are retained in the database. For the days August 5-8, inclusive, the data consist mostly of nulls for all regs; these days have been removed from the standardized data.
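First, a small simulation of the XOR mask behavior described above. The bias value p = 0.51 is hypothetical, chosen only to make the effect visible; it is not a model of any actual reg.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate a reg whose raw bits carry a mean bias: 200 bits per trial
    # with bit probability p = 0.51 instead of the ideal 0.50.
    N, p, n_trials = 200, 0.51, 100_000
    bits = rng.random((n_trials, N)) < p

    # The "01" bitmask used by the Pear and Orion regs, tiled to 200 bits.
    mask = np.tile([False, True], N // 2)

    trialsums = (bits ^ mask).sum(axis=1)

    print(trialsums.mean())  # ~100.0: the mean bias is removed
    print(trialsums.var())   # ~N*p*(1-p) = 49.98: a small variance deficit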
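Second, a sketch of the out-of-bound screening together with the standardization step described under Variance bias. This is an illustration, not the production pipeline: the trialsums here are simulated, and the window over which the empirical variance is estimated in the real analysis is specified on the data preparation page.

    import numpy as np

    rng = np.random.default_rng(2)

    # Simulated raw trialsums for one reg, one value per second for a day.
    trialsums = rng.binomial(200, 0.5, size=86_400).astype(float)
    trialsums[1000] = 175.0  # inject a hardware-style glitch

    # Out-of-bound screening: trialsums deviating by 45 or more from the
    # theoretical mean of 100 are replaced by nulls (NaN here).
    trialsums[np.abs(trialsums - 100.0) >= 45.0] = np.nan

    # Standardization: convert trialsums to standard normal z-scores,
    # using the theoretical mean of 100 and the reg's empirical variance,
    # which absorbs the small (~1e-4) variance bias left after the XOR.
    z = (trialsums - 100.0) / np.sqrt(np.nanvar(trialsums))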
Figure: Network variance as the cumdev of Z^2 for raw and standardized data over the whole database.

Figure: Device variance as the cumdev of z^2 for raw and standardized data over the whole database.

Figure: Two views of bad trial data. Device variance at 1-second blocking with and without bad trial removal; device variance at 1-minute blocking with and without bad trial removal.

Nulls

It is fairly common for egg nodes to send null trials. Nulls may persist for long times, as when a host site goes down, or may appear intermittently. Nulls do not cause problems for calculations on the data, but they can add to the inherent variability of some statistics.

Figure: The presence of null trials in the data. Black: number of regs listed at least once in the data; blue: number of regs reporting on a given day; magenta: mean number of regs producing non-null data during the day.
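To show how the device and network variance cumdevs plotted above can be formed while tolerating nulls, here is a minimal sketch. It follows the definitions given under Variance bias (Var(z) and Var(Z) with Z = Sum(z)); subtracting the per-second expectation, which varies with the number of reporting regs, is this sketch's way of handling the evolving network, and the blocking used for the published figures may differ.

    import numpy as np

    rng = np.random.default_rng(3)

    # Simulated standardized trial z-scores: one row per second, one
    # column per reg; nulls are represented as NaN (~5% at random here).
    seconds, n_regs = 86_400, 40
    z = rng.normal(size=(seconds, n_regs))
    z[rng.random(z.shape) < 0.05] = np.nan

    # Number of regs reporting non-null data in each second.
    n_active = np.sum(~np.isnan(z), axis=1)

    # Network variance: per-second sum of trial z-scores, squared. Under
    # the null hypothesis E[Z^2] equals the number of reporting regs.
    Z = np.nansum(z, axis=1)
    netvar_cumdev = np.cumsum(Z**2 - n_active)

    # Device variance: per-second sum of squared trial z-scores; nansum
    # skips nulls, so they contribute nothing to the deviation.
    devvar_cumdev = np.cumsum(np.nansum(z**2 - 1, axis=1))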