An alternative to the prediction-based analysis of the EGG data rests on a reasonable assumption: if an effect produces deviations at a specified time, there should be some non-zero level of correlation between the eggs at that time. We first examine the inter-correlations of the actual meanshifts, and then the inter-correlations of the Chisquare values used in the primary event-based analyses.
A general procedure for determining whether global events have any effect on the eggs, including events that are unknown or insufficiently newsworthy to prompt a GCP prediction, is to examine the intercorrelations across the eggs. Because the eggs are independent sources of data, widely dispersed so that no ordinary local perturbation could affect them in similar ways, the correlation matrix should show nothing but chance fluctuations. If, instead, we see a tendency for correlation among the meanshifts or the corresponding Chisquare values generated by the individual eggs, this would indicate a global source of anomalous effect, in accordance with the GCP's general hypothesis. An early attempt explored this possibility by looking at a single day's data.
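The pairwise procedure described above can be sketched in a few lines: correlate every pair of eggs over a common time window and count how many pairs pass each significance level, then compare those counts with the number expected by chance (pairs × level). This is a minimal sketch under stated assumptions, not the scripts used in the analysis below: the egg data are simulated independent one-minute Z-scores, the helper names are hypothetical, and the p-value uses the Fisher z-transform approximation for a Pearson correlation under independence.

```python
import itertools
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_tailed_p(r, n):
    """Approximate two-tailed p for r under independence (Fisher z-transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def count_significant_pairs(signals, levels):
    """For each significance level, count egg pairs whose correlation p falls below it."""
    n = len(next(iter(signals.values())))
    counts = {lev: 0 for lev in levels}
    for a, b in itertools.combinations(sorted(signals), 2):
        p = two_tailed_p(pearson_r(signals[a], signals[b]), n)
        for lev in levels:
            if p < lev:
                counts[lev] += 1
    return counts

# Hypothetical data: 10 independent "eggs", 1440 one-minute Z-scores each (one day).
rng = random.Random(1999)
eggs = {f"egg{i}": [rng.gauss(0.0, 1.0) for _ in range(1440)] for i in range(10)}
levels = [10.0 ** -k for k in range(1, 4)]
n_pairs = len(eggs) * (len(eggs) - 1) // 2

counts = count_significant_pairs(eggs, levels)
for lev in levels:
    print(f"p < {lev:g}: observed {counts[lev]}, expected {n_pairs * lev:.2f}")
```

Under the null hypothesis the observed counts should scatter around the expected values; a systematic excess across levels is the kind of signature the analysis looks for.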
Although the single-day effort appeared promising, the correlational approach needs large amounts of data to have enough power to detect the extremely small hypothesized effect. Doug Mast created a set of scripts and analysis functions to examine all the data over long periods. A "first draft" assessment of all data from 1999 was completed in March 2000. Here is his description:
Date: Sun, 19 Mar 2000 12:54:14 -0500 (EST) From: email@example.com To: rdnelson
These results indicate that any significant inter-egg correlations (for the one-minute signals I defined) occur at about the frequency expected by chance. However, several features in the data are intriguing. For the synchronous correlations, the number of high correlations is greater than expected at each significance level between 10^-7 and 10^-1. None of these excesses has a striking p-value, although those for 10^-1 and 10^-2 are close to a liberal definition of "low." In addition, the control runs (with signals offset by pseudorandom intervals) seem in several cases to show *fewer* high correlations than expected by chance. When the synchronous and control runs are examined together, the possibility of some real effect seems greater. To gather a little more data, I tried a second control set with a different (but still deterministic) random seed. The results, in the same format as the above table, are below.
Here again, there is a "significant" (p=0.04) absence of high correlations at one level, and no significantly large numbers of high correlations. So this seems to confirm the results of the other control run.
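The control runs pair each egg's signal with another egg's signal offset by a pseudorandom interval, so any genuinely synchronous relationship is destroyed while each signal keeps its own values. A minimal sketch of that idea, assuming a circular (wrap-around) shift and a fixed seed per control run; the function `offset_control` and the `min_lag` parameter are hypothetical, and the original scripts may have handled offsets differently. The two simulated eggs share an exaggerated common component purely so the contrast is visible.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def offset_control(y, seed, min_lag=60):
    """Hypothetical control: circularly shift series y by a seeded pseudorandom lag.

    The shift breaks any synchronous relationship with another egg's series while
    keeping y's own values unchanged; the fixed seed makes the control reproducible.
    """
    rng = random.Random(seed)
    lag = rng.randrange(min_lag, len(y) - min_lag)
    return y[lag:] + y[:lag], lag

# Two hypothetical eggs sharing an exaggerated common component (illustration only).
rng = random.Random(0)
common = [rng.gauss(0.0, 1.0) for _ in range(1440)]
egg_a = [c + rng.gauss(0.0, 1.0) for c in common]
egg_b = [c + rng.gauss(0.0, 1.0) for c in common]

r_sync = pearson_r(egg_a, egg_b)
print(f"synchronous r = {r_sync:.3f}")
for seed in (1, 2):  # two control runs with different deterministic seeds
    shifted, lag = offset_control(egg_b, seed)
    print(f"control seed {seed}: lag {lag} min, r = {pearson_r(egg_a, shifted):.3f}")
```

Because the seed is fixed, rerunning a control reproduces exactly the same offsets, which is what allows a second control set to be compared meaningfully with the first.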
Graphing the Meanshift Correlations
A graph of the effective Z-score for the three datasets as a function of decreasing probability level visualizes the differences. The Z-scores for the synchronized set are shown in red, compared with the Stouffer-averaged Z-scores for the control data in black. I have done a preliminary calculation of the combined "bottom line" that needs to be checked for its appropriateness and logic. For each of the controls, the algorithm is (sum((ysynch-yctrl)/2^.5))/8^.5, where the sum runs over the eight probability levels and y is the effective Z-score at each level. This yields Z = 2.4387 and 1.50 for the comparison of the synchronized data with Control 1 and Control 2, respectively. The Stouffer combination of these gives Z = (2.4387 + 1.50)/2^.5 = 2.785. This does not account for the non-independence of the counts at the different probability levels: something like 10% of the .1 count is the necessarily included .01 count, 1% is the .001 count, and so on. I don't know how to adjust for this properly, but my guess is that the end effect of the non-independence would be a relatively small reduction in the apparent combined Z. Even if the reduction were as large as 30%, the difference would remain significant at the 0.05 level (1-tailed; 0.7 x 2.785 = 1.95). A rough compensation for the overlap can be made by reducing the Z-scores by the amount that will be counted at the next level: the 0.10 count includes 10% from the 0.01 level, and so on. Summing these overlaps (0.1 + 0.01 + 0.001 + ...) gives approximately 0.1111111, so Znew = (1 - 0.1111111) x Zorig, or about 0.889 x Zorig. Using this correction, the composite Z is reduced to 2.4757, which has a 1-tailed probability of 0.0066.
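The combination arithmetic can be reproduced from the figures reported in the text. This sketch assumes the effective Z-scores at each level are unit-normal under the null; `stouffer`, `synch_vs_control_z`, and `one_tailed_p` are hypothetical helper names. Like the calculation above, it treats the two control comparisons as independent, even though both use the same synchronized counts.

```python
import math

def stouffer(zs):
    """Stouffer's method: sum of independent unit-normal Z-scores over sqrt(count)."""
    return sum(zs) / math.sqrt(len(zs))

def synch_vs_control_z(y_synch, y_ctrl):
    """(sum((ysynch - yctrl) / 2^.5)) / n^.5 over the n probability levels (n = 8 in the text).

    Dividing each difference by sqrt(2) restores unit variance; the rescaled
    differences are then Stouffer-combined across levels.
    """
    return stouffer([(s - c) / math.sqrt(2) for s, c in zip(y_synch, y_ctrl)])

def one_tailed_p(z):
    """Upper-tail normal probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Per-control composite Z-scores reported in the text (Control 1, Control 2).
z_combined = stouffer([2.4387, 1.50])

# Rough overlap correction: 0.1 + 0.01 + ... + 0.0000001 = 0.1111111.
overlap = sum(10.0 ** -k for k in range(1, 8))
z_corrected = (1 - overlap) * z_combined

print(f"combined Z = {z_combined:.3f}")
print(f"corrected Z = {z_corrected:.4f}, 1-tailed p = {one_tailed_p(z_corrected):.4f}")
```

The reported per-level effective Z-scores are not reproduced here, so `synch_vs_control_z` is shown only as an implementation of the stated formula.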
Assuming the logic for this exploration is correct, and the non-independence penalty is modest, as suggested in the rough calculation of an overlap compensation, the result indicates a clear difference between the Synchronized data and the randomly paired Control data, which is readily seen in the following figure.
We should have some concern, however, that the difference calculation above is driven as much by the low counts for the controls as by the high counts for the synchronized eggs. If we look at the effective Z-scores for the synchronized eggs alone, compared with an expectation of 0, the result is much weaker. For all eight levels, the composite Z is 1.301 (1.4641 without the 11% correction). If one ignores the last two levels, which have too few data to give a reasonable estimate, the Z is 1.521 (1.711 without the 11% correction). Given the nature of the data, these more modest Z-scores are probably a better indication of the possible inter-egg correlation effect. [RDN]
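The 11% correction can be checked against the synchronized-only figures: multiplying each uncorrected composite Z by (1 - 0.1111111) gives the corrected values quoted above. A short sketch, assuming unit-normal composite Z-scores; the dictionary labels are descriptive only. Neither corrected value reaches the 1-tailed 0.05 threshold (Z of about 1.645).

```python
import math

OVERLAP = 0.1111111  # rough overlap fraction used in the text (0.1 + 0.01 + ...)

def one_tailed_p(z):
    """Upper-tail normal probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Uncorrected composite Z-scores for the synchronized eggs alone, as reported in the text.
reported = {"all eight levels": 1.4641, "excluding the two most extreme levels": 1.711}

corrected = {label: (1 - OVERLAP) * z for label, z in reported.items()}
for label, z in corrected.items():
    print(f"{label}: corrected Z = {z:.3f}, 1-tailed p = {one_tailed_p(z):.3f}")
```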
It seems like there is some effect here, but any such effect is very small, barely visible even after 75 million trials. Further replication and checking are certainly necessary; an obvious second test will be to do the same analysis for this year's data. If there is a real effect, its cause is a tough question, since artifacts in the RNGs may be hard to rule out. For instance, could the RNG behavior be affected by electrical signals associated with the host computer downloading data from the egg once a second?
[Regarding equipment artifacts, Roger Nelson replied: Re artifacts, I don't think there is anything likely from host computer electricals. The design precludes this, and we have been doing both calibrations and acute "influence" testing of EM fields, temperature, vibration, sound, etc., on these devices for years without finding any artifactual effects. Of course, we can't rule out anything we haven't thought of testing. There are a couple of arguments against artifact. First, the design includes a logical XOR that guarantees a statistical mean of p = .5. Second, the data themselves from your analysis show both high and low counts. I don't have a final answer, but it does not look like artifact can be responsible for the correlation counts.]
Finally, here is another view of the data, which is completely post hoc and poorly justified, but possibly still interesting. We could treat the number of significant correlations at each level as a binary random variable, with a value of 1 for a high number of correlations (score > 100 in the above tables) and 0 for a low number (score < 100). We can further make the dubious assumption that each significance level can be treated as an independent random variable. By this criterion, the synchronous runs give 7 1's out of 8 trials (p=0.03125) and the control runs give 10 0's out of 16 trials (p=0.22725).
The apparent significance for the synchronous runs could very well be a fluke; it will be interesting to see if something similar happens for this year's data. I'd love to hear your thoughts about this little study, and you can also feel free to share this message with any colleagues who might be interested. I'll be more than happy to answer any questions about what I did. If you're interested in checking or replicating my results, I'd be happy to share my code with you (and others) as well.
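The binary "high/low" view treats each significance level as a fair coin flip under the null hypothesis, so the counts follow a binomial distribution with p = 0.5. The sketch below computes the relevant probabilities; the helper names are hypothetical. For reference, 0.22725 matches the upper tail P(X >= 10) for 16 trials, while 0.03125 equals the probability of exactly 7 out of 8; the corresponding upper tail P(X >= 7) is 9/256, about 0.035.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

print(binom_pmf(7, 8))           # exactly 7 high levels out of 8
print(binom_upper_tail(7, 8))    # 7 or more out of 8
print(binom_upper_tail(10, 16))  # 10 or more low levels out of 16
```

As the text notes, the independence assumption across levels is dubious, because the counts at adjacent levels overlap.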
Correlations of the Chisquare measure
The predictions for the GCP are for non-directional deviations of the means from expectation, and most of the individual event analyses use a Chisquare test, where the Chisquare is composed of the squared, normalized meanshifts. The following tables show the inter-correlations of the Chisquare measures in a format similar to that used above. Here is Doug Mast's description:
Date: Fri, 14 Apr 2000 18:30:44 -0400 (EDT) From: firstname.lastname@example.org To: rdnelson
There doesn't seem to be a clear effect jumping out at me here. At the "10^-6" through "10^-8" levels, it's starting to look like a trend, with the synchronous data having more "significant" correlations with r > 0 and fewer with r < 0. But obviously, as I mentioned before, the "significance levels" no longer correspond to the true PDFs of the correlation coefficients. If we can determine the PDF of the "chi-square" correlation coefficient, then we could determine whether there really is an apparent trend or not.
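One way to approximate the PDF of the "chi-square" correlation coefficient is Monte Carlo simulation: square independent normalized meanshifts to form two chi-square series, correlate them, and repeat many times to build an empirical null distribution. A minimal sketch under stated assumptions: each one-second trial is taken to be the sum of 200 random bits (mean 100, variance 50), the null series are simulated directly as squared standard normals (a normal approximation to the bit sums), and all names and parameters are hypothetical. A few hundred simulations only describe the bulk of the distribution; tail levels such as 10^-6 to 10^-8 would need vastly more simulations or an analytic result.

```python
import math
import random

TRIAL_MEAN = 100.0  # assumption: each trial sums 200 random bits, so mean 200 * 0.5 = 100...
TRIAL_VAR = 50.0    # ...and variance 200 * 0.25 = 50

def block_chisquare(trials):
    """Squared normalized meanshift of one block of trials (1-df chi-square under the null)."""
    n = len(trials)
    z = (sum(trials) - n * TRIAL_MEAN) / math.sqrt(n * TRIAL_VAR)
    return z * z

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def null_chisq_correlations(n_blocks, n_sims, seed=0):
    """Monte Carlo null distribution of r between two independent chi-square(1) series."""
    rng = random.Random(seed)
    rs = []
    for _ in range(n_sims):
        # Squared standard normals stand in for per-block chi-square values.
        x = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_blocks)]
        y = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_blocks)]
        rs.append(pearson_r(x, y))
    return rs

def empirical_two_tailed_p(r, null_rs):
    """Share of simulated null correlations at least as extreme as r (add-one smoothed)."""
    extreme = sum(1 for v in null_rs if abs(v) >= abs(r))
    return (extreme + 1) / (len(null_rs) + 1)

# One simulated minute (60 trials of 200 bits each) from a hypothetical egg.
rng = random.Random(42)
minute = [bin(rng.getrandbits(200)).count("1") for _ in range(60)]
print(block_chisquare(minute))

# Null distribution for hypothetical series of 240 one-minute blocks.
null_rs = null_chisq_correlations(n_blocks=240, n_sims=300)
print(empirical_two_tailed_p(0.15, null_rs))
```

Comparing observed chi-square correlations against such an empirical distribution, rather than against normal-theory thresholds, is what would let the apparent trend at the extreme levels be judged properly.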
Graphing the Chisquare Correlations
The following graph shows the difference between the counts of significant correlations for synchronized versus control pairs of eggs. As noted, there are more positive and fewer negative correlations in the synchronized data than in the controls. I am uncertain whether the estimate of the standard deviation (used to scale the graph) is statistically correct, and hence do not feel that a parametric test of the differences is appropriate.
A non-parametric Wilcoxon signed-rank test gives a two-tailed probability of 0.078 for the excess of positive correlations, 0.039 for the deficit of negative correlations, and 0.019 for the difference between the positive and negative correlation counts, providing further evidence that there may be a generalized effect of correlation among the synchronized eggs.
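A Wilcoxon signed-rank test uses only the signs and ranked magnitudes of paired differences, which is why it avoids the uncertain standard-deviation estimate. The sketch below computes an exact two-tailed p by enumerating every sign assignment, assuming the inputs are paired differences such as synchronized minus control counts at each level. It is a generic implementation with hypothetical function names, not the script behind the probabilities quoted above; library routines such as SciPy's `scipy.stats.wilcoxon` provide the same test. With only a handful of levels the exact p-values are coarse: eight all-positive differences give the minimum possible value, 2/256 (about 0.0078).

```python
import itertools

def signed_rank_statistic(diffs):
    """Return (W+, ranks): zeros dropped, tied magnitudes receive average ranks."""
    nz = [d for d in diffs if d != 0]
    order = sorted(range(len(nz)), key=lambda i: abs(nz[i]))
    ranks = [0.0] * len(nz)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(nz[order[j + 1]]) == abs(nz[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nz) if d > 0)
    return w_plus, ranks

def wilcoxon_exact_two_tailed(diffs):
    """Exact two-tailed p by enumerating all 2^n sign assignments (practical for n <= ~20)."""
    w_plus, ranks = signed_rank_statistic(diffs)
    center = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    for signs in itertools.product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)

print(wilcoxon_exact_two_tailed([1, 2, 3, 4, 5, 6, 7, 8]))
```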
This work is new, and issues such as the effects of possible non-independence and the computation of appropriate error estimates need deeper consideration. Several people are involved, and part of the exchange is by email, allowing us to provide access to the discussion.