Inter-Egg Correlation
An alternative to the prediction-based analysis of the EGG data rests on a reasonable assumption: if an effect produces deviations at a specified time, there should be some non-zero level of correlation between the eggs at that time. A general procedure for determining whether global events have any effect on the eggs, including events that are unknown or not newsworthy enough to prompt a GCP prediction, is to examine the intercorrelations across the eggs. Because the eggs are independent sources of data, and are so widely dispersed that no ordinary local perturbation could affect them simultaneously and similarly, the correlation matrix should show nothing but chance fluctuations. If, on the other hand, we see a tendency toward correlation among the mean shifts or the corresponding Chisquare values generated by the individual eggs, this would indicate a global source of anomalous effect, in accordance with the GCP's general hypothesis.
An early attempt was made to explore this possibility using a single day's data. Although this effort appeared promising, the correlational approach needs large amounts of data to compensate for the extremely small size of the hypothesized effect. Doug Mast has developed an analytical approach, and software, capable of assessing the intercorrelations among the eggs across months and years of data. Earlier work using these procedures produced interesting results, but also raised some concern about possible non-independence of the multiple measures. That approach, documented as Version 1, looked at the counts of correlations at various levels of significance and showed promise of identifying significant intercorrelation. It led Doug to further creative effort to establish a fully robust analysis. He says:
Clearly, the data for 2000 do not show the same phenomenon found in the 1999 data. If the effect size [z/sqrt(N)] were constant, all the entries for 2000 would have astronomically small p-values, since N was much larger. So it appears that either any real effect in 1999 did not occur in 2000, or else the intriguing results for 1999 were a fluke abetted by my semi-post-hoc analysis (i.e., I designed the refined method after gaining some experience with the 1999 data). I don't know what to conclude, except for a big "hmmmmmmmmmm," which I guess is often the case in this business.
I responded with a suggestion for a comprehensive look: computing an additional row for the table, using an unweighted Stouffer Z to combine the two years' data into an overall test of the question. This yields a result that remains significant at the two-sigma level:
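The unweighted Stouffer combination mentioned here can be sketched in a few lines of Python. This is a minimal illustration, not the GCP's own code: the function names are hypothetical, the input Z-scores are made-up placeholders rather than the 1999 and 2000 results, and the p-value is one-tailed on the assumption that the hypothesis predicts a positive deviation.

```python
import math


def stouffer_z(z_scores):
    """Unweighted Stouffer Z: sum of k independent Z-scores divided by sqrt(k)."""
    k = len(z_scores)
    if k == 0:
        raise ValueError("need at least one Z-score")
    return sum(z_scores) / math.sqrt(k)


def one_tailed_p(z):
    """Probability that a standard normal variate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical per-year Z-scores, for illustration only (not GCP results).
example_z = [2.0, 1.0]
combined = stouffer_z(example_z)
print(f"combined Z = {combined:.3f}, one-tailed p = {one_tailed_p(combined):.4f}")
```

Because each year enters with equal weight regardless of how many trials it contains, this combination can differ from a single analysis of all trials pooled together, which is the point of Doug's reply.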
From this perspective, which is one way to look at replication of subtle phenomena, the case for intercorrelation among the synchronized eggs is pretty good, with a probability on the order of a few parts in a thousand. Doug responded with a more robust calculation: "To combine the two years for an overall result seems natural. I think it's better, though, to do the full analysis for both years' data (i.e., directly compute N(|r|>r0) and Sum(|r|) for the full two-year data set, and analyze them using the appropriate analytic and control PDFs for the larger N). The results are given below on the lines labeled '1999 & 2000 (TDM)'."
The numbers from the comprehensive calculation in the last line above should be considered more accurate than the Stouffer-Z numbers. They still show an arguably significant effect, though not as significant as the calculations for 1999 alone. Of course, the effect size for the full data set is much smaller than that indicated by the 1999 data alone. One may ask whether the results for 1999 are scientifically valid evidence, given that the tests were designed on the basis of experience from the previous investigations. In this context, our discussion of statistical designs and procedures has continued, and addresses in depth issues such as the definition of a proper "test" of the explicit and implicit hypotheses. This technical discussion will be of interest to anyone concerned with the viability of the evidence for inter-egg correlations.
Graphical Representations
The results for the two years can be visualized graphically. In the following figure, the running (cumulative) Z-score is shown as it develops over time. Plotted on the y-axis are Z-scores for the sums of |r| up to each day on the x-axis. That is, if there were N_day total trials from 1 Jan 1999 through the current day, the value plotted would be
$$Z = \frac{\sum_{i=1}^{N_{day}} |r_i| - \mu}{\sigma}$$
where mu and sigma are the theoretical mean and standard deviation of Sum(|r|) for N_day trials. The expectation value of Z is zero; a constant effect size would yield a curve growing as sqrt(N); and a constant p-value would be a horizontal line (I think). The end point is the overall Z score (Z for N_day -> N_final).
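The running Z can be sketched in Python under two stated assumptions. First, each |r| is taken to come from n samples of independent, normally distributed data, so that under the null hypothesis r**2 follows a Beta(1/2, (n-2)/2) distribution, giving E|r| = Gamma((n-1)/2) / (sqrt(pi) Gamma(n/2)) and E[r**2] = 1/(n-1). Second, the N_day trials are treated as independent, so mu and sigma**2 scale linearly with N_day. The second assumption is a simplification (correlations computed among the same set of eggs are not mutually independent), and Doug's analysis also relied on control PDFs, so this is an illustration rather than his method. Function names, n, and the daily numbers below are hypothetical.

```python
import math


def abs_r_null_moments(n):
    """Null mean and variance of |r| for a Pearson r computed from n samples.

    Assumes bivariate-normal data with zero true correlation, for which
    r**2 ~ Beta(1/2, (n - 2)/2).
    """
    if n < 3:
        raise ValueError("need at least 3 samples per correlation")
    mean = math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2)) / math.sqrt(math.pi)
    variance = 1.0 / (n - 1) - mean ** 2
    return mean, variance


def running_z(daily_counts, daily_sums, n):
    """Z of the cumulative Sum(|r|) after each day.

    daily_counts[d] is the number of correlations (trials) on day d and
    daily_sums[d] is that day's Sum(|r|). Trials are treated as independent.
    """
    mean1, var1 = abs_r_null_moments(n)
    n_day = 0
    total = 0.0
    zs = []
    for count, day_sum in zip(daily_counts, daily_sums):
        n_day += count
        total += day_sum
        mu = n_day * mean1
        sigma = math.sqrt(n_day * var1)
        zs.append((total - mu) / sigma)
    return zs


# Synthetic illustration: a constant excess of 0.001 in every |r| (a constant
# effect size) makes Z grow as sqrt(N_day).
n_samples = 3600          # hypothetical samples per correlation
m, _ = abs_r_null_moments(n_samples)
counts = [1000] * 4       # hypothetical trials per day
sums = [c * (m + 0.001) for c in counts]
zs = running_z(counts, sums, n_samples)
print([round(z / zs[0], 3) for z in zs])   # ratios close to sqrt(1..4)
```

The printed ratios illustrate the sqrt(N) growth expected for a constant effect size, while null data would leave Z wandering near zero.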
These data can be displayed in a number of different ways, of course. For most of the event-based analyses, we use a format that accumulates the deviations from expectation across seconds, minutes, or other blocks of data. The mean of each block is converted to a Z-score, (Mean - Mu)/Sigma, where Mu is the expected mean and Sigma is the standard error of the mean. The resulting Z is squared to give a positive, Chisquare-distributed quantity with one degree of freedom (df). Chisquares are additive, so a composite Chisquare for an event or a series of events is obtained by summing them. Graphically, we can display the progressive accumulation of evidence by plotting the cumulative sum of (Chisquare - df), which takes the form of a random walk about the expectation of zero (plotted here as a horizontal line) if there is no consistent deviation. If there is an effect, however, the random walk will show a trend away from the expectation line. The following figures show the correlation data treated in this way. Doug provided the data for each day of the two years in the form of the total number of correlations, the number of cases where the absolute value of the correlation exceeded the threshold r0 (N(|r|>r0)), and the sum of the absolute values of the correlations for each day (Sum(|r|)).
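The cumulative (Chisquare - df) display can be sketched as follows. This is an illustrative Python version of the procedure described in the preceding paragraph, not the GCP's production code; the function names, the block size, and the synthetic data in the usage lines are hypothetical. Sigma is computed as the standard error of the block mean, i.e. the per-sample standard deviation divided by sqrt(n).

```python
import math
import random


def block_z(block, mu, sd):
    """Z-score of a block mean: (Mean - Mu) / Sigma, with Sigma = sd / sqrt(n)."""
    n = len(block)
    mean = sum(block) / n
    return (mean - mu) / (sd / math.sqrt(n))


def cumulative_chisq_minus_df(blocks, mu, sd):
    """Running sum of (Z**2 - 1) over blocks; each Z**2 is Chisquare with df = 1.

    Under the null hypothesis this is a random walk about zero; a consistent
    deviation shows up as a trend away from the zero line.
    """
    walk = []
    total = 0.0
    for block in blocks:
        z = block_z(block, mu, sd)
        total += z * z - 1.0
        walk.append(total)
    return walk


# Synthetic null data: 100 hypothetical blocks of 60 standard-normal samples.
random.seed(1)
null_blocks = [[random.gauss(0.0, 1.0) for _ in range(60)] for _ in range(100)]
walk = cumulative_chisq_minus_df(null_blocks, mu=0.0, sd=1.0)
print(f"final (Chisquare - df) = {walk[-1]:.2f} after {len(walk)} blocks")
```

Subtracting df = 1 from each squared Z centers every step on zero, so the walk has no built-in drift under the null and any persistent slope is the signal of interest.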