The inter-egg correlation work is difficult and requires discussion of
methodology and interpretation. This page provides access to some of the
content of our exchanges, as background for the primary results.
Date: Sat, 22 Apr 2000 08:10:50 -0400 (EDT)
From: rdnelson
To: Matthew.J.Salganik@ccmail.census.gov
Cc: Doug Mast, York Dobyns, Roger D. Nelson
Subject: Re: InterEgg correlation
On Fri, 21 Apr 2000 Matthew.J.Salganik@ccmail.census.gov wrote:
> Here is Doug's reply to my email. I think he left you off the cc list
> accidentally. I am going to mull this over some this weekend. Take care.
>
Hi Matt,
Thanks for the copy. I will copy this note to Doug too, and
to York, with whom I talked about it a bit. Doug's response
is somewhat similar to the observation I made about the comparison
of the synchronized and control counts: both should be
affected in the same way by any general non-independence problem,
and the empirical results actually show opposite deviations
from expected counts.
One thing that would be helpful is to know exactly how the
array of correlations was constructed. Doug says
"The method is then to compute Pearson correlation coefficients
between the signals from all possible egg pairs, over a
large number of one-minute intervals." That seems clear,
and I read it to mean that if we have eggs A B C, there
will be correlations AB AC and BC. The question is
whether these are independent, and the concern is, if AB
and AC are strongly correlated, then BC must be too. On the
null hypothesis, there are no correlations and AB AC and BC
are indeed independent, but if there is an influence that affects
the eggs in a common way, then we should expect the worrisome
non-independence, but for precisely the reason that there is
an anomalous external agency. Then the question becomes, can we
legitimately count all three excess correlations?
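
[Editorial illustration, not part of the original exchange. A minimal
sketch of the "all possible egg pairs" construction as read above, for
three eggs A, B and C in a single interval. The data are simulated, and
the choice of 60 samples per one-minute interval and the function names
are assumptions made only for this example.]

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(streams):
    """Map each unordered egg pair to its correlation for one interval."""
    return {
        (a, b): pearson(streams[a], streams[b])
        for a, b in itertools.combinations(sorted(streams), 2)
    }


rng = random.Random(0)
# Hypothetical data: 3 eggs, 60 samples each (assumed one per second).
minute = {egg: [rng.gauss(0, 1) for _ in range(60)] for egg in "ABC"}
corrs = pairwise_correlations(minute)
print(sorted(corrs))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

[With N eggs this yields N(N-1)/2 correlations per interval, which is
why the question of their mutual independence arises.]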
(York suggests that one approach would be to count all
correlations to a single egg, i.e., count AB and AC. One
could then repeat this with B as the pivot egg and again
with C, etc. Each such set would be a separate independent
estimate of the inter-egg correlation, whose average would
be a well-qualified estimate using all available data.)
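
[Editorial illustration, not part of the original exchange. One possible
reading of York's pivot-egg suggestion, sketched under assumptions: the
function name and the numbers are hypothetical, and the values could
equally be 0/1 flags for "significant" rather than coefficients.]

```python
def pivot_estimates(pair_values):
    """For each pivot egg, average the values of all pairs that include it.

    pair_values maps unordered pairs such as ("A", "B") to a number, for
    example a correlation coefficient or a 0/1 significance flag.
    """
    eggs = sorted({egg for pair in pair_values for egg in pair})
    estimates = {}
    for pivot in eggs:
        vals = [v for pair, v in pair_values.items() if pivot in pair]
        estimates[pivot] = sum(vals) / len(vals)
    return estimates


# Hypothetical correlations for one interval (not real egg data).
pair_values = {("A", "B"): 0.25, ("A", "C"): 0.5, ("B", "C"): -0.25}
per_pivot = pivot_estimates(pair_values)
overall = sum(per_pivot.values()) / len(per_pivot)
print(per_pivot)  # {'A': 0.375, 'B': 0.0, 'C': 0.125}
print(overall)
```

[In this sketch each pair appears in two pivot sets, so when every egg
has the same number of partners the average over pivots equals the plain
mean over all pairs.]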
Assuming the control pairs are constructed in the same way as the
synchronized pairs (the same offset is used for each set of
correlations, that is, a "pseudo-synchronized" set is created from
pairs with a common offset) they constitute a proper comparison set
in which the effect of non-independence is exactly the same, so
we can be certain that differences between synchronized and control
counts are not affected by the possible non-independence.
Furthermore, the empirical counts show an excess for synchronized
and a deficit for the control pairs, which strongly indicates that
the method of counting all significant correlations is not
contaminated by any effect of non-independence.
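
[Editorial illustration, not part of the original exchange. A sketch of
how a control pair with a fixed time offset might be formed. The exact
offset procedure is not given in this message, so the 60-sample shift,
the stream length and the shared "common" component below are all
assumptions, and the data are simulated.]

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def offset_pair(x, y, offset):
    """Pair x[t] with y[t + offset], dropping samples without a partner."""
    if not 0 <= offset < len(y):
        raise ValueError("offset must lie within the stream length")
    return x[: len(x) - offset], y[offset:]


rng = random.Random(1)
n = 600
# Hypothetical shared influence acting on both eggs at the same moment.
common = [rng.gauss(0, 1) for _ in range(n)]
egg_a = [c + rng.gauss(0, 1) for c in common]
egg_b = [c + rng.gauss(0, 1) for c in common]

sync_r = pearson(egg_a, egg_b)                       # same timestamps
control_r = pearson(*offset_pair(egg_a, egg_b, 60))  # shifted by 60 samples
print(round(sync_r, 3), round(control_r, 3))
```

[In this toy setup the simultaneous influence shows up only in the
synchronized pairing, while any structure common to the pairing scheme
itself would be present in both.]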
Since it does seem that there is, however, some non-independence,
as described, it is worth considering why it doesn't seem to
create a problem in the counting method. My guess is that
the countable correlations are distributed more homogeneously
than would be suggested by our image of how an "effect"
should work. We envision a minute in which the effect
impinges on all the eggs in the network, thus resulting in
correlations among a given set of eggs, but this may be an
incorrect picture. Instead, we may be seeing a single large
correlation here and another there, and finding an excess
of these otherwise unrelated correlations in different sets of
synchronized eggs. Any non-independence, in this view, would
constitute a second-order effect that is too small to observe,
and too small to affect Doug's measurable.
It is interesting to think about this, but a formal
assessment is, alas, beyond my capacities. I am copying
this to York for his information, and hoping he may comment.
Roger

[Doug's response to Matt's inquiry]
> Hi Matt,
>
> Thanks for your comments.
>
> > One question I had was about the independence of the correlation values.
> > For example if we have three data streams A, B, and C and we calculate the
> > three p-values of the correlation of the possible combinations of the three
> > (i.e. p-value for A cor B, A cor C, and B cor C). I am wondering if these
> > three p-values are independent. I mean if A correlates with B and A
> > correlates with C then B probably correlates with C.
>
> I don't really know. Your statement makes some sense. But then, the
> intercorrelations of the egg data (the raw, not the chi-squared data)
> closely match the theory. For example, one tenth of the signal
> pairs should have correlation coefficients above the threshold for the
> 0.10 significance level, and the actual fraction of correlation coefficients
> above that threshold (as seen in the tables for the synchronous and both
> control runs) is 0.10, within +/- 0.01% in each case.
>
> So, although I can't rigorously prove that the independence assumption
> is justified, the empirical data suggest that the inter-egg correlations
> are close to independent (or at least that any non-independent effects
> average out in the long run).
>
> Cheers,
>
> Doug.
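
[Editorial illustration, not part of the original exchange. A sketch of
the kind of check Doug describes: under the null hypothesis, about one
tenth of pairwise correlations should exceed the threshold for the 0.10
level. The one-tailed test, the Fisher z approximation for the
threshold, the 60-sample interval and the simulated streams are all
assumptions of this example, not details taken from his analysis.]

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, alpha):
    """Approximate one-tailed critical r using Fisher's z transform."""
    z = NormalDist().inv_cdf(1 - alpha)
    return math.tanh(z / math.sqrt(n - 3))


rng = random.Random(2)
n_samples = 60   # assumed samples per one-minute interval
n_pairs = 2000   # number of simulated, truly independent pairs
threshold = critical_r(n_samples, 0.10)

exceed = 0
for _ in range(n_pairs):
    x = [rng.gauss(0, 1) for _ in range(n_samples)]
    y = [rng.gauss(0, 1) for _ in range(n_samples)]
    if pearson(x, y) > threshold:
        exceed += 1

fraction = exceed / n_pairs
print(round(threshold, 3), fraction)
```

[For independent simulated streams the fraction should land near 0.10.
This checks the marginal rate only; as the thread itself notes, matching
that rate does not by itself prove the pairwise values are independent
of one another.]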
