The Spottiswoode and May Criticism

James Spottiswoode and Ed May wrote a paper shortly after 9/11 criticising the GCP analysis. This is one of the most quoted criticisms of the Global Consciousness Project, but there are some problems with it that should be noted. While they are smart and generally conscientious analysts, in this case they made some serious mistakes, especially of interpretation. Over the years we have occasionally talked about these issues, but without resolving our disagreement. Here I present some material from a discussion forum where, in October 2007, James once again made claims based on that earlier, flawed analysis. I also include some of the preceding comments in the discussion, in the usual email format with the most recent posts first.

The post that started the thread was about otherwise unrelated claims various people have made that the stock market "predicted" the 9/11 disaster. But it also included a shot at the GCP which engendered substantial discussion:

Date: Mon, 08 Oct 2007 09:47:45 -0700
From: James Spottiswoode
Subject: Re: psi and 9/11?
I have some interest in checking this since, allegedly, the purchasing of puts on American Airlines was unprecedented in the days preceding 9/11. Do you have put volume for AA through that period and for a year or 2 before for stats? If so, I would happily run some analyses.
I personally thought that the original analysis with a peak on 9/2 was completely unconvincing, but maybe if the data is weekly the 9/2 point represents the week leading to 9/11, rather than the preceding week. What exactly does the data point date mean?
In terms of psi I find fudging the time to be completely invalid. That is one reason I thought the GCP 9/11 result was so unconvincing. See: http://www.jsasoc.com/docs/Sep1101.pdf
On the other hand, for someone "in the know," buying puts on AA over the days leading up to 9/11 is exactly what you would expect to happen.
James

We begin with Roger Nelson's response on Oct 11 2007. If you are interested in reading the intervening discussion, the major posts are included below. It may be best to read the posts in order, which means bottom to top (dates are provided for clarity).

Roger Nelson 11 Oct 2007

James wrote, a few days ago: "In terms of psi I find fudging the time to be completely invalid. That is one reason I thought the GCP 9/11 result was so unconvincing. See: http://www.jsasoc.com/docs/Sep1101.pdf"

James labels this in a later post a "peremptory rejection" and hopes it will lead to some discussion. It has done the latter, and I like that. But this "peremptory rejection" is at least disingenuous and certainly misleading. Since James has been told many times how to distinguish the formal trials (and the formal specification for analysis of data on 9/11) from informal ones, it seems perfectly reasonable to call this "fudging the time" remark something like the damned lie linked to certain descriptions of statistics.

I understand perfectly well the demand for a hypothesis-based experiment any physicist could in principle repeat, but it is ridiculous to say the GCP does not present information that would in principle allow that. We are quite clear on what is done in formal tests of a local hypothesis for each event we select (see Dick Shoup's comment), and indeed it is possible for anyone who wants to do so to access the data to test his/her own local hypothesis. What apparently galls James and possibly others is the selection part -- how should we go about selecting an event? That is a tougher problem than physicists typically confront, to be sure. Just as James decided on a hunch to look at the times before earthquakes, we must decide on a hunch what kind of event is worth looking at in tests of possible correlated deviation in GCP data. Long since, at least for those of us much involved, the hunches have been educated. And we know (as others have commented) a great deal more by now about how the number of people and the distances between REGs affect the issues. And (though some of us, including, I think, Peter have not gotten there yet) we know about other factors on a different plane of measurement difficulty, such as how much compassion is evoked or involved, and what the valence of the emotion is.

These are hard matters to deal with, but psychologists and other social scientists have been working with some success for the last few decades to do some encoding and measurement in these dimensions. This is an area of great interest for me, and while it is not something most physicists are trained to think about, my guess is that little real progress in the psi business can be made if we allow ourselves to believe that, for example, a "psi field" (if there were to be such a thing) will be defined by a 1/r^2 rule. That kind of assumption is risky if not plain foolish. (Again someone else has so commented -- see York Dobyns' remarks.)

I am traveling, but have had a chance to read much of what has been said in this thread. No chance to respond in a deeply thoughtful way, but the "Treppenwoerter" I wanted to speak were ruining my sleep, so I decided to get a few lines into the discussion. Here are a couple more points:

I am not fishing. [This was one of James' critical claims about what the GCP does, and it appeared to be echoed by Peter Bancel, who does sophisticated analysis of GCP data. But see his comments.]

Peter may be [fishing], but the well defined "formal series" of tests constitute my effort to repeatedly test the hypothesis that there is correlation between deviations from random behavior in the data and selected global events. I really don't mind someone preferring to call this a conjecture, but I do mind the effort being rejected as fudging or data mining or fishing. I also am engaged in explorations, including some that are specifically intended to educate my "hunches" and ultimately to lead to better understanding of the psychological and social factors alluded to above. Something I have not discussed except when asked, is that my desire to learn is such that some of the events I choose to specify for the formal series are ones my "hunch" facility says are likely to be losers -- that is weak, null, or backwards, and hence likely to subtract from the accumulation of positive evidence for the general hypothesis/conjecture of a connection with consciousness. I have enough confidence by now in the data and in the evidence base that I feel it is OK to use the formal structure (which allows unambiguous statistical evaluations) to learn something about valence, and compassion, etc., and, needless to say, N of engaged people.

Another point: We know that the average effect size in the formal tests is small, about Z = 0.33, and as Peter is careful to point out, this means that no individual event or local hypothesis test is adequate to tell us about the possible correlations. But to jump from that to the notion that it is all about averages is wasteful. September 11 2001 is not your average day -- in the world or in the GCP data. It is hard for me to credit the clarity of anyone who has actually looked at the GCP data for 9/11 and a sampling of other days who does not see structure in the data on that particular day. Just a glance at the autocorrelation of a simple device variance measure on 9/11 compared with 60 surrounding days (or 400 days) reveals a spectacular difference. It is even more difficult to understand how the fact that this comparison and several others that also show 9/11 to be unique in the database are "post hoc" rather than in service of an a priori hypothesis troubles any sophisticated observer. Deeper examination of data that a hunch suggests may be interesting must be done, and it is easy to discover that it usually is done in good science. Look at James and Ed's conclusion that the 5 sigma result in their big pre-stimulus response experiment is an experimenter effect (or DAT). If I heard it right, that is based entirely and exactly on post hoc analysis.
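
The arithmetic behind the small average effect size can be sketched briefly. The following is a minimal illustration, not the GCP's own analysis code: it assumes that the per-event Z-scores are independent and are combined with Stouffer's method (the unweighted sum divided by the square root of the number of events), and the function names are hypothetical. Under those assumptions a mean per-event Z of 0.33 implies an expected composite of about 0.33 * sqrt(N), which is why a single event says little while a long series can accumulate evidence.

```python
import math


def stouffer_z(z_scores):
    """Combine independent per-event Z-scores (Stouffer's unweighted method)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def expected_composite_z(mean_z, n_events):
    """Expected composite Z if every event has the same mean effect size."""
    return mean_z * math.sqrt(n_events)


def events_needed(mean_z, target_z):
    """Smallest number of events whose expected composite reaches target_z."""
    return math.ceil((target_z / mean_z) ** 2)


if __name__ == "__main__":
    # 0.33 is the average effect size quoted in the post above.
    for target in (3, 5):
        print(target, "sigma needs about", events_needed(0.33, target), "events")
```

With a mean Z of 0.33, this gives 83 events for an expected composite of 3 sigma and 230 events for 5 sigma.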

I am pleased that James provoked some discussion. Not sure there is any progress, but maybe these are steps in that direction.

Roger


On 10 Oct 2007 James Spottiswoode wrote:

> Peter,
> 
> Thank _you_ for your interesting reply which resolves all my doubts.  As 
> you point out the problems all go away if (2) is not (yet) part of the plan.
> 
> It is also refreshing to hear you point out that if the "effect" that 
> the GCP is sensing has no distance fall off one might as well put the 
> RNG(s) on your own desk.  This was also something that was
> contentious at the beginning, as I recall, because some argued that the 
> effects must be universal and distance independent (and "global") but, 
> as we agree, if that were the case one could put them anywhere including 
> all conveniently in the same place.  Also the whole concept of "field 
> REG" is moot without there being some 1/r^x type dependency.
> 
> In terms of a psi response to large scale effects I have found something 
> that might be of interest here.  As you may know I have published a few 
> papers on geophysical correlates to psi performance.  In looking for 
> some physical environmental correlate to psi I assumed one would have 
> the best chance of detection if one correlated against the largest 
> possible amount of data with the highest possible effect size.  To this 
> end I have assembled over the years a DB of all the free response psi 
> experimentation (mostly Ganzfeld and remote viewing) that I could find 
> trial level data for.  N ~ 3,000 and includes the PEAR RV,  Ganzfeld 
> from Honorton, IfP, Dick Bierman and many others, remote viewing from 
> SRI, SAIC etc.  I have found that there is a tiny, but statistically 
> significant (Z ~ 3), negative correlation between trial ES and global 
> seismicity over the preceding few days.  The correlation doesn't appear 
> to be due either to local earthquakes (local to the trial that is) or to 
> be limited to earthquakes sensed by humans.  In other words if you 
> remove earthquakes sensed by people leaving only those small enough or 
> remote enough not to be sensed, the correlation remains.  
> The correlation is stronger for shallow earthquakes.  I know you GCP 
> folks have looked at individual large quakes that affected people but 
> have you looked at global seismicity?  If you are interested I can point 
> you at the database needed and give calculation details.
> 
> With such a low Z one might reasonably think that this  correlation is 
> simply my own example of (over) fishing.  After all it would not have 
> been hard to spend  some time correlating my data set against rainfall, 
> wind speeds, stock market values and yada yada yada.  In fact I did not 
> do this; I looked at seismicity based on a hunch.
> 
> I have tested a number of ideas about what mechanisms might be 
> responsible for this but have come up with nothing convincing 
> yet.  Anyone have any ideas?
> 
> James
> 
> 
> 
> On Oct 10, 2007, at 1:17 AM, Peter Bancel wrote:
> 
>> To James:
>>
>> Thank-you James for your thoughtful comments.
>> It is interesting for me to hear some of the history.
>>
>> I agree completely with what you say.
>> Your example of Faraday's work is helpful because it points out where 
>> the misunderstanding lies, I think.
>>
>> The crux of it is that the GCP is entirely occupied with fishing, good 
>> and proper fishing, as you describe for Faraday's searching for dB/dt.
>> We  - or at least I - are not ready for phase 2).
>> One can interpret the formal experiment as strong evidence for an 
>> effect, or not, depending on one's inclination.
>> However, there's no doubt that a clear and focused hypothesis test is 
>> lacking.
>> (This is why I have not published anything all this time - my 
>> inclination, in fact, is to move beyond the 0-5 sigma regime, but that 
>> may be setting the bar too high.
>> I am writing a paper now only because we feel pressure to do so; 
>> personally I would prefer to wait.)
>>
>> Your criticisms are from the view of (2) - testing a hypothesis. We 
>> are simply not there yet. Recently, I have begun calling the basic 
>> conception of the project a /conjecture/ instead of a hypothesis, 
>> hoping that this term might avoid some of the confusion.
>>
>> I have come to this point of view slowly, after much reflection and 
>> digging in the data. 
>>
>> I'd point out two things: 1. we have made some real progress. 2. the
>> analysis is more systematic and involved than just checking 10^8 
>> things in an hour on a PC. There would not be any point if it boiled 
>> down to that.
>>
>> My current feeling on the project is that real progress will come by 
>> demonstrating and understanding a distance dependence of the effect 
>> and redeploying the network accordingly.
>>
>> For those interested, I summarize below how I've gone about the fishing.
>>
>> To Suitbert:
>>
>> Principal component analysis is a good way to go since much of the 
>> "signal" does seem to be in second order statistics. 
>> I am currently doing PCA on the data associated with events of the 
>> "formal" experiment. Not surprisingly, the first components do *not* 
>> contain a significant variance excess so it is more complicated than 
>> that. Your suggestion basically assumes that an individual event or 
>> period will stand out statistically. Again, we are not there yet. 
>> However, PCA (and a study of mutual information, which is next up) may 
>> help us understand how the network reacts to our conjectured global 
>> consciousness, thereby leading to refined test statistics. PCA is 
>> closely connected to distance dependent structure in the data - a key 
>> aspect which needs to be understood.
>>
>> -Peter
>>
>> Fishing the data:
>> I begin by examining  the formal event experiment. I assume that the 
>> positive result of that experiment implies some systematic structure 
>> in those data. That is, I assume it is not due to mere chance or 
>> selection. If there is no structure, it is hard to argue that the 
>> significant result is *not* chance or methodological error. So, I 
>> guess that the formal result is a less than optimal "projection" of an 
>> underlying anomalous non-Gaussian structure. This means I assume there
>> is something to look for. 
>>
>> I guess that any structure will involve dependence on time, distance 
>> and number (of RNGs). This is ad hoc, but it fits with the GCP 
>> conjecture. For example, if there were no distance and number 
>> dependence, the GCP "network" might as well be a single RNG sitting in 
>> Roger's basement.
>>
>> *Very* briefly, I then identify two statistics which might contain 
>> structure: the covariance of the network mean and the covariance 
>> of network variance. 
>> These are independent and they both have 3+ sigma deviations over the 
>> formal events. (The first of these is what drives the result of the 
>> formal event experiment)
>>
>> I look at these for structure and find that there is a weak distance 
>> dependence and a characteristic time dependence. This structure is 
>> found to be the same for both of these (independent) statistics. There 
>> are many tedious details to all this.
>>
>> Also, I find that the two independent statistics correlate with each 
>> other if I filter on the characteristic time scale mentioned above.
>>
>> In another tack, I look at this structure for two subsets of events. 
>> The subsets are a binary ranking of events as "big or small", mostly 
>> by guessing how many people were involved. Very qualitative. The 
>> result is that the structure in the events is more pronounced for big 
>> events than for small ones, as one would expect from the GCP 
>> conjecture. Again, many details here.
>>
>> These are the chief results as they stand today.
>>
>>
>> On Oct 9, 2007, at 11:50 PM, James Spottiswoode wrote:
>>
>>> On Oct 9, 2007, at 2:03 AM, Peter Bancel wrote:
>>>> I would like to respond to James' comment on the GCP:
>>>
>>> And I had hoped to elicit something interesting with my peremptory 
>>> rejection.
>>>
>>> Unlike you, Peter, I am unfortunately encumbered with knowledge 
>>> having been in on the lengthy discussions here on PDL at the very 
>>> inception of the GCP.  Some of the arguments then were never resolved 
>>> but I still think it's worth an effort to clarify what we all think 
>>> is going on with the GCP.
>>>
>>> A caveat:  I don't have time to bring myself up to date with all the 
>>> GCP work.  I am sorry.   So all comments here are subject to the 
>>> criticism that they're based on an out of date view of the GCP.  I 
>>> apologise in advance if this is the case.  If you can point me at work
>>> which resolves or supersedes my points I'd be very grateful.
>>>
>>> In trying to figure out how to respond, under time pressure as 
>>> always, I'll put in comments below.  But let me preface with a few 
>>> points, none new to anyone here.
>>>
>>> 1) I have nothing at all against fishing, data searching or what ever 
>>> you call it.  Faraday's discovery of EM induction is a perfect 
>>> example - he knew roughly where to look (lots of B and searching for 
>>> E) but had no idea what the beast would look like.  So he fished for 
>>> many days, as his notes show, till he stumbled on the critical 
>>> factor: dB/dt.
>>>
>>> 2) But having found something, in particular something (unlike EM 
>>> induction) which is statistical, or to be more precise since 
>>> everything is statistical, something in the 0 - 5 sigma regime 
>>> (unlike induction),  one has to check that this might not be just a 
>>> fluke particularly these days where one can "look" in say 10^6 to 
>>> 10^8 places, for instance, in an hour at a powerful workstation.  The 
>>> traditional solution, of course, is to set up a precise hypothesis 
>>> based on an apparently successfully fished expedition (1) and check 
>>> that it replicates. (Popper et al).
>>>
>>> My questions about the GCP have centered around how to accomplish (2) 
>>> and whether it has in fact been accomplished.  If you don't think 
>>> that some version of (1) and (2) is what the GCP effort is about, 
>>> Peter, then everything else I write is irrelevant.
>>>
>>>
>>> On Oct 9, 2007, at 2:03 AM, Peter Bancel wrote:
>>>
>>>> The GCP is a long-term 24/7 experiment which asks whether anomalous 
>>>> correlations between group mental activity and the environment can 
>>>> be measured /independently of individual subjects./ The project 
>>>> hypothesizes that the simultaneous attention or emotion of large 
>>>> populations correlates with data deviations from a global network of 
>>>> random number generators.
>>>
>>> This is a restatement of the original hypothesis as discussed on PDL 
>>> many years ago.  I have the same objections now as then.  This 
>>> statement describes itself as a (scientific - what other kind would 
>>> be in context?) hypothesis.  Taking that claim seriously means that 
>>> we are entitled to unpack the statement into an 
>>> operationalisable procedure for testing it.  At least, the ancillary 
>>> statements and knowledge as to how to test ought to be readily 
>>> available and non-controversial.   
>>>
>>> (Analogy: hypothesis is dB/dt causes E.  Can someone please tell me 
>>> how to set up the dB/dt condition in the world and how would I detect 
>>> the claimed E?  And of course, in this case, 
>>> fully operational procedures can be given (and could be given by 
>>> Faraday even in his day) which answer my question).  
>>>
>>> Now in the GCP hypothesis case there are real problems.  What does 
>>> "simultaneous attention or emotion of large populations" mean?  In 
>>> particular what procedures should I use, please,  to determine 
>>> whether this condition obtains in some real world case?  There is no 
>>> answer.  Unpacking this phrase further does anyone know how to 
>>> objectively (that is operationalisably) measure "attention" or 
>>> "emotion" for large numbers of people at once?  And of course what is 
>>> "large?"  None of these legitimate questions have been answered, to 
>>> my knowledge.
>>>
>>> More problems. What is a "data deviation?"  Suppose one was to 
>>> consider the set of possible statistical functions and add to this 
>>> that any  test must also involve windowing the data then at a  
>>> minimum one has a triply infinite set of possibilities (test type x 
>>> start time x end time).  A hypothesis cannot contain a triply 
>>> infinite ambiguity.
>>>
>>> We can escape this misery by discounting the word "hypothesizes" in 
>>> Peter's statement.  Then the statement becomes a completely benign 
>>> recipe for a fishing expedition a la my point (1).  It roughly 
>>> fleshes out an area for enquiry.  But note: (at least some of) the 
>>> problems described here will inevitably resurface later and must 
>>> eventually be faced.  
>>>
>>> Example: Suppose after fishing around in the GCP data a researcher 
>>> notices some deviation in the data around event E, occurring at time 
>>> Te.  Specifically suppose she observed a statistically significant 
>>> deviation of a specific statistical function, say S,  applied to GCP 
>>> data in a specified time window, say [Te - T1,Te + T2], around the 
>>> event E.   The GCP researcher would then presumably like to make a 
>>> hypothesis of the form:
>>>
>>> Whenever events similar to E occur, statistics S calculated from the 
>>> GCP data occurring in  the time windows [Te - T1,Te + T2] will 
>>> deviate from MCE.
>>>
>>> This is all fine of course except for the word "similar."  Folks I 
>>> have argued this with will usually say it means an event with  
>>> "similar levels of global attention or emotion" but this all just 
>>> begs the unanswered questions above.
>>>
>>>
>>>> By focusing on the collective behavior of millions of people, the 
>>>> GCP essentially averages out variables dependent on individual 
>>>> ability, individual intention or individual state-of-mind. 
>>>>
>>>> The GCP addresses psi as an undercurrent of the world, not as a 
>>>> personal experience or engagement. This is a full twist on thinking
>>>> about psi which can potentially provide a unique, complementary 
>>>> perspective. This is why I find the project immensely intriguing.
>>>>
>>>> The project has other worthy aspects. It is open source. It is a 
>>>> community resource. If you work the database, you appreciate the 
>>>> huge power of having continuous, decade-long calibrated data with 
>>>> temporal and spatial structure.
>>>>
>>>> I make these comments, in part, to acknowledge my gratitude to 
>>>> Roger, Dean and many others who conceived and set-up the project and 
>>>> Roger's herculean efforts to keep it running.
>>>>
>>>> James' comment reminds me how thoroughly misunderstood the GCP 
>>>> really is. This is the responsibility of Roger and myself and we are 
>>>> working hard to correct this situation. The chief misunderstanding 
>>>> is that the network "detects" an event. It cannot. The effect is 
>>>> weak, and results are statistical. In this aspect it is like most 
>>>> other psi research. Yet people persist in analyzing individual 
>>>> events. This Randi style "show me" approach ( if you'll permit the 
>>>> good-natured provocation) leads to no end of confusion. Worse, this 
>>>> idea was promoted by the GCP itself in the early years of the 
>>>> project. That was, imo, a pardonable sin, given that the GCP was 
>>>> navigating uncharted waters. But the result is a lasting confusion.
>>>
>>> I hope my screed above makes it clear that I do not imagine that 
>>> GCP-type hypotheses refer only to single events.  Quite the 
>>> contrary.  Nor do I imply that the GCP can "detect an event."  But
>>> the least that must be claimed, I think, is that sets of values 
>>> of statistical functions applied to the output of the GCP REG's when 
>>> those output sets occur at some given time with respect to a
>>> defined class of events in the world show a deviation from chance
>>> expectation.  This is a long winded way of saying that the GCP 
>>> outputs correlate in some way with certain kinds of external world 
>>> events.  If this is not the kind of claim being made then I am really 
>>> at sea.  And this is a more modest claim than "detection", which 
>>> implies causation I think, whereas my interpretation of GCP's claims 
>>> is that they are correlational.
>>>
>>> Given that we agree that the goal is to find a class of events during 
>>> which the GCP RNG's exhibit some specifiable behaviour, then  this 
>>> dispute about whether GCP hypotheses refer to single or multiple 
>>> events is hopefully moot.
>>>
>>> The problem has always been defining the events.  Suppose it was 
>>> pointed out that whenever the geomagnetic index Kp exceeded 5, say, 
>>> or the DOW lost more than 3% on the day, then the variance of the GCP 
>>> RNG's doubled, say, with some timing parameters for all this given, 
>>> then there would be no argument. A few new sets of data replicating 
>>> any of these hypotheses and we would all agree that something 
>>> immensely interesting was going on and in dire need of explanation. 
>>>
>>>>
>>>> James' comment and the linked paper stem from the mistaken notion we 
>>>> all more or less participated in. James' analysis was not wrong, but 
>>>> the question was.**
     [**Of course with Bayesian hindsight people may have different
     interpretations about the 9/11 data. James also critiqued methods and
     interpretations of post-hoc analyses and that bugaboo, the 0.05 pval
     level, a pox on its hide...]
>>>>
>>>
>>> I think some of the bias towards a single event interpretation was 
>>> due to the nature of the 9/11 catastrophe.  The event looked so 
>>> unique and seemed to fit so well the bill for "globally engaging 
>>> event" as people put it that a lot of attention was focussed on this 
>>> single event.   
>>>
>>>> What the GCP does find is that, on average, anomalous correlations 
>>>> in the data appear at the time of major world events. The 
>>>> correlations are not strong enough to be seen or tested for
>>>> individual events. At least not as we currently do things.
>>>>
>>>> I'd like to ask the PDL group to take note. It would help the GCP 
>>>> enormously if we could correct this misunderstanding.
>>>>
>>>> As for 9/11, it has brought considerable attention to the GCP, but 
>>>> I'm not sure the publicity was worth the ensuing confusion...
>>>>
>>>
>>> Quite.  Also there have been a few cases where the GCP stuff has been 
>>> presented to a broader audience as the best psi research has to 
>>> offer.  I recall Dick (sadly no longer on PDL) saying something like 
>>> this to Gerard 't Hooft.  What a disaster!  We have
>>> a smorgasbord of well executed lab experiments with crystal clear 
>>> hypotheses involving simple measurements, such as counting hits and 
>>> using obvious basic statistics for their analysis.  Describing the 
>>> GCP work as our best example of something anomalous seems to me self 
>>> destructive.  It reminds me of when Feynman came to the PA in Houston
>>> and was taken to a spoon bending party.  I spent some hours with him 
>>> over the next few days trying to convince him that the nonsense he'd 
>>> seen at the party was not everything that the field did, that there 
>>> were serious careful experimenters who tried hard to tear holes in 
>>> their own work before presenting it, and generally that there were 
>>> rational grounds for thinking that there was something going on worth 
>>> worrying about.  I think I succeeded. 
>>>
>>> Thanks for reading this rather disorganized response,
>>>
>>> James
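
James's point (2) in the thread above, that a result in the 0 - 5 sigma regime may be a fluke when one can "look" in 10^6 to 10^8 places, can be made concrete with a short calculation. This is only a sketch under a strong simplifying assumption that is not in the original posts: every look is treated as an independent two-sided test against a standard normal distribution. The function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided tail probability of a standard normal at |Z| >= z."""
    return math.erfc(z / math.sqrt(2.0))


def chance_of_at_least_one(z, n_looks):
    """Probability that at least one of n_looks independent tests reaches z.

    Computed as 1 - (1 - p)^n via log1p/expm1 to stay accurate for tiny p.
    """
    p = two_sided_p(z)
    return -math.expm1(n_looks * math.log1p(-p))


if __name__ == "__main__":
    for n_looks in (1, 10**6, 10**8):
        print(n_looks, chance_of_at_least_one(5.0, n_looks))
```

Under these assumptions a single look reaches 5 sigma by chance about 6 times in 10 million, but across 10^6 independent looks the chance of at least one such result is a little over 40 percent, and across 10^8 looks it is essentially certain. Real analysis windows overlap and are correlated, so the effective number of independent looks is smaller than the raw count.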
