Comments on Long Trend Analysis

Comment 050108

Things are starting to look very interesting indeed.

I have been working on two fronts.

The big news is that I have looked at the alternate-second data, and there is a very clear correlation between the Stouffer z-squared cumdev [and its Janus twin, the network variance] for two datasets built from alternating seconds of data. That is, all odd seconds go to one set and the even seconds go to the other. A strong correlation is exactly what one expects if an anomalous effect is responsible for the structure in the data. [The significance depends on the numbers, and I'm calculating empirical distributions for my correlation function, so it will take a day or two to have more precision.]
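To make the split concrete, here is a minimal sketch of the alternate-second procedure on synthetic stand-in data (all function names and the array shape are my own for illustration, not the actual analysis code):

```python
import numpy as np

def stouffer_z(device_z):
    """Stouffer Z for one second: sum of device z-scores over sqrt(N)."""
    return device_z.sum() / np.sqrt(len(device_z))

def zsq_cumdev(z):
    """Cumulative deviation of z^2 from its expectation of 1 per second."""
    return np.cumsum(z**2 - 1.0)

# synthetic stand-in for per-second device z-scores: rows = seconds, cols = devices
rng = np.random.default_rng(0)
device_data = rng.standard_normal((10000, 60))

z = np.array([stouffer_z(row) for row in device_data])
odd, even = z[0::2], z[1::2]                  # the alternate-second split
n = min(len(odd), len(even))
cum_odd, cum_even = zsq_cumdev(odd[:n]), zsq_cumdev(even[:n])

# correlation between the two cumdev traces
r = np.corrcoef(cum_odd, cum_even)[0, 1]
```

For pure noise (as here) the two cumdev traces are independent random walks, so r is just whatever chance gives; the claim is that the real data show a correlation well outside that chance distribution.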

But it's better than that.

The correlation only comes through for the post-9/11 data, and it is coming from the same structure that we see correlating with the poll results. So this alternate-second result is independent of the poll result and also supportive of it (or vice versa). Anyway, it is starting to make a very nice story, since we now have

1. event experiment significant for an effect on short timescales and connecting to global scales.
2. alternate datasets showing significance on long timescales.
3. poll results connecting long time behavior to global events.

That's the short of it.

The attached html doc has some more details and some more are to come.

The technical point of importance is that I've devised a correlation test capable of detecting correlations in structure [as opposed to the usual correlation coefficients, which are too limited and miss detailed structure like peaks]. This is sketched in the doc.
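The doc only sketches the test, but the general shape — correlate cumulative-deviation traces rather than raw values, and get the p-value from an empirical (permutation) null rather than a textbook formula — can be illustrated like this. Everything here (function names, the cumsum form of the structure measure, the one-sided tail) is my plausible reading, not the exact test:

```python
import numpy as np

def structure_corr(a, b):
    """Correlate cumulative-deviation traces rather than raw values,
    so shared large-scale structure (peaks, trends) is picked up."""
    ca, cb = np.cumsum(a - a.mean()), np.cumsum(b - b.mean())
    return np.corrcoef(ca, cb)[0, 1]

def empirical_pval(a, b, n_perm=2000, rng=None):
    """One-sided p-value from an empirical null: shuffling one series
    destroys its temporal structure while keeping its marginal stats."""
    rng = rng or np.random.default_rng(0)
    observed = structure_corr(a, b)
    null = np.array([structure_corr(rng.permutation(a), b)
                     for _ in range(n_perm)])
    return observed, (null >= observed).mean()
```

The point of the permutation null is that cumdev traces are random walks, so their chance correlations are far broader than the usual r-to-p tables assume; the empirical envelope handles that automatically.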

The other front is the poll results. I'll use my structure correlation test to look at the data/poll correlation quantitatively. But another very interesting avenue is to study the correspondence by making a mapping from one to the other. I fiddled around a bit and it looks possible, but it will take some work. It would be a nice demo to show how you can generate the structure in the GCP data by a simple transformation of the poll data.

I think we have some red meat.

At this point we should reconsider the late April date for a meeting. This could be a good moment to do it, and we could still make it if we move fast. Read over the doc and let me know what you think.

Comment 050109

Here's a little update.

I should have some pval envelopes done a little later in the day, but at the moment it indeed looks like a z-score of 3 for the correlation on the post-9/11 alt-sec netvar sets. The z-score for the pre-9/11 data will be small, probably less than 0.4. In terms of pvals, the post-9/11 correlation is near .002.

I have also looked at the device variance for the same correlation. There is nothing there. That's very interesting indeed, because it helps us in our quest to find "the right statistic". It's looking more and more like the Stouffer z-squared, aka netvar, is a good one. This ties in nicely with the significant result of the event-based analyses, which are mostly standard analyses, aka netvar. Interestingly, our official New Year variance events measure the device variance, and we don't see any effect there. This could be corroborating evidence.
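The netvar/devvar distinction is worth pinning down, since it's doing real work here. A minimal sketch (my notation, synthetic data): the netvar contains the cross-device terms, so it responds to inter-device correlation, while the devvar throws those terms away:

```python
import numpy as np

def netvar(device_z):
    """Network variance: squared Stouffer Z per second (expectation 1).
    Contains cross-device products, so it is sensitive to inter-device
    correlation."""
    N = device_z.shape[1]
    stouffer = device_z.sum(axis=1) / np.sqrt(N)
    return stouffer**2

def devvar(device_z):
    """Device variance: mean squared z across devices per second (also
    expectation 1), with no cross terms, hence blind to inter-device
    correlation."""
    return (device_z**2).mean(axis=1)
```

So a signal in the netvar with nothing in the devvar points at correlations between devices rather than at individual devices misbehaving, which is exactly why the contrast matters for "the right statistic".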

So these are some things we can look at with regard to the altsec results.

I will send you an updated memo with these new results by the end of the day.

When do you want to talk about 'what's next?'

Comment 050110

Here are two revised plots with envelopes updated (tho' I'm still calculating...). Netvarcorr.gif should replace index_gr_5 (give it that name if you want it to load in the memo htm page); NetvsDevcorr.gif replaces index_gr_9.

The basic result is a pval of better than 0.003 for the alt-sec post 9/11 correlation. The Z-score for that is 2.8.

The big picture that is emerging is this: We show correlation between GCP data and a societal metric AND we show independently that the GCP data has non-random structure (via the alt-sec analysis). The icing on the cake is that we can unpack the alt-sec analysis to show the data features that correlate on alternate seconds are precisely the ones that correlate with the poll. So it's a check and mate situation. I think we can show that this hangs together at the 0.001 pval level.

The big things we are learning are:
1. We can test to see if data trends are non-random (incredible!).
2. We can determine what stats capture the effect. For instance is it the netvar or the devvar? (this was the goal of the event-based analysis).

What we set aside for the moment is the possibility that the effect has several aspects (it could be global consciousness + experimenter, after all...).

So where do we go from here?

First, we should talk. Can we do it today? We need to move quickly if we want to try for a Spring meeting. Also, I need to make some commitments for the next 6-9 months in the next few days. [I delayed decisions when I first saw the poll correlation.] What we decide to do for the project affects my choices.

Here's what I'd like.

1. Have a meeting in late April. If we want a meeting, we should send the analysis memo to Dean (and Marylin?) asap to get them excited and fix the IONS date. If they're OK, send emails to principals and nail it down.

2. Write a paper for FoP. A big lesson I learned (the hard way) during my thesis, and later during the post-doc at IBM, was knowing when to cut the work, sit down, and write a Letter. My gut is telling me big-time that this is a cut-and-write case. I'm pretty sure we can get a Letter published. This is also excellent preparation for the April meeting. It will also help loads for funding requests, so best to get it in the pipeline now.

3. Find some money so I can put time into the analysis. Eternal problem, but I'm a pumpkin without some revenue.

Some immediate next things to do:

1. Calculate the correlation of netvar and presidential poll data.

2. See if the correlation for alt-secs works on shorter timescales: look at 9/11. If so, we have independent evidence that the strong 3-day deviation after 9/11 was not merely an extraordinary chance fluctuation. That would be a substantial result.

3. Check the correlation for alternate minutes of data, instead of alternate seconds. [This will be a nail in the coffin for "inherent electronic autocorrelations in the devices"-type arguments against anomalous interpretations. Actually, there is a good story with several parts to destroy those objections.]

These 3 are all quick to do.
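Item 3 is just a generalization of the alternate-second split to longer blocks. A toy version (hypothetical function name; block_len=1 gives alternate seconds, block_len=60 alternate minutes):

```python
import numpy as np

def alternating_blocks(z, block_len):
    """Split a per-second series into two sets built from alternating
    blocks of block_len seconds. Any tail that doesn't fill a full
    pair of blocks is dropped."""
    n = len(z) - len(z) % (2 * block_len)
    blocks = z[:n].reshape(-1, block_len)
    return blocks[0::2].ravel(), blocks[1::2].ravel()
```

If the correlation survives at minute-scale blocking, second-scale electronic autocorrelation in the devices can't be the explanation, which is the whole point of the check.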

There are important and obvious further tracks to take. But most of them could potentially get bogged down and take considerable time to get right.

One priority direction is to look for another metric like the poll data. Another is to look for a better stat than the netvar. [Actually, I suspect that a measure of the average REG pair-correlation is the underlying statistic. This is a major component of the netvar...] But I think we should focus on a draft paper for the end of February.
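The "major component" remark is an algebraic identity worth writing down: expanding the squared Stouffer sum gives netvar = devvar + (N-1) × (mean pairwise product), so the pair-correlation term is exactly what the netvar has and the devvar lacks. A sketch (my own function name):

```python
import numpy as np

def mean_pair_product(device_z):
    """Average of z_i * z_j over distinct device pairs, per second.
    Identity: netvar = devvar + (N - 1) * mean_pair_product, since
    (sum z)^2 = sum z^2 + sum_{i != j} z_i z_j."""
    N = device_z.shape[1]
    s = device_z.sum(axis=1)
    sq = (device_z**2).sum(axis=1)
    return (s**2 - sq) / (N * (N - 1))
```

This makes the earlier netvar-yes/devvar-no finding look natural: if the effect lives in the pair term, the netvar sees it and the devvar can't.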

Comment 050110.2

It's quite amazing how things are coming together.

Here is another piece of the puzzle. Attached is a plot from the early blocking page I did. Look at the upper right plot. The BLACK trace is the event analysis cumdev for the 1-sec standard analysis (aka netvar).

It's a bit hard to see, but it shows the cumulative result for all accepted events using the standard recipe. The big downward kink between events 85-125 corresponds exactly to the big drop in the overall netvar from Jan 2002 to Oct 2003.

[Detail: the plot is a little different than the usual z-score cumdev. It plots the terminal value of the aggregate result at each point.]

So all those events were affected by that persistent years-long trend. If we want to estimate post-hoc the real-time effect of the events, we need to take out the long-timescale background trend. So clearly there is a stronger event effect happening than we get from the formal analysis result.
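One simple way to take out the background (a sketch only; the right window length and edge handling would need thought, and everything here is my own choice) is to subtract a long-window running mean from the per-second netvar before computing event scores:

```python
import numpy as np

def detrend_netvar(netvar_sec, window):
    """Remove a long-window running-mean background from per-second
    netvar, then restore the expectation of 1, so slow year-scale
    trends don't leak into short event windows. Edge samples (within
    half a window of either end) are only partially corrected."""
    kernel = np.ones(window) / window
    background = np.convolve(netvar_sec, kernel, mode='same')
    return netvar_sec - background + 1.0
```

With the Jan 2002 - Oct 2003 decline subtracted out, events falling in that span would no longer carry a built-in negative bias, which is the sense in which the formal result understates the event effect.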

And, by the way, it appears that the decrease effect goes away at larger blockings, just like the overall event effect. More evidence that the 1-sec netvar is our best stat.
