prespollnotes.nb

Did the long-timescale character of the GCP data change after 9/11?

Vertical bar marks Sept. 11, 2001.
The plot shows the cumulative deviation (cumdev) of the daily values of the network variance [netvar] from Oct 1, 1998 to Sept. 8, 2004. The netvar for each day is expressed as a z-score. The parabola is the 5% probability envelope for the cumdev. There are 2166 plotted daily values. [There is no GCP data for the period Aug 5-8, 2002.]

[Graphics:images0109/index_gr_1.gif]
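Here is a minimal sketch of the cumdev and its 5% envelope. It assumes the daily netvar values are independent unit-variance z-scores. Under that assumption the cumdev after k days is the running sum of the first k z-scores, and its null standard deviation is sqrt(k). The two-sided 5% envelope is therefore ±1.96 sqrt(k), which is a parabola lying on its side. The function names are illustrative.

```python
import math
from itertools import accumulate


def cumdev(daily_z):
    """Cumulative deviation: running sum of the daily z-scores."""
    return list(accumulate(daily_z))


def envelope_5pct(n_days, z_crit=1.96):
    """Two-sided 5% envelope, +/- z_crit * sqrt(k) after k days.

    Assumes independent, unit-variance daily z-scores under the null hypothesis.
    """
    return [z_crit * math.sqrt(k) for k in range(1, n_days + 1)]
```

Plotting `cumdev(z)` against `envelope_5pct(len(z))` and its negative reproduces the layout of the figure above.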

The random deviations seem qualitatively greater after 9/11.

Does this suggest a non-random trend in the post-9/11 data?
This is not confirmed by mean and variance tests on the pre- and post-9/11 data.
A mean test for zero difference between the pre- and post-9/11 data subsets gives pval = 0.11.
A variance ratio test for identical variance of the pre- and post-9/11 datasets gives pval = 0.38.
Comment, RDN: While 9/11 is arguably a good point to make a division (which necessarily is post facto and arbitrary), another reasonable point might be the visually obvious inflection during the Afghan war. If that point were used, the mean difference would almost certainly be significant.

Using data at the minute level gives pvals of 0.08 and 0.24 for the mean and variance, respectively. These tests do not support the hypothesis that a non-random trend developed in the data after 9/11. Visually, however, there appears to be pronounced long-timescale structure after 9/11, with both positive and negative slopes. The mean and variance tests are not sensitive to this.
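The notes don't say which mean and variance tests were used. As a hedged illustration, here is a large-sample version of each that needs only the standard library. The mean test is a two-sample z-test on the mean difference. For the variance, an exact F-test is replaced by a normal approximation to the log of the variance ratio (for Gaussian data, ln s^2 has variance of about 2/(n-1)). Both functions return two-sided p-values. `pre` and `post` are assumed to be lists of daily netvar z-scores.

```python
import math
from statistics import NormalDist, fmean, variance


def mean_difference_pval(pre, post):
    """Two-sided p-value for zero mean difference (large-sample z-test)."""
    se = math.sqrt(variance(pre) / len(pre) + variance(post) / len(post))
    z = (fmean(post) - fmean(pre)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def variance_ratio_pval(pre, post):
    """Two-sided p-value for equal variances.

    Uses the approximation ln(s^2) ~ Normal(ln sigma^2, 2 / (n - 1)) for
    Gaussian data, in place of an exact F-test.
    """
    log_ratio = math.log(variance(post) / variance(pre))
    se = math.sqrt(2 / (len(pre) - 1) + 2 / (len(post) - 1))
    return 2 * (1 - NormalDist().cdf(abs(log_ratio / se)))
```

With roughly a thousand days on each side of 9/11, the normal approximations are reasonable. For small samples an exact t- or F-test would be preferable.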

Another approach is the following:

Calculate two netvar datasets, A and B, using alternating seconds for each set. The interdigitated datasets are thus rigorously independent. If there is strong, non-random, long-timescale structure in the cumdev after 9/11, it will be present in both datasets. In that case A and B will be correlated, which would be strong evidence for an anomalous effect.
A plot of the two sets is shown below:

[Graphics:images0109/index_gr_2.gif]
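The A/B construction can be sketched as follows, under assumptions the notes don't spell out. Each second's network mean is taken to be the Stouffer Z of the individual egg z-scores, and the netvar for that second is its square. A day's netvar z-score then converts the sum of squared Stouffer Z's, treated as chi-square with one degree of freedom per second, into a z-score. The helper names are hypothetical.

```python
import math


def stouffer_z(egg_z):
    """Network mean for one second: Stouffer Z of the egg z-scores reporting that second."""
    return sum(egg_z) / math.sqrt(len(egg_z))


def netvar_z(seconds):
    """Netvar for a set of seconds, expressed as a z-score.

    Assumes each squared Stouffer Z is chi-square with 1 degree of freedom,
    so the sum over T seconds has mean T and variance 2T.
    """
    chisq = sum(stouffer_z(s) ** 2 for s in seconds)
    dof = len(seconds)
    return (chisq - dof) / math.sqrt(2 * dof)


def interdigitate(seconds):
    """Split one day's seconds: A takes even-numbered seconds, B takes odd-numbered ones."""
    return seconds[0::2], seconds[1::2]


def daily_ab(day_seconds):
    """Daily netvar z-scores for datasets A and B (day_seconds: list of per-second egg z lists)."""
    a, b = interdigitate(day_seconds)
    return netvar_z(a), netvar_z(b)
```

No second is shared between A and B, so the two daily series share no raw data.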

Visually, there is a correspondence between the red and blue curves after 9/11 AND both show structure similar to the full netvar curve (in grey, rescaled and offset) AND there is little correspondence before 9/11.
Comment, RDN: The visual correspondence before 9/11 is pretty strong except for the first few months. After 1999, there tends to be a lot of parallel shifting of the trends on the order of weeks or longer, though not as much as post-9/11.

This is the main qualitative result of the A-B data splitting.

Additional Comments, Jan 9

Note that the strong peak of the Iraq campaign (near day 1700) and the preceding steep descent appear in all three curves. To test the correspondence quantitatively, we want a test that is sensitive to structure on this scale. Standard correlation coefficients are not sensitive to detailed structure, since they test only linear, or at best monotonic, correlations.
The z-scores for the Pearson correlation between A and B for the full, pre-9/11 and post-9/11 segments are:

Segment    z-score
Full       1.81
Pre        1.72
Post       0.81

These values reflect the modest linear correlation between A and B.
The Pearson coefficient tests the very-long-timescale correspondence. Here it is only marginally significant, and it does not distinguish the pre- and post-9/11 periods.
Comment, RDN: Not sure of your meaning here. The Z-scores indicate the pre-9/11 data are significantly correlated as suggested earlier.
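The notes don't say how the Pearson z-scores were obtained. One standard route, shown here only as an assumption, is the Fisher transformation z = atanh(r) sqrt(N - 3). It would be applied to the daily A and B netvar z-scores rather than to their cumdevs, because the strong autocorrelation of a cumdev would invalidate this null distribution.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, n):
    """Approximate z-score of r under the null of zero correlation (n independent pairs)."""
    return math.atanh(r) * math.sqrt(n - 3)
```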

One way to test for correlation in the structure is to fit the curves and test correlations between the two sets of fitting parameters. Below is a plot of fits to A and B using 51 cosine functions. The fit is done on the cumdev because we want the structure to be prominent. The fitting parameters are the cosine amplitudes, and the cosine wave vectors are 2n Pi / L, where L is the number of data points. The fits are done for n = [0,50]. Using n up to 50 allows fitting of structure on timescales of 1 month and longer.
The cosine expansion is most efficient for centro-symmetric structures, so the cumdev is concatenated with its reflection before fitting. The figure below shows the fits (grey) for the full 2165 data points (days) for curves A and B. The centro-symmetric reflection doubles the number of points to 4330. The fit uses 100 cosine functions. [The center of the plot, which is the last day of data, is the reflection point. It is marked by a vertical bar.]

[Graphics:images0109/index_gr_3.gif]
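Below is a minimal sketch of the reflected cosine fit. Mirroring a series of N points about its last point and expanding in cos(2 n Pi t / L), with L = 2N, is equivalent to a type-II discrete cosine transform when the samples sit at t = k + 1/2. With that choice the cosines are orthogonal on the sampling grid, so each amplitude is a simple projection and no least-squares solve is needed. The half-sample offset and the function names are assumptions here; the notes don't give the fitting code.

```python
import math


def cosine_amplitudes(series, n_max):
    """Amplitudes a[0..n_max] of the series mirrored about its last point.

    Basis: cos(2 n Pi t / L) with L = 2 * len(series), sampled at t = k + 1/2
    (an assumed convention that makes the basis orthogonal, i.e. a DCT-II).
    """
    N = len(series)
    L = 2 * N
    amps = []
    for n in range(n_max + 1):
        # The mirrored half contributes an identical sum, so project on the
        # original N points and fold the factor of 2 into the normalization.
        s = sum(x * math.cos(2 * math.pi * n * (k + 0.5) / L) for k, x in enumerate(series))
        amps.append(s / N if n == 0 else 2 * s / N)
    return amps


def cosine_fit(amps, n_points):
    """Evaluate the truncated cosine expansion on the original n_points samples."""
    L = 2 * n_points
    return [sum(a * math.cos(2 * math.pi * n * (k + 0.5) / L) for n, a in enumerate(amps))
            for k in range(n_points)]
```

Keeping all N amplitudes reproduces the data exactly. Truncating at n = 50 gives the smooth fit for one segment, for example `cosine_amplitudes(cumdev_a, 50)`, where `cumdev_a` is a hypothetical name for the A cumdev.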

The fitting procedure gives a set of coefficients (cosine amplitudes) for each curve, A and B. To look at the difference between pre- and post-9/11, split the sets at that date and calculate the correlations for the two periods separately. The number of points is halved for each period, so we need only 50 amplitudes. Let the coefficients be A[n] and B[n], where n = [0,50] labels the cosine functions.
Then the correlation is the sum of pair products of the coefficients:

Sum[ (n+1)^2*A[n]*B[n] ].

Note: The pair products are weighted by the square of the offset cosine wave index, (n+1)^2. This compensates for the average falloff in the cosine amplitudes, which decrease roughly as 1/n. The aim is to give equal weight to structure on all timescales, and the cosine index weighting is one way to do this. The correlation therefore measures both relatively local (in time) structure, such as the Iraq war "peak", and broad structure, such as the decline. The plot below shows the relation between wave index and the standard deviation of the fitted cosine amplitudes. A fit (blue) is stddev = (t/n)^2.16, where t = 1.618.

[Graphics:images0109/index_gr_4.gif]
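Here are the weighted amplitude correlation Sum[ (n+1)^2*A[n]*B[n] ] and its cumulative over the wave index, written out as a short sketch. `a` and `b` stand for the amplitude lists A[n] and B[n], and the function names are illustrative.

```python
from itertools import accumulate


def weighted_terms(a, b):
    """Per-index terms (n+1)^2 * A[n] * B[n] of the amplitude correlation."""
    return [(n + 1) ** 2 * x * y for n, (x, y) in enumerate(zip(a, b))]


def amplitude_correlation(a, b):
    """Total weighted correlation over all supplied wave indices."""
    return sum(weighted_terms(a, b))


def correlation_cumulative(a, b):
    """Running sum over wave index n (the curve plotted against n)."""
    return list(accumulate(weighted_terms(a, b)))
```

The cumulative makes it easy to see which wave indices produce the correlation, because a step in the curve marks an index with a large weighted pair product.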

Below is a preliminary result of the correlation calculation. The A and B datasets were each split into segments before and after 9/11. The cosine amplitude correlation was calculated for the A and B pre-9/11 data and again for the post-9/11 data. A plot of the cumulative correlation for the two data regions shows that the correlation for the post-9/11 period is significant, whereas the pre-9/11 correlation is clearly insignificant. (Note: the horizontal axis is the cosine wave index. Low-order indices contribute to the long-timescale features and high-order indices to short-timescale structure. Indices around 50 correspond to structure in the netvar cumdev with half-widths of roughly a month.)

[Explain half-width. Is that like the half-width of a distribution, e.g., like a standard deviation of a mean length?]

Probability envelopes for the correlation are being calculated. Preliminary results suggest that the pval for the 51-amplitude fit is around 0.001 (z-score about 3). The cumulative also shows that many wave indices between n = 0 and n = 50 contribute to the correlation. This is consistent with the netvar cumdev, which shows structure on scales of months to years. Thus, at the 3-sigma level (to be confirmed by further calculations on the probability distribution of the amplitude correlation), the post-9/11 data contain non-random structure on long timescales.

This is the main quantitative result of the A-B data splitting.

Additional Comments, Jan 10

Additional Comments, Jan 10.2

Below is the cumulative A/B correlation for the post-9/11 data. Empirical envelopes show that the probability of a chance correlation this large is roughly 0.0025 for fits with resolution down to the month level (n up to 50). The correlation for the pre-9/11 data (blue) is clearly not significant.

[Graphics:images0109/index_gr_5.gif]
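The notes don't describe how the empirical envelopes were built. One plausible approach is sketched below under stated assumptions. It simulates independent null datasets A and B as running sums of standard-normal daily z-scores, fits each with the mirrored cosine basis cos(2 n Pi (k + 1/2) / L), L = 2N, and counts how often the weighted amplitude correlation reaches the observed value. The basis convention, the one-sided test, the trial count and all names are assumptions.

```python
import math
import random
from itertools import accumulate


def cosine_table(n_days, n_max):
    """cos(2 n Pi (k + 1/2) / L) for the mirrored series, L = 2 * n_days (assumed basis)."""
    L = 2 * n_days
    return [[math.cos(2 * math.pi * n * (k + 0.5) / L) for k in range(n_days)]
            for n in range(n_max + 1)]


def amplitudes(series, table):
    """Project the series onto each cosine row (orthogonal basis, so no solve is needed)."""
    N = len(series)
    amps = []
    for n, row in enumerate(table):
        s = sum(x * c for x, c in zip(series, row))
        amps.append(s / N if n == 0 else 2 * s / N)
    return amps


def weighted_corr(a, b):
    """Sum over n of (n+1)^2 * A[n] * B[n]."""
    return sum((n + 1) ** 2 * x * y for n, (x, y) in enumerate(zip(a, b)))


def surrogate_pval(observed, n_days, n_max=50, trials=1000, seed=0):
    """One-sided empirical p-value of `observed` against independent random-walk pairs."""
    rng = random.Random(seed)
    table = cosine_table(n_days, n_max)  # computed once and reused for every trial
    exceed = 0
    for _ in range(trials):
        cum_a = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(n_days)))
        cum_b = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(n_days)))
        if weighted_corr(amplitudes(cum_a, table), amplitudes(cum_b, table)) >= observed:
            exceed += 1
    # The +1 terms count the observed value itself, so the estimate is never zero.
    return (exceed + 1) / (trials + 1)
```

Resolving a probability near 0.0025 needs a few thousand trials. Collecting the full cumulative for each surrogate, not just the total, would also give the envelope as a function of n.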

We can study the correlation by looking separately at different timescales. Sharp steps in the correlation cumulative identify the wave indices that contribute most to the correlation. This information lets us decompose the fits to see which features contribute to the overall correlation. The figure below shows the fits for the A and B datasets using cosine amplitudes through n = 5, 25, 39 and 73. The right-hand panels show the fits with the preceding lower-order fits subtracted out. This isolates the structure responsible for the correlation within each timescale window identified from the correlation cumulative.

[Graphics:images0109/index_gr_6.gif]
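The decomposition can be sketched as follows. For each cutoff, rebuild the fit from amplitudes 0..n, then subtract the previous fit to isolate one timescale window. The cutoffs default to the ones in the figure, so amplitudes must be available through n = 73. The basis cos(2 n Pi (k + 1/2) / L), L = 2N, is an assumption about how the fit was parameterized, and the names are hypothetical.

```python
import math


def partial_fit(amps, n_points, n_max):
    """Fit on n_points samples from amplitudes 0..n_max (assumed basis cos(2 n Pi (k + 1/2) / L), L = 2 * n_points)."""
    L = 2 * n_points
    return [sum(amps[n] * math.cos(2 * math.pi * n * (k + 0.5) / L) for n in range(n_max + 1))
            for k in range(n_points)]


def band_components(amps, n_points, cutoffs=(5, 25, 39, 73)):
    """Return the fits through each cutoff and the band left after subtracting the previous fit."""
    if max(cutoffs) >= len(amps):
        raise ValueError("need an amplitude for every index up to the largest cutoff")
    fits = [partial_fit(amps, n_points, c) for c in cutoffs]
    bands = [fits[0]]  # lowest band: nothing below it to subtract
    for lower, upper in zip(fits, fits[1:]):
        bands.append([u - l for u, l in zip(upper, lower)])
    return fits, bands
```

Applying `band_components` to the A and B amplitudes separately gives matched left-hand (cumulative) and right-hand (band) panels for the two datasets.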

The following plots repeat the A/B analysis for the device variance. Visually, the datasets have very different cumdevs and no correlation is obvious. In the plot below, the colors distinguish the two sets A and B.

[Graphics:images0109/index_gr_7.gif]

Correlations for both the pre- and post-9/11 periods are negligible for the device variance. In the plot below, the colors distinguish the pre- and post-9/11 periods.

[Graphics:images0109/index_gr_8.gif]

The following plot compares the full datasets A and B for the netvar and the devvar. The cumulative A/B correlation for the full dataset from Oct 1998 to Sept. 2004 is marginally significant for the netvar (Z about 2.4) and insignificant for the devvar (Z less than 1.0).

Comment, RDN: It appears that both netvar and devvar have a steep cumulative -- lots of correlation beginning with n~45. For devvar, ~45 to ~55, and for netvar ~50 to ~75.

[Graphics:images0109/index_gr_9.gif]

Is there a change in network behavior associated with major world events related to terrorism and terror politics?

Polls that ask the question: "Do you approve or disapprove of the way the president is handling his job?" probe a general sense of political and societal well-being.
Does the network variance grow when there are strong, persistent feelings of unity, rally and common purpose?
Does the network variance decrease when there are strong, persistent polarizing forces?

[Graphics:images0109/index_gr_10.gif]

Figure caption: Red trace: US Presidential approval ratings from 6 US polling sources (AP, Harris, Gallup, ABC, Pew, NBC). Blue trace: cumulative deviation of GCP network variance (variance of network mean at one-second resolution).

Vertical bars mark major events:
Bush Inauguration,
Shaded region I: Terrorist attack and Afghan campaign (9/11 attacks, Sept 11, 2001, to announcement of Taliban defeat, Dec 16, 2001),
Shaded region II: Iraq campaign (official announcement of bombing, March 19, 2003, to announcement of the end of "major combat operations", May 1, 2003),
capture of Saddam Hussein (Dec 13, 2003),
Madrid terrorist bombings (March 11, 2004),
Bush re-election.

The poll results are for 556 separate polls from Aug 9, 1998 to Dec 15, 2004. Poll dates are taken to be the closing day of the polling period [most polls are conducted over 3-4 days]. Values are averaged when more than one poll closes on the same day. There are 506 data points representing 506 unique polling dates.
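Below is a sketch of the same-day averaging step. It assumes each poll is recorded as a (closing date, approval percentage) pair. The function name is illustrative.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def average_by_close_date(polls: Iterable[Tuple[date, float]]) -> List[Tuple[date, float]]:
    """Average approval over polls that close on the same day; return (date, mean) sorted by date."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for close, approval in polls:
        by_day[close].append(approval)
    return sorted((day, fmean(values)) for day, values in by_day.items())
```

Applied to the 556 polls, this yields one value per unique closing date, which gives the 506 points plotted.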

Same as above, with 3-pt smoothing of poll results.

[Graphics:images0109/index_gr_11.gif]

Same as above, with 8-pt smoothing of poll results and 20-pt smoothing of the network variance.

[Graphics:images0109/index_gr_12.gif]
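The notes don't define the n-point smoothing. Assuming it is a plain moving average over n consecutive points, here is a sketch. Only full windows are kept, so the output is width - 1 points shorter than the input.

```python
from statistics import fmean


def moving_average(values, width):
    """Mean of each run of `width` consecutive values (assumed smoothing; output shrinks by width - 1)."""
    if not 1 <= width <= len(values):
        raise ValueError("width must be between 1 and len(values)")
    return [fmean(values[i:i + width]) for i in range(len(values) - width + 1)]
```

The 8-pt poll smoothing and 20-pt netvar smoothing would then be `moving_average(polls, 8)` and `moving_average(netvar_cumdev, 20)`. Both variable names are hypothetical.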

A view of the network variance cumdev and poll plots when both are normalized to unit variance.

[Graphics:images0109/index_gr_13.gif]
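Normalizing each curve to unit variance before overlaying them puts the poll ratings (percent) and the netvar cumdev (z-score units) on a common scale. Here is a sketch. It assumes the mean is also removed so that the curves share a zero line, which the notes don't state.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Shift to zero mean and scale to unit (population) variance."""
    mu = fmean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]
```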

It's quite amazing how things are coming together.

Here is another piece of the puzzle. Attached is a plot from the early blocking page I did. Look at the upper right plot. The BLACK trace is the event analysis cumdev for the 1-sec standard analysis (aka netvar).

[Graphics:images0109/index_gr_11.gif]

It's a bit hard to see, but it shows the cumulative result for all accepted events using the standard recipe. The big downward kink between 85 and 125 corresponds exactly to the big drop in the overall netvar from Jan 2002 to Oct 2003.

[Detail: the plot is a little different than the usual z-score cumdev. It plots the terminal value of the aggregate result at each point.]
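The two plot types differ as follows. A usual cumdev plots the running sum of event z-scores. If the aggregate result is the Stouffer combination of event z-scores (an assumption here), the terminal value of the aggregate result after k events is that running sum divided by sqrt(k). Here is a sketch with illustrative names.

```python
import math
from itertools import accumulate


def cumdev(event_z):
    """Usual cumdev: running sum of event z-scores."""
    return list(accumulate(event_z))


def running_aggregate_z(event_z):
    """Terminal aggregate (Stouffer) Z after each event: running sum / sqrt(k)."""
    return [s / math.sqrt(k) for k, s in enumerate(accumulate(event_z), start=1)]
```

For a constant per-event effect the cumdev grows linearly, while the aggregate Z grows as sqrt(k). A sustained downward stretch therefore shows up as a kink in both curves.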

So all those events were affected by that persistent, year(s)-long trend. To estimate the real-time effect of the events post hoc, we need to remove the long-timescale background trend. So there is clearly a stronger event effect than the formal analysis result shows.

And, by the way, it appears that the decrease goes away at larger blockings, just like the overall event effect. This is more evidence that the 1-sec netvar is our best stat.

Converted by Mathematica January 9, 2005