# Long Term Network Variance Analysis

Reviewing more than a decade of data

It is possible to perform the same analysis we use for most formal events on much larger amounts of data, without regard to any particular events. This can even be done for the full database of more than a decade of data. Doing so allows a different kind of question: whether there are any trends over time. One possible application is to determine whether a correction might be appropriate for the analysis of individual events, assuming there is a background trend. (A tiny correction can be generated, but it is not consequential.) A more interesting question is whether there might be a reason for any long lasting, consistent deviation in our standard measure.

While we cannot easily identify explanations for them, it is possible not only to see trends but also to test them for statistical significance. If the deviations are slight and statistically unimpressive, there is no need to scratch our heads seeking their meaning. On the other hand, if the evidence for long term trends is clear, they become interesting and justify some effort to understand what they might mean.

The figure below shows the period from January 1, 2000 to August 8, 2012, almost 13 years of data processed in the same way we analyse most individual formal events. The latter are typically a few hours long (6 hours is the most frequent length). We are here looking at almost 2 orders of magnitude more data. It is not selected to represent any particular moments, but is literally a look at the long history of all data collected by the GCP. The calculation is the squared Stouffer’s Z-score across all eggs each second, which we refer to as the network variance. The plotted line is the cumulative sum of this measure's deviation from its expected value. Under the null hypothesis the trace is a random walk with no drift, so its expectation is a horizontal line.
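As a minimal sketch of the calculation just described, the Python below computes a per-second network variance and its cumulative deviation on simulated data. Several details are assumptions made for illustration and are not taken from the text above: each egg is modelled as reporting the sum of 200 fair bits per second, every egg reports every second, and the function names `network_variance` and `cumulative_deviation` are hypothetical.

```python
import numpy as np

def network_variance(trials, n_bits=200, p=0.5):
    """Squared Stouffer Z across eggs, one value per second.

    trials: array of shape (seconds, eggs); each entry is assumed to be
    the sum of n_bits bits with bit probability p (an assumption here).
    """
    mean = n_bits * p
    sd = np.sqrt(n_bits * p * (1.0 - p))
    z = (trials - mean) / sd                        # per-egg z-score
    stouffer = z.sum(axis=1) / np.sqrt(z.shape[1])  # combine across eggs
    return stouffer ** 2                            # chi-square, 1 df, under the null

def cumulative_deviation(netvar):
    """Cumulative sum of (netvar - 1); 1 is the null expectation."""
    return np.cumsum(netvar - 1.0)

# Simulated null data: 20,000 seconds from 40 hypothetical eggs.
rng = np.random.default_rng(0)
n_seconds, n_eggs = 20_000, 40
trials = rng.binomial(200, 0.5, size=(n_seconds, n_eggs))

netvar = network_variance(trials)
trace = cumulative_deviation(netvar)
print(netvar.mean(), trace[-1])
```

With purely random input the mean network variance stays close to 1 and the trace wanders without a sustained slope. A persistent downward or upward slope in real data is what the figure is meant to reveal.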

The figure definitely shows long trends of consistent deviation. From the end of 2001, the network variance tends to be low until near the end of 2008. This persistent downward trend looks impressive, and an analysis by Peter Bancel showed that it is statistically significant. (He tested the parameters of a fitted curve.) The trend reverses in late 2008 and has an even more extreme slope for the next couple of years, then returns to what looks like the expected random variation for such data.
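One simple way to ask whether a stretch of the trace drifts more than chance allows is sketched below. This is not Bancel's fitted-curve test. It is a cruder check that swaps in a normal approximation: under the null hypothesis each per-second network variance value has mean 1 and variance 2 (chi-square with one degree of freedom), so the summed deviation over N independent seconds has standard deviation sqrt(2N). The function name `segment_drift_z`, the simulated data, and the 1% reduction are all hypothetical.

```python
import numpy as np

def segment_drift_z(netvar_segment):
    """Z-score for the net drift of a segment of network variance values.

    Assumes independent seconds and chi-square(1) values under the null,
    so sum(netvar - 1) has mean 0 and variance 2 * N.
    """
    x = np.asarray(netvar_segment, dtype=float)
    return np.sum(x - 1.0) / np.sqrt(2.0 * x.size)

rng = np.random.default_rng(1)
null_segment = rng.chisquare(df=1, size=100_000)  # simulated null data
low_segment = 0.99 * null_segment                 # hypothetical 1% low variance

z_null = segment_drift_z(null_segment)
z_low = segment_drift_z(low_segment)
print(z_null, z_low)
```

A strongly negative value indicates a segment whose variance is persistently low, and a strongly positive value the reverse. Because the segment boundaries would be chosen after looking at the figure, a score of this kind is exploratory rather than a formal test.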

Some points marked on the horizontal zero line are the dates of a selection of events that were subjects of formal analysis. Some of these appear to correspond to inflections in the network data trace, but this is likely to be just coincidence. Individual events are unlikely to have such long lasting effects. There is, however, one clear parallel that can be drawn. The two major trends in this 12 year figure correspond largely to the US presidential tenures of George W. Bush and Barack Obama. Looking for sociological variables that might correlate with GCP data, Bancel collected US presidential favorability ratings. He found some 500 data points over 8 years and, in exploratory analysis, discovered a substantial correspondence. Again, we should not conclude there is a causal relationship, but the coincidence does symbolize a worldwide perspective on the US presence in global affairs.

On the other hand, comparing this long term GCP data trend with the variation in sunspot counts over the same time period shows what appears to be a substantial correlation between the GCP network variance (Netvar) measure and the solar cycle.

A separate analysis of odds-ratio spikes in the long term data presents another interesting view of the full database.

It is important to keep in mind that we have only a tiny statistical effect, so that it is always hard to distinguish signal from noise. This means that every success might be largely driven by chance, and every null might include a real signal overwhelmed by noise. In the long run, a real effect can be identified only by patiently accumulating replications of similar analyses.
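The arithmetic behind accumulating replications can be sketched with Stouffer's method, which combines independent z-scores as their sum divided by the square root of their count. If each replication has a small true mean z-score, the expected combined score grows as that mean times the square root of the number of replications. The per-event value of 0.25 used below is a hypothetical number chosen for illustration, not a GCP result, and the function names are likewise hypothetical.

```python
import math

def stouffer_combine(z_scores):
    """Combine independent z-scores: sum(z) / sqrt(number of scores)."""
    z = list(z_scores)
    return math.fsum(z) / math.sqrt(len(z))

def events_needed(mean_z_per_event, target_z):
    """Smallest N for which the expected combined Z reaches target_z.

    Expected combined Z after N events is mean_z_per_event * sqrt(N).
    """
    return math.ceil((target_z / mean_z_per_event) ** 2)

# Four replications that each score z = 1 combine to z = 2.
print(stouffer_combine([1.0, 1.0, 1.0, 1.0]))  # 2.0

# Hypothetical weak effect of 0.25 per event, target combined Z of 3.
print(events_needed(0.25, 3.0))  # 144
```

No single replication in such a series would look convincing on its own, which is why individual successes and nulls carry so little weight compared with the accumulated record.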