Last Update 15 November 2007

This page is a working document and is not to be quoted or referenced



2007 Update, Correlation of Presidential Poll and GCP Network Variance

Analysis and discussion by Peter Bancel

Polls that ask the question: "Do you approve or disapprove of the way the president is handling his job?" probe a general sense of political and societal well-being.

The formal GCP experiment examines network statistics around the time of well-defined events. However, it is possible that the intense worldwide focus on events relating to the US "war on terror" can be identified as an extended source of correlation with the network data. One way to approach this is to take independent metrics (societal, political, etc.) germane to the period and study how they correlate with the data.

A striking case is presented here. We show a marked correlation between the results of US presidential job approval polls and the network variance during the post-9/11 period through mid-2007. This is an update that essentially replicates the previous indications, making a stronger case that extraordinary historical periods can correlate with long-term effects in the data. For details on how the data have been assembled, read the main page on this topic.

Changes in this page relative to the original presentation from 2004 include:
-Update of poll and netvar data, previously through July 17, now through November 15, 2007.
-Correction of the netvar to take account of the 4-day gap from Aug 5-8, 2002. This shifts the fit slightly after these dates and gives a barely perceptible improvement for some of the structure (such as the position of the Iraq war spike).
-Change of the reference level for the poll level from 50 to 55%. In the fit I use the reference level to decide if contributions of the poll level give a positive or negative contribution to the fit. Previously, I had guessed 50% as a mean poll value. I have since found that the Gallup average over all presidents for which they have polls is 55%. Gallup is the longest running poll for this polling question (by far) and it's better to use the Gallup number since my 'guess' is really a hidden, third parameter. It doesn't make a big difference, but the procedure is cleaner.

Some explanation of how this works

In this extension/replication, the same approach as before is used, with minor changes that stabilize the modeling process. The plot below compares the network variance cumdev (cumulative deviation) to the polling data. It uses a "toy model" that attempts to reproduce the netvar directly from the poll data. The approach is to assume social homeostasis, which means that discontent or ease is felt in a population following perceived deviations from societal well-being.

We assume that the Presidential job approval rating correlates with an overall assessment of well-being by the population. The model asks whether trends in the polling data, being representative of societal concerns, correlate with GCP data deviations. In the event experiment we have demonstrated the possibility that the netvar statistic correlates with short periods of collective, emotive behavior. Here we ask whether emotive behavior which persists on the time scale of years can also be seen to correlate with the netvar measure of global random data.

In homeostasis the deviation from a perceived norm is translated into a positive or negative assessment. In living systems this leads to corrective action on the part of the organism. Two factors come into play: the absolute deviation from a norm and the rate of change towards or away from an acceptable range of normality. For example, various stress reactions occur when the body is too warm, but these reactions are mitigated if there is an awareness that the environment is cooling off. This is also evident in societal situations, where actions and behavior will depend on assessments of the current state of affairs as well as on whether things are improving or not.

The toy model developed here posits that correlations of the poll data with the netvar statistic will depend on both the value and the slope of the polls. The model expresses this as a linear function

N = alpha*P + beta*dP.

Here N is the measured value of the netvar in the GCP database, P is the poll value and dP is the slope of the poll. The linear model has two fit parameters corresponding to the two homeostatic factors described above.
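The linear model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poll series is taken to be evenly sampled and already smoothed, P is measured from the 55% reference level mentioned above (one reading of how the reference level enters the fit), the slope is a simple central difference, and the function names, poll values and alpha/beta values are hypothetical rather than the ones used in the analysis.

```python
def poll_slope(values, dt=1.0):
    """Central-difference slope of an evenly sampled series (one-sided at the ends)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    slopes = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        slopes.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return slopes


def toy_model(polls, alpha, beta, reference=55.0):
    """Return N = alpha*P + beta*dP, with P measured from the reference level.

    Assumption: the reference level is simply subtracted from the poll value.
    """
    p = [v - reference for v in polls]
    dp = poll_slope(p)
    return [alpha * pi + beta * di for pi, di in zip(p, dp)]


# Hypothetical smoothed daily approval values (percent) and parameters.
polls = [57.0, 56.0, 55.0, 54.0, 53.0]
model = toy_model(polls, alpha=0.5, beta=2.0)
```

With these made-up inputs the poll falls by one point per day, so the slope term contributes a constant offset while the level term changes sign as the poll crosses the reference level.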

The fitting procedure minimizes the squared sum of differences between the cumulative deviation curves of the model and the netvar. The poll data are prepared by fitting to a Cosine/Sine Fourier series down to length scales of about 14 days. This procedure is essentially a filtering process which eliminates noise on time scales shorter than two weeks. To avoid transients, the data are broken into segments at the positions of large discontinuities in the poll data (Bush inauguration, 9/11 and the start of the Iraq war). After this preparation the fitting is straightforward. See below for a graph showing how a minimum in parameter space is found.
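The poll preparation step can be illustrated as follows. This is a minimal sketch, not the procedure actually used: it assumes the poll values have already been put on an even daily grid, it computes the cosine/sine coefficients by direct projection (equivalent to a least-squares fit only for evenly spaced samples), and the function names and break indices are hypothetical. A truncated series of this kind treats each segment as periodic, so some distortion near segment ends should be expected.

```python
import math


def fourier_lowpass(values, min_period=14.0):
    """Truncated cosine/sine series of an evenly sampled segment.

    Keeps only harmonics whose period (in samples) is at least min_period.
    """
    n = len(values)
    mean = sum(values) / n
    # Highest harmonic kept; stay strictly below the Nyquist harmonic.
    k_max = min(int(n / min_period), (n - 1) // 2)
    smooth = [mean] * n
    for k in range(1, k_max + 1):
        w = 2.0 * math.pi * k / n
        a = 2.0 / n * sum(v * math.cos(w * j) for j, v in enumerate(values))
        b = 2.0 / n * sum(v * math.sin(w * j) for j, v in enumerate(values))
        for j in range(n):
            smooth[j] += a * math.cos(w * j) + b * math.sin(w * j)
    return smooth


def smooth_by_segment(values, breaks, min_period=14.0):
    """Filter each segment between break indices separately."""
    edges = [0] + sorted(breaks) + [len(values)]
    out = []
    for start, stop in zip(edges[:-1], edges[1:]):
        if stop > start:
            out.extend(fourier_lowpass(values[start:stop], min_period))
    return out


# Hypothetical 56-day segment: a slow 56-day wave plus a fast 4-day ripple.
n = 56
slow = [50.0 + 3.0 * math.cos(2.0 * math.pi * j / n) for j in range(n)]
noisy = [s + math.cos(2.0 * math.pi * j / 4) for j, s in enumerate(slow)]
filtered = fourier_lowpass(noisy)  # the 4-day ripple is removed

# A step between two flat segments survives when the break index is supplied.
stepped = smooth_by_segment([1.0] * 30 + [5.0] * 30, breaks=[30])
```

Filtering each segment separately is what keeps a sharp jump, such as the one at 9/11, from being smeared into ringing on either side of the discontinuity.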

[Figure: Fitted Pres Pol on Netvar]


Blue trace: Model Fit to US Presidential approval ratings from ~14 US polling firms (sources: pollingreport.com, ropercenter.uconn.edu). Brown trace: cumulative deviation of GCP network variance (variance of network mean at one-second resolution).
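For readers unfamiliar with the brown trace, the sketch below shows one way a cumulative deviation of the network variance can be computed. It assumes the usual reading of "variance of network mean": for each second, the normalized scores of the reporting devices are combined into a Stouffer Z, which is squared and compared with its expected value of 1. The function name and input layout are hypothetical, and details such as device normalization and the smoothing applied in the figure are not reproduced.

```python
import math


def netvar_cumdev(z_by_second):
    """Cumulative deviation of the squared Stouffer Z from its expectation of 1.

    z_by_second: one list of per-device z-scores for each second (assumed layout).
    """
    total = 0.0
    curve = []
    for z_scores in z_by_second:
        stouffer = sum(z_scores) / math.sqrt(len(z_scores))
        total += stouffer ** 2 - 1.0
        curve.append(total)
    return curve


# Three hypothetical seconds with 4, 2 and 1 reporting devices.
curve = netvar_cumdev([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0], [2.0]])
```

Under the null hypothesis the curve is a random walk around zero; a persistent downward slope, as in the figure, corresponds to a squared network mean that is on average slightly below expectation.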

[Figure: Fitted Pres Pol on Netvar]


Update of the same figure to June 24 2008. Blue trace: Model Fit to US Presidential approval ratings. Brown trace: cumulative deviation of GCP network variance. Vertical line shows the time when Obama's campaign took off in early 2008, leading to his likely nomination.

An issue that should be discussed is that the trend in the event data is towards a positive netvar deviation, whereas here it is negative. At this point any interpretation as to why this is so would be highly speculative, so we won't undertake it, although it deserves consideration and we will have to keep it in mind. It is particularly noteworthy that the effect size per unit time, if we assume there is one, is vastly smaller than what we see in the event experiment. There is about 50x more data here than in the formal event experiment. If the effect size were comparable to what we see for the events, the negative-going netvar trend from 2002 through 2007 would have a z-score of nearly 100. The small long-term effect size is in keeping with what we expect, given the diffuse nature of the societal factor we have. Still, *if* there is something to this, we begin to bracket the effect size from two directions: intense short events and diffuse long-term concerns.

To augment the understanding of the model fit presentation, it is useful to look at the raw data. The next figure shows the netvar cumulative deviation with the raw polling data superimposed. The scales are adjusted to bring the two data sequences into the same numerical range. The correspondences (long-term correlations and spikes) are striking, and this explains why the model fitting is so powerful.

[Figure: Fitted Pres Pol on Netvar]

Blue trace: Raw data from the presidential approval ratings from ~14 US polling firms (sources: pollingreport.com, ropercenter.uconn.edu). Brown trace: Cumulative deviation of GCP network variance (variance of network mean at one-second resolution). Scales are adjusted to plot both curves in the same numerical range.

We note for consideration that the poll data are at least superficially Americano-centric. We won't go into the issue in detail, but empirically it seems not to have hindered the correlation. We might conjecture that because of America's outsize role in world politics, and in the wars that ultimately affect every person on earth, the president's approval ratings may indeed reflect a global attitude.

Notes

The data are gathered from two websites: ropercenter.uconn.edu and pollingreport.com
I use 1095 separate polls from Aug. 9, 1998 to Nov 7, 2007.

The cumulative deviation plot is generated with an optimized fit, as described. The blue trace is the fit (generated from polling data and fitted with 2 (!!) parameters). The brown trace is the netvar cumdev (with the same level of smoothing as the fit).

Bush's data begin at the end of Jan 2001. The main feature of the GWB data is its steady decrease. It is punctuated with three spikes and three periods of small increase. The spikes correspond to 9/11, the March 19, 2003 start of the Iraq war and the Dec. 15, 2003 capture of Saddam Hussein. Of the three broad, modest increases, two correspond to active campaigning during June-October of 2004 and 2006. The third, a small increase around Jan 2006, occurred as the White House was promoting the success of Iraqi parliamentary elections. It is interesting that the short period preceding 9/11 also was marked by a downward trend. Shortly after 9/11 Bush achieved the highest poll ratings ever recorded. His recent approval ratings match the low values of Carter at 28%, but remain above the 23% nadirs of Nixon and Truman.

Clinton's presidency was marked by a slow, gradual increase in approval throughout his two terms (and despite the Lewinsky scandal). He suffered a 10-point drop in approval in the spring of 1999 which is visible on the plot. This is probably due to anxiety over NATO's engagement in the Balkan war.

Minimizing Fit Differences

The fitting process seeks to minimize the differences between the two data curves. In particular, the squared sum of differences between the model and the netvar must be determined. Such a minimum in parameter space is readily found as shown in the plot below.
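Because the model is linear in its two parameters, the minimum of the squared sum of differences between the cumulative curves can also be written in closed form. The sketch below is an illustration under assumptions, not the code used for the plot: P, dP and the netvar deviations are assumed to be sampled on the same time grid, the function names are hypothetical, and the input numbers are synthetic values built from known parameters so that the recovery can be checked.

```python
from itertools import accumulate


def cumdev_sse(alpha, beta, cum_p, cum_dp, cum_netvar):
    """Squared sum of differences between model and netvar cumulative curves."""
    return sum((alpha * p + beta * d - y) ** 2
               for p, d, y in zip(cum_p, cum_dp, cum_netvar))


def fit_alpha_beta(p, dp, netvar):
    """Least-squares alpha and beta from the 2x2 normal equations."""
    cum_p = list(accumulate(p))
    cum_dp = list(accumulate(dp))
    cum_y = list(accumulate(netvar))
    spp = sum(a * a for a in cum_p)
    sdd = sum(b * b for b in cum_dp)
    spd = sum(a * b for a, b in zip(cum_p, cum_dp))
    spy = sum(a * y for a, y in zip(cum_p, cum_y))
    sdy = sum(b * y for b, y in zip(cum_dp, cum_y))
    det = spp * sdd - spd * spd
    if det == 0:
        raise ValueError("cumulative P and dP curves are collinear")
    alpha = (spy * sdd - sdy * spd) / det
    beta = (sdy * spp - spy * spd) / det
    return alpha, beta


# Synthetic check: build a "netvar" from known parameters and recover them.
p = [2.0, 1.0, -1.0, -3.0, -2.0, 0.5]
dp = [-1.0, -1.5, -2.0, 0.5, 1.25, 2.0]
netvar = [0.3 * a - 1.5 * b for a, b in zip(p, dp)]
alpha, beta = fit_alpha_beta(p, dp, netvar)
```

Evaluating cumdev_sse on a grid of alpha and beta values around the returned pair gives the kind of bowl-shaped surface that a parameter-space plot displays; with only two parameters the surface has a single minimum whenever the two cumulative curves are not collinear.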

[Figure: parameter fitting, minimizing]


