Last Update 20 Jan 2005
NOTE: The poll fitting section has been updated to Nov 2007.
This page is a working document and is not to be quoted or referenced.
Correlation of Presidential Poll and GCP Network Variance
Polls that ask the question "Do you approve or disapprove of the way the president is handling his job?" probe a general sense of political and societal well-being. The formal GCP experiment examines network statistics around the time of well-defined events. However, it is possible that the intense worldwide focus on events relating to the US "war on terror" can be identified as an extended source of correlation with the network data. One way to approach this is to take independent metrics (societal, political, etc.) germane to the period and study how they correlate with the data. A striking case is presented here. We show a marked correlation between the results of US presidential job approval polls and the network variance during the post-9/11 period through mid-2004. This case study is sketchy at this point, but it does suggest that extraordinary historical periods can correlate with long-term effects in the data.

The poll data correlation was a serendipitous discovery and is not the result of a systematic search through many metrics. That would be useful to do, of course. Nevertheless, there is some rationale behind the choice of this poll question.

First, the question gives a fairly well-defined metric. It has been asked in nearly identical form by many polling organizations, for many years. It is thus standardized and long-term. We were able to find regular reports of this poll question (nearly 2 per week) for the entire period spanned by the GCP data. For access to the poll results and comments on long-term interpretation of polling data, a useful site is http://pollingreport.com.

Second, the question is general enough that a person's current sense of individual and societal well-being enters into the response. Clearly, answers are not criteria-based evaluations of how the president performed in doing an objectively defined "job".
Can a US poll be considered global-event correlative? It is certainly worth a try. Indeed, the poll question may be considerably less US-centric than one would guess. Put differently, it would be rash to claim that an American poll did not correlate in any way with world opinion (whatever that may mean!), particularly during a period of such intense worldwide debate as the post-9/11 years. A debate on the methodology of studies like this one will be useful. We hope this study will provoke further thinking.

The poll results are for 556 separate polls from Aug 9, 1998 to Dec 15, 2004. Poll dates are taken to be the closing day of the polling period (most polls are conducted over 3-4 days). Values are averaged when more than one poll closes on the same day. There are 506 data points representing 506 unique polling dates. The plot below compares the network variance cumdev to the polling data.

Several questions come to mind. Most generally, as reflected in the presidential polls, we ask, "Is there a change in network behavior associated with major world events related to terrorism and terror politics?"

Does the network variance grow when there are strong, persistent feelings of unity, rally and common purpose? Were such feelings expressed by some populations in the weeks after the 9/11 attacks or during the short Iraq campaign?

Does the network variance decrease when there are strong, persistent polarizing forces? Were polarized or negative sentiments dominant in some populations during the "axis of evil" diplomacy leading up to the US invasion of Iraq or during the subsequent deterioration of the situation there?
A Normalized Comparison of Poll and GCP Data
A view of the network variance cumdev and poll plots when both are normalized to unit variance bears further study. The comparison is complicated because the netvar trace is cumulative and the poll trace is not, but thoughtful consideration may help in the search for other correlates to the long-term structure.
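As a minimal sketch of the normalization mentioned above: the page says only that both traces are "normalized to unit variance", so the mean-centering step and the helper name `to_unit_variance` are assumptions made for illustration, not the authors' procedure.

```python
from statistics import mean, pstdev


def to_unit_variance(series):
    """Rescale a trace to zero mean and unit (population) variance.

    Assumption: the trace is mean-centered before scaling; the source
    only states that the traces are normalized to unit variance.
    """
    center = mean(series)
    spread = pstdev(series)
    if spread == 0:
        raise ValueError("a constant series cannot be scaled to unit variance")
    return [(x - center) / spread for x in series]


# Hypothetical values, only to show the call.
scaled = to_unit_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Applying the same function to the poll trace and to the netvar cumdev puts both on a common vertical scale, though it does nothing about the fact that one trace is cumulative and the other is not.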
A Model of Polling vs Netvar Data
There are a number of ways to study the correspondence between these curves. One approach is to model the netvar cumdev by taking the polling data as input. A linear transformation F[polling data] = model_netvar provides a simple model. Assume, for the moment, that something resembling a collective mood is reflected in the poll and also in netvar. Observing that the netvar rises when the poll values are high and stationary, and that the netvar decreases when there is a strong decline in the poll (examine the plot above to see this), we can make a simple model: F[poll] = a (poll value) + b (poll slope). This model says both the poll value and the poll trend (up or down) need to be taken into account.

To make the model explicit, take the poll value to be (poll value - 50). This means ratings around 50% have little effect, and high/low ratings have positive/negative effects. A small refinement recognizes that polls are non-linear since, in the real world, an 80% rating is hugely more significant than, say, a 60% rating. To take this political reality into account, choose a model that takes the signed square of the poll value: F[poll] = A {(poll - 50)*Abs[(poll - 50)] + B*slope}.

Some technical notes on the model: We want to simulate daily values of the network variance directly from the polling data. To do this, a daily poll value is calculated by interpolation from the raw poll values and dates. A daily slope value is obtained from this as the difference between poll values at an interval of 5 days. The choice of a 5-day lag is not crucial. The lag is highly correlated with parameters A and B and does not represent a third fitting parameter (at least for reasonable lag choices of 2 to 20 days). The slope procedure gives spurious results when there are discrete jumps in the poll data. We use piecewise continuous interpolation to avoid this problem at three points: the changing of presidents around the Bush inauguration, the 9/11 attacks and the start of the Iraq war. Finally, we reduce the noise level somewhat by making a 3-day moving average of the poll interpolation. (Actually, this has little effect and can be dropped from the model. Adjustment for the noise level is accommodated by fitting parameter A.) The plot below shows the raw (blue) and interpolated polling data.
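The interpolation and slope steps described above can be sketched roughly as follows. This is a hedged illustration, not the authors' Mathematica code: dates are assumed to be integer day numbers, interpolation is assumed to be linear, the slope is taken as today's value minus the value `lag` days earlier, and the slope is set to zero whenever that earlier day falls outside the current segment. All function names are hypothetical, and the 3-day moving average is omitted since the text says it can be dropped.

```python
from collections import defaultdict


def average_same_day(polls):
    """Average polls closing on the same day; return sorted (day, value) pairs."""
    by_day = defaultdict(list)
    for day, value in polls:
        by_day[day].append(value)
    return sorted((day, sum(v) / len(v)) for day, v in by_day.items())


def interpolate_segment(points):
    """Linearly interpolate daily values between the polls of one segment."""
    daily = {}
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        for d in range(d0, d1):
            daily[d] = v0 + (v1 - v0) * (d - d0) / (d1 - d0)
    last_day, last_value = points[-1]
    daily[last_day] = last_value
    return daily


def daily_poll_and_slope(polls, breaks=(), lag=5):
    """Return {day: (poll value, lagged slope)} with no interpolation across breaks.

    `breaks` lists the days on which a discrete jump occurs; each break
    starts a new segment. Days between the last poll of one segment and
    the first poll of the next are left out (an assumption of this sketch).
    """
    segments = defaultdict(list)
    for day, value in average_same_day(polls):
        index = sum(1 for b in breaks if b <= day)  # segment the poll falls in
        segments[index].append((day, value))
    result = {}
    for seg_points in segments.values():
        daily = interpolate_segment(seg_points)
        for day, value in daily.items():
            earlier = daily.get(day - lag)
            # Assumption: zero slope when the lag reaches outside the segment.
            slope = value - earlier if earlier is not None else 0.0
            result[day] = (value, slope)
    return dict(sorted(result.items()))


# Hypothetical polls: two close on day 0, and a jump occurs on day 12.
series = daily_poll_and_slope(
    [(0, 60.0), (0, 62.0), (10, 51.0), (12, 90.0), (22, 80.0)], breaks=[12]
)
```

Treating each break as the start of a new segment means the jump on day 12 never enters a slope calculation, which is the spurious effect the piecewise interpolation is meant to avoid.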
Because this is only a demonstration, we don't wish to push this toy model beyond a simple first approximation. Nevertheless, after playing with a few values of A and B the following plot obtains. The netvar is the blue trace and the model is in orange. The (inverse) fitting parameters are 1/A = 12,000 and 1/B = 2. The lag for determining the poll slope is 5 days.
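A minimal sketch of the toy model with the quoted parameters (1/A = 12,000 and 1/B = 2) might look like this. The function names are hypothetical, and summing the daily model values to obtain a trace comparable to the netvar cumdev is an assumption of this sketch; the page does not show that step explicitly.

```python
from itertools import accumulate

INV_A = 12000.0  # 1/A as quoted in the text
INV_B = 2.0      # 1/B as quoted in the text


def model_daily(poll, slope, inv_a=INV_A, inv_b=INV_B):
    """F[poll] = A {(poll - 50)*Abs[(poll - 50)] + B*slope} for one day."""
    centered = poll - 50.0
    signed_square = centered * abs(centered)  # keeps the sign of (poll - 50)
    return (signed_square + slope / inv_b) / inv_a


def model_cumdev(polls, slopes, inv_a=INV_A, inv_b=INV_B):
    """Running sum of the daily model values (assumed cumdev analogue)."""
    daily = (model_daily(p, s, inv_a, inv_b) for p, s in zip(polls, slopes))
    return list(accumulate(daily))


# Hypothetical daily inputs: a high, flat rating followed by a decline.
trace = model_cumdev([62.0, 62.0, 58.0, 54.0], [0.0, 0.0, -4.0, -8.0])
```

With the signed square, a flat 80% rating contributes nine times as much per day as a flat 60% rating, and ratings below 50% contribute negatively.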
The correspondence between the model and the GCP data is reasonably close. We find it remarkable that a two-parameter model does this well over 6 years of data. Below is the same plot broken into sections for pre- and post-9/11. (Because the cumdev is summed starting from Oct 1, 1998 and Sept. 11, 2001, respectively, for the two plots, the plots each begin at the origin.) This work has been updated to 2007.
We can learn a bit more about the model if we calculate the correlation between the model output and the netvar data using cosine fits. A plot of the correlation is shown below, along with indications of probability envelopes. Over the full dataset of 2169 days, the correlation accumulates to about a 5-sigma level. The correlation is between the data and a fitted model, and it should not be interpreted as a 5-sigma correlation between the poll and netvar data. But it does demonstrate that the model reproduces important trends in the data on a range of timescales. A large contribution to the correlation comes from the first few cosine amplitudes. This is because the model accurately reproduces the decline in the netvar cumdev from 2002 to 2004. However, if we remove this contribution, the model still shows significant correlation with the data over timescales from 6 months to 2 weeks. This indicates that the model is reproducing structure on these shorter timescales as well. This point is emphasized by the second trace, which repeats the cumulative correlation but with the large contribution from the first few cosine amplitudes removed.
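The page does not spell out the cosine-fit procedure, so the following is only one plausible reading, offered as a hedged sketch: project both series onto a cosine basis, multiply the amplitudes mode by mode, and accumulate the products from long to short wavelengths. The basis choice (DCT-II style), the function names and the `skip` parameter are assumptions, and no attempt is made to reproduce the sigma normalization or the probability envelopes.

```python
import math


def cosine_amplitudes(series, n_modes):
    """Amplitudes of the first n_modes cosine terms of a mean-removed series."""
    n = len(series)
    mean = sum(series) / n
    amplitudes = []
    for k in range(1, n_modes + 1):
        projection = sum(
            (x - mean) * math.cos(math.pi * k * (t + 0.5) / n)
            for t, x in enumerate(series)
        )
        amplitudes.append(2.0 * projection / n)
    return amplitudes


def cumulative_mode_correlation(data, model, n_modes, skip=0):
    """Running sum over modes of data amplitude times model amplitude.

    `skip` leaves out the first few (longest-wavelength) modes, in the
    spirit of the second trace described in the text.
    """
    data_amps = cosine_amplitudes(data, n_modes)
    model_amps = cosine_amplitudes(model, n_modes)
    total = 0.0
    running = []
    for index, (d, m) in enumerate(zip(data_amps, model_amps)):
        if index >= skip:
            total += d * m
        running.append(total)
    return running


# Hypothetical series: a pure second cosine mode compared with itself.
wave = [math.cos(math.pi * 2 * (t + 0.5) / 64) for t in range(64)]
full = cumulative_mode_correlation(wave, wave, n_modes=4)
trimmed = cumulative_mode_correlation(wave, wave, n_modes=4, skip=2)
```

In this reading, a trace that keeps rising after the first few modes are skipped would indicate agreement between model and data on the shorter timescales as well.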
Finally, we compare the periods before and after 9/11. As with the A/B data-splitting, we find significant correlations in the post-9/11 data, but not in the preceding data.
As with the full dataset, the post-9/11 correlation has a major contribution from long wavelengths (the first few cosine amplitudes). If their contribution is removed from the correlation, there is still significant correlation from structure on timescales down to several weeks. This behavior is qualitatively reproduced by the toy model.
Converted by Mathematica January 13, 2005