Last Update 15 November 2007
This page is a working document and is not to be quoted or
referenced
Background:
Analysis 2004
2007 Update, Correlation of Presidential Poll and GCP Network Variance
Analysis and discussion by Peter Bancel
Polls that ask the question: "Do you approve or disapprove of the way
the president is handling his job?" probe a general sense of political
and societal well-being.
The formal GCP experiment examines network statistics around the time of
well defined events. However, it is possible that the intense worldwide
focus on events relating to the US "war on terror" can be identified as
an extended source of correlation with the network data.
One way to approach this is to look for correlations between
independent metrics (societal, political, etc) germane to the
period and study how they correlate with the data.
A striking case is presented here. We show a marked correlation between
the results of US presidential job approval polls and the network
variance during the post-9/11 period through mid-2007.
This is an update that essentially replicates the previous indications,
making a stronger case that
extraordinary historical periods can correlate with long-term
effects in the data.
For details on how the data have been assembled, read the main page on
this topic.
Changes in this page relative to the original presentation
from 2004 include:
-Update of poll and netvar data (previously through July 17), now
through November 15, 2007.
-Correction of the netvar to take account of the 4-day gap
from Aug 5-8, 2002.
This shifts the fit slightly after these dates and gives
a small (albeit barely perceptible) improvement for some of the
structure (such as the position of the Iraq war spike).
-Change of the reference level for the poll level from 50 to 55%.
In the fit I use the reference level to decide whether the poll level
makes a positive or negative contribution to the fit.
Previously, I had guessed 50% as a mean poll value. I have since
found that the Gallup average over all presidents for which they have
polls is 55%. Gallup is the longest running poll for this polling
question (by far) and it's better to use the Gallup number since my
'guess' is really a hidden, third parameter. It doesn't make a big
difference, but the procedure is cleaner.
Some explanation of how this works
In this extension/replication, the same approach as before is used,
with minor changes that stabilize the modeling process.
The plot below compares the network variance cumdev to the polling data.
It uses a "toy model" that attempts to reproduce the netvar directly
from the poll data.
The approach is to assume social homeostasis, which means that discontent
or ease are felt in a population following perceived deviations from
societal well-being.
We assume that the Presidential job approval rating correlates with an
overall assessment of well-being by the population. The model asks
whether trends in the polling data, being representative of societal
concerns, correlate with GCP data deviations. In the event experiment we
have demonstrated the possibility that the netvar statistic correlates
with short periods of collective, emotive behavior. Here we ask whether
emotive behavior which persists on the time scale of years can also be
seen to correlate with the netvar measure of global random data.
In homeostasis the deviation from a perceived norm is translated into a
positive or negative assessment. In life systems this leads to
corrective action on the part of the organism. Two factors which come
into play are the absolute deviation from a norm and the rate of change
towards or away from an acceptable range of normality. For example,
various stress reactions occur when the body is too warm, but these
reactions are mitigated if there is an awareness that the environment is
cooling off. This is also evident in societal situations where actions
and behavior will depend on assessments of the current state of affairs
as well as whether things are improving or not.
The toy model developed here posits that correlations of the poll data
with the netvar statistic will depend on both the value and the slope of
the polls. The model expresses this as a linear function
N = alpha*P + beta*dP.
Here N is the measured value of the netvar in the GCP database, P is the
poll value and dP is the slope of the poll.
The linear model has two fit parameters corresponding to the two
homeostatic factors described above.
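As a rough illustration of the linear form above, the following Python sketch evaluates N = alpha*P + beta*dP on an evenly sampled poll series. It makes assumptions that are not stated in the text: that P enters as the approval rating minus the 55% reference level, that the slope dP is a finite difference per day, and that the function name `toy_model` and its arguments are purely illustrative. It is not the code used to produce the plots.

```python
import numpy as np

REFERENCE_LEVEL = 55.0  # percent; the Gallup long-run average cited on this page


def toy_model(poll, dt_days=1.0, alpha=1.0, beta=1.0, reference=REFERENCE_LEVEL):
    """Return N = alpha*P + beta*dP for an evenly sampled poll series.

    Assumptions (hypothetical, for illustration only):
    P is the approval rating minus the reference level, and
    dP is the finite-difference slope of the poll per day.
    """
    poll = np.asarray(poll, dtype=float)
    p = poll - reference             # value term: deviation from the norm
    dp = np.gradient(poll, dt_days)  # slope term: rate of change per day
    return alpha * p + beta * dp
```

With this sign convention, a poll sitting above the reference level and a rising poll both push the modeled netvar in the same direction when alpha and beta share a sign; the fitted signs are whatever the data require.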
The fitting procedure minimizes the sum of squared differences between
the cumulative deviation curves of the model and the netvar. The poll
data are prepared by fitting to a Cosine/Sine Fourier series down to
length scales of about 14 days. This procedure is essentially a
filtering process which eliminates noise on time scales shorter than
two weeks. To avoid transients, the data are broken into segments at the
positions of large discontinuities in the poll data (Bush inauguration,
9/11 and the start of the Iraq war).
After this preparation the fitting is straightforward. See
below for a graph showing how a minimum in parameter space
is found.
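The two steps just described (low-pass filtering of the poll data, then a least-squares match of cumulative deviation curves) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual procedure: it assumes the poll data have already been resampled to an even daily grid within one segment, it replaces the explicit cosine/sine series fit with an FFT truncation (equivalent only for evenly sampled data), and the names `fourier_lowpass`, `fit_alpha_beta` and `netvar_dev` (the per-sample deviation of the netvar from its expectation) are hypothetical.

```python
import numpy as np


def fourier_lowpass(series, dt_days=1.0, min_period_days=14.0):
    """Keep only Fourier components with period >= min_period_days.

    Assumes an evenly sampled series; real poll dates are irregular
    and would need resampling first.
    """
    series = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(series)
    freqs = np.fft.rfftfreq(series.size, d=dt_days)  # cycles per day
    coeffs[freqs > 1.0 / min_period_days] = 0.0      # drop short time scales
    return np.fft.irfft(coeffs, n=series.size)


def fit_alpha_beta(p, dp, netvar_dev):
    """Least-squares alpha, beta so that cumsum(alpha*p + beta*dp)
    matches cumsum(netvar_dev) as closely as possible."""
    # The model is linear in alpha and beta, so its cumulative sum is too.
    design = np.column_stack([np.cumsum(p), np.cumsum(dp)])
    target = np.cumsum(netvar_dev)
    (alpha, beta), *_ = np.linalg.lstsq(design, target, rcond=None)
    return alpha, beta
```

Because the model is linear in both parameters, the minimum of the squared-difference measure has a closed-form solution, which is why a single well-defined minimum in parameter space is expected.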
Blue trace: Model Fit to US Presidential approval ratings from
~14 US polling firms (sources: pollingreport.com,
ropercenter.uconn.edu).
Brown trace: cumulative deviation of GCP network variance (variance of
network mean at one-second resolution).
Update of the same figure to June 24 2008. Blue trace: Model
Fit to US Presidential approval ratings.
Brown trace: cumulative deviation of
GCP network variance. Vertical line shows the time when
Obama's campaign took off in early 2008, leading to his
likely nomination.
An issue that should be discussed is that the trend in the
event data is towards a positive netvar deviation. Here it's negative.
At this point any interpretation as to why this is so would
be highly speculative so we won't undertake it,
although it deserves consideration and we will have to keep it in mind.
It is particularly noteworthy that
the effect size per unit time, if we assume there is
one, is vastly smaller than what we see in the event experiment. There
is about 50x more data here than in the formal event
experiment. If the effect size were comparable to what we
see for the events, the negative-going Netvar trend from 2002 through
2007 would have a z-score of nearly 100. The small long-term
effect size is rather in keeping with what we expect,
given the diffuse nature of the societal factor we have. Still, *if*
there is something to this, we begin to bracket the effect size from two
directions: intense short events and diffuse long-term concerns.
To augment the understanding of the model fit presentation,
it is useful to look at the raw data. The next figure shows
the Netvar cumulative deviation with the raw polling data
superimposed. The scales are adjusted to bring the two data
sequences into the same numerical
range. The correspondences (long-term correlations and
spikes) are striking, and this explains why the model
fitting is so powerful.
Blue trace: Raw data from the presidential approval ratings
from ~14 US polling firms (sources: pollingreport.com,
ropercenter.uconn.edu).
Brown trace: Cumulative deviation of GCP network variance
(variance of network mean at one-second resolution).
Scales are adjusted to plot both curves in the same
numerical range.
We note for consideration that the poll data are at least
superficially Americano-centric.
We won't go into the issue in detail, but empirically it
seems not to have hindered the correlation. We might
conjecture that because of America's outsize role in world
politics, and in the wars that ultimately affect every
person on earth, the president's approval ratings may indeed
reflect a global attitude.
Notes
The data are gathered from two websites:
ropercenter.uconn.edu
and
pollingreport.com
I use 1095 separate polls from Aug. 9, 1998 to Nov 7,
2007.
The cumulative deviation plot is generated with an optimized fit, as described.
The blue is the fit (generated from polling data and fitted
with 2 (!!) parameters)
The brown is the netvar cumdev (with the same level of
smoothing as the fit).
Bush's data begin at the end of Jan 2001.
The main feature of the GWB data is its steady decrease.
It is punctuated with three spikes and three periods of
small increase.
The spikes correspond to 9/11, the March 19, 2003 start of
the Iraq
war and the Dec. 15, 2003 capture of Saddam Hussein.
Two broad, modest increases correspond to active campaigning during
June-October of 2004 and 2006.
The small increase around Jan 2006 occurred as the White House was
promoting the success of Iraqi parliamentary elections.
It is interesting that the short period preceding 9/11 also was
marked by a downward trend.
Shortly after 9/11 Bush achieved the highest poll ratings
ever recorded.
His recent approval ratings match the low values of Carter
at 28%,
but remain above the 23% nadirs of Nixon and Truman.
Clinton's presidency was marked by a slow, gradual increase
in
approval throughout his two terms (and despite the Lewinsky
scandal).
He suffered a 10-point drop in approval in the spring of
1999 which is visible on the plot. This is probably due to
anxiety over NATO's
engagement in the Balkan war.
Minimizing Fit Differences
The fitting process seeks to minimize the differences
between the two data curves. Specifically, it minimizes the sum
of squared differences between the cumulative deviation curves
of the model and the netvar. Such a minimum in
parameter space is readily found, as shown in the plot below.
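One way to visualize such a minimum is to evaluate the squared-difference measure on a grid of (alpha, beta) values and locate the smallest entry. The Python sketch below does this under the same assumptions as the model itself; the function names, the grid ranges, and the input `netvar_dev` (per-sample netvar deviation from expectation) are hypothetical choices for illustration and are not taken from the original analysis.

```python
import numpy as np


def sse_surface(p, dp, netvar_dev, alphas, betas):
    """Sum of squared cumulative-deviation differences on an (alpha, beta) grid."""
    alphas = np.asarray(alphas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    cp, cdp = np.cumsum(p), np.cumsum(dp)
    target = np.cumsum(netvar_dev)
    # Broadcast to shape (len(alphas), len(betas), len(target)).
    model = alphas[:, None, None] * cp + betas[None, :, None] * cdp
    return np.sum((model - target) ** 2, axis=-1)


def grid_minimum(surface, alphas, betas):
    """Return the (alpha, beta) grid point where the surface is smallest."""
    i, j = np.unravel_index(np.argmin(surface), surface.shape)
    return alphas[i], betas[j]
```

A contour or surface plot of the returned array over the two parameter axes gives the kind of picture referred to above; a grid scan is slower than a direct least-squares solve but makes the shape of the minimum visible.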