From bethke@execpc.com Thu Dec 14 15:57:19 2000
Date: Thu, 14 Dec 2000 10:55:14 -0600
From: Paul Bethke
To: 'rdnelson'
Subject: RE: Wow!

Hi Roger,

I see I have a few messages awaiting, I'll read/reply to them in sequence...

I haven't gotten to analyze that data any further - yet. Let me try to address the questions you pose here...

> If the level is the chisq value, which I understand is
> expected to be 201 df, why is the main mass at about 50 or 60?

I am still playing with "what the values mean". Obviously, the lower the chisq, the closer to "random" the data is. The procedure is to take a "window" of data, in this case 1200 seconds, (+/- 600 seconds) and perform the bell-curve fit on that data. Generally, the data fits the bell-curve pretty well (the norm is around 50). When a very low or very high value (bitsum) actually occurs, however, they tend to produce these "peaks", which last for 1 window of data.

Typically, these peaks occur sporadically. I've tended to think that they were misleading and had been trying to determine a way that they would be less conspicuous. But last night's data changed my thinking in that, yes - they will occur, but perhaps seeing multiple peaks coinciding among various EGGs might be what to look for.

The method I am using is almost identical to that explained at the RPKP Probability and Statistics page - http://www.fourmilab.ch/rpkp/experiments/statistics.html - whereas you know the probability of how the counts of the bitsums should be distributed, and you compare that to the actual distribution. (and doing a quick re-read of the RPKP page shows that it's actually 200 df, not 201 - sorry.)
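A minimal sketch of the windowed fit described above, under stated assumptions rather than the exact code used for the graph. It assumes each one-second trial is the sum of 200 fair bits (so bitsums run 0..200, which gives 201 bins and 200 df) and sums the plain Pearson term over every bin. The real analysis may bin or scale differently, which would change the baseline level. The function names are illustrative only.

```python
import math
import random
from collections import Counter

N_BITS = 200   # assumed bits per one-second trial (bitsum range 0..200)
WINDOW = 1200  # window length in seconds, as in the text

# Expected probability of each bitsum under a fair-bit binomial model.
P_EXPECTED = [math.comb(N_BITS, k) / 2 ** N_BITS for k in range(N_BITS + 1)]


def window_chisq(bitsums):
    """Pearson chi-square of observed bitsum counts against the binomial model."""
    n = len(bitsums)
    observed = Counter(bitsums)
    total = 0.0
    for k, p in enumerate(P_EXPECTED):
        expected = n * p
        total += (observed.get(k, 0) - expected) ** 2 / expected
    return total


def rolling_chisq(bitsums, window=WINDOW):
    """Chi-square for every full window, advancing one sample at a time."""
    return [window_chisq(bitsums[i:i + window])
            for i in range(len(bitsums) - window + 1)]


# Simulated stand-in for one EGG's data: 1300 trials of 200 fair bits each.
rng = random.Random(0)
data = [bin(rng.getrandbits(N_BITS)).count("1") for _ in range(WINDOW + 100)]
series = rolling_chisq(data)
print(len(series), min(series), max(series))
```

Recomputing every window from scratch keeps the sketch short. Since each step only drops one sample and adds another, the counts could instead be updated incrementally.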
> The peaks appear to be ranging up to values well past the
> .05 value of 244, which sounds like the Chisq values
> indicate large deviations, but I suspect I need much better
> understanding of what you are doing. I have tried to figure
> out what you mean by "abnormally thin peaks" but definitely need help.

First the "abnormally thin peaks". That reference is actually to another, combined graph that I didn't include. I mentioned the thin peaks simply to indicate what led me to investigate further and arrive at the graph which I sent you. Those thin peaks don't show up in this graph. Typically the "peaks" are the width of the window (1200 seconds). Seeing thin ones was peculiar. As is seeing overly wide ones - which is how the "bad egg" problem exhibited itself a week or two ago.

The large amplitude of the peaks is generally related to the impact of bitsums which are very distant from the mean. For instance, a bitsum of 65 (or 135) has a probability of 2.2e-7. So when it *does* occur in a sample of 1200, it has a large impact.

> What are the "Series" 1 - 18?

The "Series" labels are simply the default labels that Excel (where the graph is from) gives to data series. They represent individual EGGs, and there are actually more EGGs reporting than appear in the "Legend". I didn't pretty-up the graph before I sent it to you.

> What is the variation at the peak top? Are you doing some
> sort of smoothing?

The variation at the peak top is the same as the variation at the bottom - it is the variation caused by dropping one sample off the tail of the window, and tacking on a new one at the head. The actual calculation of the chisquare could be downsampled, as calculating it every second is really overkill. But this is the way I'm doing it for the moment.

> Do you have any notion of the "physical" meaning for the values?

I have not yet found a good meaning. Perhaps it's not that there *are* peaks, but their coincidence that is significant.
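The earlier point about rare bitsums can be checked numerically. The sketch below assumes a trial is the sum of 200 fair bits (an inference from the symmetry of 65 and 135 about 100 and from the 200 df, not something stated outright) and uses the plain Pearson term (O - E)^2 / E. The function names are illustrative.

```python
import math

N_BITS = 200   # assumed bits per trial
WINDOW = 1200  # samples per window, from the text


def bitsum_probability(k, n_bits=N_BITS):
    """Exact probability that n_bits fair bits sum to k."""
    return math.comb(n_bits, k) / 2 ** n_bits


def single_hit_term(k, window=WINDOW):
    """Pearson term for a bitsum that occurs exactly once in the window."""
    expected = window * bitsum_probability(k)
    return (1 - expected) ** 2 / expected


p65 = bitsum_probability(65)
print(f"P(bitsum = 65) = {p65:.2e}")  # about 2.2e-07, as quoted above
print(f"expected count in one window = {WINDOW * p65:.2e}")
print(f"chi-square term for a single hit = {single_hit_term(65):.0f}")
```

Under these assumptions the expected count of such a bitsum in a window is far below one, so a single occurrence dominates the window's statistic for as long as it stays inside the window. That is consistent with a peak lasting one window width. The values actually plotted depend on the binning used in the real analysis.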
I think I should show you an example of the normal graphs I've been looking at so you see why I was so excited by this one. I will attach such a graph and move on to your next message. ;) (Actually, the sample attached does look a little interesting toward the end of the sample, but I just chose it randomly.)

Paul

-----Original Message-----
From: rdnelson [mailto:rdnelson@Princeton.EDU]
Sent: Wednesday, December 13, 2000 11:33 PM
To: Paul Bethke
Cc: Roger D. Nelson
Subject: Re: Wow!

On Wed, 13 Dec 2000, Paul Bethke wrote:

> I am so excited!
>
> I did some analysis on the GCP data from last night. I did my
> "gauss-curve-fit" chisquare analysis on the data and saw abnormally narrow
> peaks. The only explanation I could determine was that there must have been
> peaks from different eggs close to each other in time, and their overlap
> produced these strangely thin peaks.

It is very exciting. I got the note with the graph just as people were arriving, and they just left so I couldn't properly study what you had sent. I have some questions, and hope you are going ahead with the other work you said you had in mind.

>
> So I did the same analysis by individual eggs - !!! I could hardly believe
> what I found. There are at least 6 separate EGGs peaking at the same time! I
> am including the graph for you to see. I have never seen them coincide like
> this.
>
> What you are seeing: The data is 3 hours of EGG data starting at 0200 UTC.
> The level is the chisquare value. (BTW the P=0.05 value for 201 df is
> 233.9943) The width of the peaks is basically the window size, or how many
> seconds make up the window over which the chisquare is calculated. (So the
> actual data causing the large differences is at the center of the "peak").
>

If the level is the chisq value, which I understand is expected to be 201 df, why is the main mass at about 50 or 60?
The peaks appear to be ranging up to values well past the .05 value of 244, which sounds like the Chisq values indicate large deviations, but I suspect I need much better understanding of what you are doing. I have tried to figure out what you mean by "abnormally thin peaks" but definitely need help.

What are the "Series" 1 - 18?

What is the variation at the peak top? Are you doing some sort of smoothing?

Do you have any notion of the "physical" meaning for the values?

> I have yet to identify which EGGs are involved to see if that tells us more.
>
> I will tell you that I had been skeptical of this analysis method from
> looking at some of the "case" data in the prediction chart - as they seemed
> to only be producing spurious bits of interesting data. But this is pretty
> interesting...
>
> I have more work to do on it, but I wanted to share what I have with you.
>
> Please let me know what you think...Paul

I think it is great. It needs more application to see whether it may be a way to see structure in the data, but it is really interesting just visually, as an indication that something is going on. We just need to understand what.

Best,
Roger

[ Part 2, Image/GIF 21KB. ]