Architecture, REGG-Net version 1.5, GHN, April, 1998
PART 1: DESCRIPTION AND OVERVIEW OF ISSUES

INTRODUCTION

This document describes some of the technical issues being discussed about the Random Event Generator Global Network, or REGG-Net, also known as EGG, an acronym for ElectroGaiaGram. This is an edited version of a comprehensive architecture discussion and specification. Part 1 presents the general picture and some background discussions. In Part 2, which overlaps Part 1, more specific proposals are detailed. Part 3 is a glossary for the many technical terms and acronyms.

Note added 98-09-07: A full understanding of the issues that arise in creating a real-time network of this sort comes only through actual experience, and this leads to additions and revisions. A great deal of work has now been done, and some revised specifications have been generated by John Walker. They supplement this document, which provides the basic outlines for the technical instantiation of the network. The new specifications have been implemented, and as of 98-09-06, the new versions of the data collection and archiving programs are running.

The specifications provide considered options and suggestions to focus discussion. This is a working draft intended to help finalize enough details that we can begin the implementation of the network's "backbone" and possibly some of the other aspects. It incorporates valuable contributions from many of the project members, only some of which have been properly acknowledged below. The items that most require input are noted, but other discussion or suggestions are welcome. Many of the default specifications will be apparent, but are quite malleable at this point (and some will remain settable options).

The network will consist of one or more centralized servers ("Baskets"), acting as clearing houses for data collected at numerous client sites ("Eggs"). We will plan at this point for no more than a hundred Eggs, and likely either two or three Baskets.
The connections between the various Eggs and the Internet are likely to differ from case to case. Some Eggs will be connected directly to ethernet drops at commercial or university sites, while others will require dial-up connections. Since some of the dial-up connections may be expensive, we plan to allow for both permanent and on-demand connections.

There is an expressed interest in eventually bringing in data from other types of sites running with different hardware or experimental parameters (in Jiri's terminology, "Cuckoo Eggs"). Although we are open to this possibility, there must be mechanisms for synchronizing and co-analyzing the data with that taken from the native Eggs, in both temporal and statistical terms. By creating a multi-layer protocol, it will be much easier to incorporate such data. The following set of protocol layers allows for a great deal of flexibility and reasonable independence of certain choices:
To incorporate data from Cuckoo Eggs, the first three layers would be replaced by any desired methodology of getting the data to the Basket (though not including throwing out the original Eggs, as Cuckoos do...). However, the complexity of the analytical techniques will increase as more disparate data needs to be incorporated, so as an initial design we will assume a uniformity of everything but the first layer. (Layer 3 might also change as a function of increasing security requirements.)

Each Egg may have a set of options, with some potentially remotely configurable while others might be set at compile or installation time, or only settable at the site. These options apply primarily to the first two layers, since the third layer is intended to be a uniform interface essentially independent of the settings at the Egg-site.

In the remainder of the document we discuss the layers in order, followed by some more general issues that are layer-independent. At the present time, this document has very little to say about the fourth and fifth layers, since these are still very much open-ended issues.

DATA ACQUISITION

At this time the acquisition software is being designed primarily around the PEAR "Bradish box" or "micro" REGs. This protocol level can be replaced by a different set of code to support other devices such as "Bierman boxes." At the present time, we do not have sufficient specifications to build the layer appropriate to these devices. An option can be provided to select the type of REG/RNG device, though this is probably fixed at compile-time or only selectable at the Egg-site, rather than remotely settable.

The data acquisition should be as close as possible to a "real-time" process (or "isochronous," if possible), to guarantee that the sampling rates selected (see below) are met exactly, without any sort of systematic drift.
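One common way to avoid systematic drift is to schedule each sample against an absolute deadline computed from the start time, instead of sleeping a fixed interval after each sample. The following is only a minimal sketch of that idea, not the project's acquisition code: the names `deadline` and `acquire`, the `read_sample` callback, and the use of Python's `time.time` as the system clock are all assumptions made for illustration.

```python
import time


def deadline(t0: float, k: int, rate_hz: float) -> float:
    """Absolute time at which sample k is due, counted from start time t0."""
    return t0 + k / rate_hz


def acquire(read_sample, n_samples: int, rate_hz: float,
            clock=time.time, sleep=time.sleep):
    """Take n_samples readings at rate_hz (hypothetical helper).

    Each wait is computed from the absolute deadline t0 + k / rate_hz, so a
    late sample shortens the next wait and timing errors do not accumulate.
    """
    t0 = clock()
    samples = []
    for k in range(n_samples):
        delay = deadline(t0, k, rate_hz) - clock()
        if delay > 0:
            sleep(delay)
        samples.append(read_sample())
    return samples
```

With this arrangement the spacing of individual samples can still jitter, but the average rate is tied to the reference clock.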
Although there might be slight variations in the spacing of the samples (differential non-linearity) we expect that the average number of samples per second (integral non-linearity) will be controlled quite precisely, and therefore synchronized among systems. This should be easily accomplished using as a reference the system clock, which has near microsecond resolution on Linux machines. Synchronization of these clocks across machines is discussed below in the section "Broader Network and Protocol Issues."

DATA ENCODING AND EGG-SITE OPTIONS

Given a low-level acquisition layer, a number of choices remain about what data to collect, how much to collect, when to collect it, how to represent it, and so on. Furthermore, in order to analyze the data effectively, a variety of information is required in addition to the raw sample or trial values. In particular, for the data to be comparable across sites, some form of uniform timestamp information is required. The second layer of the protocol is designed to address these issues.

Many of the choices to be made are quite arbitrary, and it seems desirable to leave them as options that can be changed later. Some may even be usefully changed at run-time, to change the nature of the experimental setup globally by a single administrative choice. We discuss these issues in terms of a set of options, with certain practical limitations.

After some discussion, it seems fairly well agreed that the data should be transmitted in the form of "trials" which collect some number of bits. Although the raw bits may be of interest, the basic Egg hardware design will not have the storage capacity, and often will lack the necessary bandwidth to communicate the data. Further, to keep the communication protocol between Egg and Basket simple, it is preferable to base it always on the notion of a "trial" rather than allowing both trials and raw bitwise data to be communicated. Thus we consider our other options in terms of "trials".
(An option to use bitwise data collection at some Eggs is briefly discussed below.)
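As a rough illustration of what a "trial" means here, the sketch below sums a fixed number of bits into one value and standardizes it against the binomial expectation for unbiased bits (mean N/2, variance N/4). The function names, the `bit_source` callback, and the trial size used in the usage lines are hypothetical; this document does not fix those choices.

```python
import math
import random


def collect_trial(bit_source, bits_per_trial: int) -> int:
    """Sum bits_per_trial bits (each 0 or 1) from bit_source into one trial."""
    return sum(bit_source() for _ in range(bits_per_trial))


def trial_z(trial: int, bits_per_trial: int) -> float:
    """Standardize a trial sum, assuming unbiased independent bits.

    For N such bits the sum has mean N/2 and variance N/4.
    """
    mean = bits_per_trial / 2
    std = math.sqrt(bits_per_trial / 4)
    return (trial - mean) / std


# Usage with a software stand-in for the REG device (illustration only).
bits_per_trial = 200  # illustrative value, not a project setting
trial = collect_trial(lambda: random.getrandbits(1), bits_per_trial)
z = trial_z(trial, bits_per_trial)
```

Because only the sum leaves the Egg, storage and bandwidth per trial stay constant however many bits go into it.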
We encourage an agreement to use the same sample size and sample rate at all Egg-sites. This should completely mask any differences in the underlying device types. Devices which are doing bit-collection would simply do this in addition as a parallel task, and that data would presumably flow through a different stream at its own rate. Thus, even in this case, the server will see the same number and size of summed-bit trials. This will make it possible to use QEEG-type computations; these are made (when the source is a brain) from the output of electrodes of the same type placed at an array of locations. A strict analogy is required in order to apply similar computations, and this requires uniform data.

Once we have characterized the data in these terms, it can be encapsulated in an appropriate format for network transmission. Throughout we continue to discuss binary transmission and storage formats, primarily for efficiency reasons. It is understood that it may be desirable to build analytical tools that rely on more "human-readable" forms of the data; this can be accommodated through simple binary-to-hex, or more likely binary-to-text, conversion utilities, which can be provided as needed.

DATA TRANSMISSION/NETWORKING

Regardless of the choices made above, a uniform packet format should be implemented. We propose transmitting the data in packets built out of a variable number of records. The current proposal is for each record to contain some timestamp and checksum information, and be essentially independent of the rest of the packet. The packetization would primarily be used to increase the efficiency of the communication protocol, although other ancillary information (current settings for encoding options, for example) should probably be contained in the packet. There are some other choices that impact the packet design, often by the fact that they may vary from Egg-site to Egg-site.
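To make the packet-of-records idea concrete, here is a minimal sketch in which every record carries its own timestamp and checksum, and a small header carries the record count. The field layout (a 16-bit Egg id, 32-bit timestamp, 16-bit trial value, CRC-32 checksum) and all function names are assumptions chosen for illustration; they are not the project's specified format.

```python
import struct
import zlib

# Hypothetical layout, network byte order, no padding.
RECORD_BODY = struct.Struct("!IH")  # timestamp (seconds), trial value
RECORD = struct.Struct("!IHI")      # body followed by its CRC-32
HEADER = struct.Struct("!HH")       # egg id, number of records


def pack_record(timestamp: int, trial: int) -> bytes:
    """Encode one record with a checksum over its own body only."""
    body = RECORD_BODY.pack(timestamp, trial)
    return body + struct.pack("!I", zlib.crc32(body))


def unpack_record(data: bytes):
    """Decode one record, raising ValueError if its checksum fails."""
    timestamp, trial, crc = RECORD.unpack(data)
    if zlib.crc32(RECORD_BODY.pack(timestamp, trial)) != crc:
        raise ValueError("record checksum mismatch")
    return timestamp, trial


def pack_packet(egg_id: int, records) -> bytes:
    """Encode a header plus a variable number of (timestamp, trial) records."""
    payload = b"".join(pack_record(t, v) for t, v in records)
    return HEADER.pack(egg_id, len(records)) + payload


def unpack_packet(data: bytes):
    """Decode a packet into (egg_id, list of (timestamp, trial))."""
    egg_id, count = HEADER.unpack_from(data, 0)
    records = []
    for i in range(count):
        start = HEADER.size + i * RECORD.size
        records.append(unpack_record(data[start:start + RECORD.size]))
    return egg_id, records
```

Since each record is checked independently, a receiver could in principle be written to keep the valid records of a damaged packet, although this sketch simply raises on the first bad record.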