Architecture, REGG-Net version 1.5, GHN, April, 1998

Part 1: Description and overview of issues
Part 2: Suggested specific values
Part 3: Glossary


PART 1: DESCRIPTION AND OVERVIEW OF ISSUES

INTRODUCTION

This document describes some of the technical issues under discussion for the Random Event Generator Global Network, or REGG-Net, also known as EGG, an acronym for ElectroGaiaGram. This is an edited version of a comprehensive architecture discussion and specification. Part 1 presents the general picture and some background discussion. Part 2, which overlaps Part 1, details more specific proposals. Part 3 is a glossary of the many technical terms and acronyms.

Note added 98-09-07: A full understanding of the issues that arise in creating a real-time network of this sort comes only through actual experience, and this leads to additions and revisions. A great deal of work has now been done, and John Walker has written some revised specifications. They supplement this document, which provides the basic outline for the technical instantiation of the network. The new specifications have been implemented, and as of 98-09-06, the new versions of the data collection and archiving programs are running.

The specifications provide considered options and suggestions to focus discussion.  This is a working draft intended to help finalize enough details that we can begin the implementation of the network's "backbone" and possibly some of the other aspects.  It incorporates valuable contributions from many of the project members, only some of which have been properly acknowledged below.

The items that most require input are noted, but other discussion or suggestions are welcome.  Many of the default specifications will be apparent, but are quite malleable at this point (and some will remain settable options).

The network will consist of one or more centralized servers ("Baskets"), acting as clearing houses for data collected at numerous client sites ("Eggs"). We will plan at this point for no more than a hundred Eggs, and likely two or three Baskets. The connections between the various Eggs and the Internet are likely to differ from site to site. Some Eggs will be connected directly to ethernet drops at commercial or university sites, while others will require dial-up connections. Since some of the dial-up connections may be expensive, we plan to allow for both permanent and on-demand connections.

There is an expressed interest in eventually bringing in data from other types of sites running different hardware or experimental parameters (in Jiri's terminology, "Cuckoo Eggs"). Although we are open to this possibility, there must be mechanisms for synchronizing and co-analyzing such data with data from the native Eggs, in both temporal and statistical terms.

By creating a multi-layer protocol, it will be much easier to incorporate such data.  The following set of protocol layers allows for a great deal of flexibility and reasonable independence of certain choices:

1) Hardware dependent data acquisition   
2) Data encoding (timestamps, content type information, etc.)  
3) Data transmission (packet organization, ports, protocol, encryption...)   
4) Analytical techniques   
5) Presentations of data

To incorporate data from Cuckoo Eggs, the first three layers would be replaced by any desired methodology of getting the data to the Basket (though not including throwing out the original Eggs, as Cuckoos do...).  However, the complexity of the analytical techniques will increase as more disparate data needs to be incorporated, so as an initial design we will assume a uniformity of everything but the first layer.  (Layer 3 might also change as a function of increasing security requirements.)

Each Egg may have a set of options, with some potentially remotely configurable while others might be set at compile or installation time, or only settable at the site.  These options apply primarily to the first two layers, since the third layer is intended to be a uniform interface essentially independent of the settings at the Egg-site.

In the remainder of the document we discuss the layers in order, followed by some more general issues that are layer-independent. At the present time, this document has very little to say about the fourth and fifth layers, since these are still very much open-ended issues.  

DATA ACQUISITION

At this time the acquisition software is being designed primarily around the PEAR "Bradish box" or "micro" REGs.  This protocol level can be replaced by a different set of code to support other devices such as "Bierman boxes."  At the present time, we do not have sufficient specifications to build the layer appropriate to these devices. 

An option can be provided to select the type of REG/RNG device, though this is probably fixed at compile-time or only selectable at the Egg-site, rather than remotely settable.

The data acquisition should be as close as possible to a "real-time" process (or "isochronous," if possible), to guarantee that the selected sampling rates (see below) are met exactly, without any systematic drift. Although there may be slight variations in the spacing of the samples (differential non-linearity), we expect that the average number of samples per second (integral non-linearity) can be controlled quite precisely, and therefore synchronized among systems. This should be easy to accomplish using the system clock as a reference, since it has near-microsecond resolution on Linux machines. Synchronization of these clocks across machines is discussed below in the section "Broader Network and Protocol Issues."
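The distinction between differential and integral non-linearity can be illustrated with a minimal Python sketch of clock-referenced trial timing. Each trial's deadline is computed from a fixed start time as start + k/rate, rather than by sleeping a fixed interval after the previous trial, so a late wake-up delays only that one trial and never accumulates into drift. The function names, the `read_trial` device hook, and aligning the start to a whole second of system-clock time are illustrative assumptions, not part of this specification.

```python
import math
import time
from typing import Callable, List, Tuple


def aligned_start(now: float) -> float:
    """Next whole second of system-clock time, so that Eggs whose clocks are
    synchronized begin their trial sequences on the same nominal boundaries."""
    return float(math.floor(now) + 1)


def trial_deadline(start: float, rate: float, k: int) -> float:
    """Nominal time of trial k, computed from the start time rather than from
    the previous wake-up, so per-trial timing errors never accumulate."""
    return start + k / rate


def run_trials(
    read_trial: Callable[[], int],  # hypothetical device hook returning one bit-sum
    rate: float,                    # trials per second
    n_trials: int,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, int]]:
    """Collect n_trials trials, each paired with its nominal (scheduled) time."""
    start = aligned_start(clock())
    results = []
    for k in range(n_trials):
        deadline = trial_deadline(start, rate, k)
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)  # oversleeping delays this trial only, not later ones
        results.append((deadline, read_trial()))
    return results
```

With a simulated clock that oversleeps by 3 ms on every call, trial 299 at 3 trials/second is still read only 3 ms after its nominal time. A loop that simply slept 1/rate after each trial would by then have drifted by roughly 0.9 s.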

DATA ENCODING AND EGG-SITE OPTIONS

Given a low-level acquisition layer, a number of choices remain about what data to collect, how much to collect, when to collect it, how to represent it, and so on.  Furthermore, in order to analyze the data effectively, a variety of information is required in addition to the raw sample or trial values.  In particular, for the data to be comparable across sites, some form of uniform timestamp information is required.  The second layer of the protocol is designed to address these issues.

Many of the choices to be made are quite arbitrary, and it seems desirable to leave them as options that can be changed later.  Some may even be usefully changed at run-time, to change the nature of the experimental setup globally by a single administrative choice.  We discuss these issues in terms of a set of options, with certain practical limitations.

After some discussion, it seems fairly well agreed that the data should be transmitted in the form of "trials" which collect some number of bits.  Although the raw bits may be of interest, the basic Egg hardware design will not have the storage capacity, and often will lack the necessary bandwidth to communicate the data.  Further, to keep the communication protocol between Egg and Basket simple, it is preferable to base it always on the notion of a "trial" rather than allowing both trials and raw bitwise data to be communicated.  Thus we consider our other options in terms of "trials".  (An option to use bitwise data collection at some Eggs is briefly discussed below.)

  • Trial type
    There appears to be general agreement that either Z-scores or bit-sums are a reasonable way to represent trials. Given a known trial size, either can be transformed into the other. For this reason, we believe the technical advantages of bit-sums (reduced local computation, storage, and communication) favor implementing only the bit-sum method.
  • Trial size
    The number of bits accumulated into a trial should be variable, since there is no consensus on this at present. It could be argued that trial sizes should differ between Egg-sites, but from a technical and analytical standpoint this seems undesirable, and we have found no strong argument in its favor. The range (in bits/trial) should be chosen to give useful statistical information within each trial, meaning a minimum of 32 and preferably 50 or more. Keeping the number below 256 has the technical advantage of efficient storage and transmission, since each bit-sum then fits in a single byte.
  • Sample rate
    The sample rate (in trials/second) should also probably be uniform across all Eggs at any given time.  However, it should be made easy to change this rate if the consensus is that the data is too sparse or more voluminous than necessary.  Any number less than about 10 trials/second seems reasonable, with a maximum for time between trials set at 5 minutes.  An initial number between 3 trials/second and 3 seconds/trial seems a good starting point. The sample rate and trial size combine to give bits/second, and there are technical limitations on certain devices that preclude very high bit rates.  In the established configuration, the PEAR "Bradish Box" REGs can produce about 488 bits/second, which limits the output to less than 2 trials/second at 255 bits/trial, or 10 trials/second at 48 bits/trial.
  • Sample spacing
    Two major possibilities exist for turning bits into trials when the device produces more bits/second than are included in trials. In either case, all the bits produced by the device must be read and the unused ones discarded; otherwise (at least for serial devices) a significant lag builds up between the production of the data and the time it is read out of the buffer, because serial communication is first-in, first-out (FIFO). One possibility is to read all the bits required for a trial as rapidly as possible at the beginning of the trial time-interval, then discard data until the next trial time-interval begins. Another possibility is to decimate the input stream at an appropriate rate so that the bits for a trial are (approximately) evenly spaced throughout the trial time-interval. There is probably no need to implement both methods, but we should reach a consensus on which is preferred.
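The trial-level options in the list above can be illustrated with a short Python sketch. It assumes bits arrive as 0/1 integers from an unbiased device, so that a bit-sum over n bits has mean n/2 and standard deviation sqrt(n)/2, which is what makes the bit-sum and Z-score representations interchangeable. The function names are hypothetical; the 488 bits/second figure is the Bradish box rate quoted above. Both sample-spacing methods are shown only for comparison, since no choice between them has been made.

```python
import math
from typing import List, Sequence

DEVICE_BITS_PER_SECOND = 488  # "Bradish box" figure quoted above


def bits_to_z(bit_sum: int, n: int) -> float:
    """Z-score of a sum of n fair bits (mean n/2, standard deviation sqrt(n)/2)."""
    return (bit_sum - n / 2) / (math.sqrt(n) / 2)


def z_to_bits(z: float, n: int) -> int:
    """Inverse of bits_to_z; rounding removes floating-point error."""
    return round(n / 2 + z * math.sqrt(n) / 2)


def max_trial_rate(trial_size: int, device_bps: int = DEVICE_BITS_PER_SECOND) -> float:
    """Highest trial rate (trials/second) the device can supply at this trial size."""
    return device_bps / trial_size


def burst_indices(trial_size: int, bits_per_interval: int) -> List[int]:
    """Method 1: use the first trial_size bits read in each trial interval."""
    assert trial_size <= bits_per_interval
    return list(range(trial_size))


def decimated_indices(trial_size: int, bits_per_interval: int) -> List[int]:
    """Method 2: keep bits spaced (approximately) evenly through the interval."""
    assert trial_size <= bits_per_interval
    step = bits_per_interval / trial_size
    return [int(i * step) for i in range(trial_size)]


def make_trial(interval_bits: Sequence[int], keep: Sequence[int]) -> int:
    """Bit-sum of the kept bits. Every bit in the interval was read from the
    device; the ones not listed in `keep` are simply discarded."""
    return sum(interval_bits[i] for i in keep)
```

For example, max_trial_rate(255) is about 1.91 and max_trial_rate(48) about 10.2, matching the limits stated for the Bradish box.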

We encourage an agreement to use the same sample size and sample rate at all Egg-sites. This should completely mask any differences in the underlying device types. Devices doing bit-collection would simply do so as an additional, parallel task, and that data would presumably flow through a separate stream at its own rate. Thus, even in this case, the server will see the same number and size of summed-bit trials. This will make it possible to use QEEG-type computations, which (when the source is a brain) are made from the output of electrodes of the same type placed at an array of locations. A strict analogy is required in order to apply similar computations, and this requires uniform data.

Once we have characterized the data in these terms, it can be encapsulated in an appropriate format for network transmission. Throughout we continue to discuss binary transmission and storage formats primarily for efficiency reasons.  It is understood that it may be desirable to build analytical tools that rely on more "human-readable" forms of the data; this can be accommodated through simple binary-to-hex, or more likely binary-to-text conversion utilities, which can be provided as needed.  
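As a rough illustration of such conversion utilities, the Python sketch below packs bit-sums one byte per trial (possible because trial sizes stay below 256) and converts the binary form to hex or to comma-separated text. The function names and the "time,bitsum" text layout are assumptions for illustration, as is reconstructing each trial's time from the first trial's time and the trial rate.

```python
from typing import List


def pack_trials(trials: List[int]) -> bytes:
    """Binary form: one unsigned byte per bit-sum (valid while trial size < 256)."""
    return bytes(trials)  # raises ValueError for any value outside 0..255


def to_hex(blob: bytes) -> str:
    """Binary-to-hex conversion; bytes.fromhex() reverses it."""
    return blob.hex()


def to_text(blob: bytes, first_time: float, rate: float) -> str:
    """Binary-to-text conversion: one 'time,bitsum' line per trial.
    Times are reconstructed from the first trial's time and the trial rate."""
    return "\n".join(
        f"{first_time + k / rate:.3f},{value}" for k, value in enumerate(blob)
    )
```

The text form is larger than the binary form but can be read directly by spreadsheets and simple analysis scripts.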

DATA TRANSMISSION/NETWORKING

Regardless of the choices made above, a uniform packet format should be implemented. We propose transmitting the data in packets built out of a variable number of records. The current proposal is for each record to contain timestamp and checksum information, and to be essentially independent of the rest of the packet. Packetization would primarily serve to increase the efficiency of the communication protocol, although other ancillary information (current settings for encoding options, for example) should probably also be contained in the packet. Several other choices affect the packet design, often because they may vary from Egg-site to Egg-site.
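A minimal sketch of one possible record and packet layout follows, using Python's `struct` and `zlib` modules. The field set, field widths, network byte order, and the use of CRC-32 as the "checksum information" are all assumptions for illustration, not agreed values. Each record carries a Unix timestamp, a trial count, the bit-sums (one byte each), and a CRC-32, so it can be verified independently of the rest of the packet. The packet header carries hypothetical ancillary configuration: a version, an Egg identifier, the trial size, and the trial interval in milliseconds (a 32-bit field comfortably covers the 5-minute maximum).

```python
import struct
import zlib
from typing import List, Tuple

VERSION = 1
HEADER = struct.Struct("!BHBIH")    # version, egg id, bits/trial, ms between trials, record count
RECORD_HEAD = struct.Struct("!IB")  # timestamp (Unix seconds, UTC), number of trials
CRC = struct.Struct("!I")           # CRC-32 of the record's timestamp, count and trials

Record = Tuple[int, List[int]]


def encode_record(timestamp: int, trials: List[int]) -> bytes:
    body = RECORD_HEAD.pack(timestamp, len(trials)) + bytes(trials)
    return body + CRC.pack(zlib.crc32(body))


def decode_record(buf: bytes, offset: int) -> Tuple[Record, int]:
    """Decode one record at offset; return it and the offset of the next record."""
    timestamp, count = RECORD_HEAD.unpack_from(buf, offset)
    end = offset + RECORD_HEAD.size + count
    (crc,) = CRC.unpack_from(buf, end)
    if zlib.crc32(buf[offset:end]) != crc:
        raise ValueError(f"checksum mismatch in record at byte {offset}")
    return (timestamp, list(buf[offset + RECORD_HEAD.size:end])), end + CRC.size


def encode_packet(egg_id: int, trial_size: int, interval_ms: int,
                  records: List[Record]) -> bytes:
    header = HEADER.pack(VERSION, egg_id, trial_size, interval_ms, len(records))
    return header + b"".join(encode_record(ts, trials) for ts, trials in records)


def decode_packet(buf: bytes) -> Tuple[dict, List[Record]]:
    version, egg_id, trial_size, interval_ms, n_records = HEADER.unpack_from(buf, 0)
    config = {"version": version, "egg_id": egg_id,
              "trial_size": trial_size, "interval_ms": interval_ms}
    records, offset = [], HEADER.size
    for _ in range(n_records):
        record, offset = decode_record(buf, offset)
        records.append(record)
    return config, records
```

In this layout the header costs 10 bytes per packet and each record adds 9 bytes of overhead to its trials, so grouping several trials per record keeps the overhead small.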

  • Networking protocol
    TCP and UDP over IP are the only real choices. TCP offers some advantages, including built-in acknowledgement, segment size optimization, and stream-oriented data delivery. UDP has the advantages of very low overhead in both bandwidth and implementation, and of being packet-oriented. The protocol proposed below does not rely on any particular aspect of either protocol, so we intend to postpone this part of the implementation until later in the development cycle, to allow for more feedback and discovery during the implementation process.
  • Communication mode
    Some sites will be connected all the time ("permanent" connections, ignoring network problems) while others will need to disconnect between data transfers (which we call "dial-and-drop" connections). In the latter case, the Egg must be responsible for letting the Basket know it is available for data transfer. There seems to be no good reason not to use the same procedure for permanent connections: although the Basket could contact the Egg in those cases, doing so adds unnecessary complexity. There is an analogy here to the "server-push" and "client-pull" capabilities in HTTP, which accomplish essentially the same purpose with slightly different performance. (Note that the public web-site modes are not determined here. For updates of analytical displays, both modes may be available, at least for Netscape browsers.)
  • Communication frequency
    The frequency with which an Egg-Basket session is initiated depends on several factors: the cost of each session, the cost of connect time, the availability of data from the acquisition layer, and the desire for interactive (and reactive) central analysis and display. At one extreme, some sites with high connect costs may wish to call infrequently (perhaps once per hour or even less often) and transfer larger blocks of data. At the other, directly connected sites not charged for bandwidth may wish to transfer data essentially as soon as it becomes available. There is probably an update rate beyond which the ratio of data to protocol overhead falls off sharply, so we might require each update to include at least a full packet's worth of data.
  • Record size
    Each record might contain one or more than one trial.  The number is directly related to the frequency of the timestamps versus the frequency of the trials.  The record size should probably be fixed, or computed by a fixed algorithm like the one described below.
  • Packet size
    If the packet contains configuration information, its size will be greater than the sum of the records.  Minimally it must contain at least one record, unless we add the complexity of two different kinds of packets, one with configuration and one with data. For efficiency, the size of the packet should be as large as possible while still fitting within the typical MTU of the net connections being used.  The packet size could easily be different across different sites, if necessary.

By creating our own protocol for this interaction, we can optimize the efficiency of operation and perhaps also reduce design effort. Most standard protocols are either difficult to implement (FTP, HTTP), inefficient (GOPHER, SMTP), insecure (TFTP), or inappropriate (NFS) for this sort of communication. Writing our own protocol also allows us to leave open the choice of TCP versus UDP (potentially even at run-time, although implementing both seems unnecessary effort). It also leaves open the option of enhancing the security of the entire protocol, should the effort be deemed worthwhile.
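The communication-mode, frequency, and packet-size considerations above can be combined into a simple Egg-side rule, sketched here in Python. The Egg always initiates the session, whether its link is permanent or dial-and-drop, and never sends less than a full packet. All sizes are assumptions: a 1500-byte Ethernet MTU in the example, 28 bytes of IPv4 and UDP headers (a TCP header is 20 bytes rather than UDP's 8), and a hypothetical 10-byte packet header with 9 bytes of per-record overhead. The class and parameter names are likewise illustrative.

```python
from dataclasses import dataclass

# Assumed sizes: 20-byte IPv4 + 8-byte UDP headers, plus a hypothetical 10-byte
# packet header and 9 bytes of per-record overhead (timestamp, count, checksum).
IP_UDP_OVERHEAD = 28
PACKET_HEADER = 10
RECORD_OVERHEAD = 9


def records_per_packet(mtu: int, trials_per_record: int) -> int:
    """Largest number of records whose packet still fits in one MTU."""
    payload = mtu - IP_UDP_OVERHEAD - PACKET_HEADER
    return payload // (RECORD_OVERHEAD + trials_per_record)


@dataclass
class UploadPolicy:
    """Egg-side rule for starting a session; the Egg always initiates,
    whether its link is permanent or dial-and-drop."""
    records_per_packet: int
    min_interval_s: float  # 0 for a free permanent link, e.g. 3600 for costly dial-up

    def packets_ready(self, pending_records: int) -> int:
        return pending_records // self.records_per_packet

    def should_connect(self, pending_records: int, seconds_since_last: float) -> bool:
        # Never send less than a full packet, and respect the site's cost-driven interval.
        return (self.packets_ready(pending_records) >= 1
                and seconds_since_last >= self.min_interval_s)
```

With a 1500-byte MTU and 10 trials per record, 76 records fit in one packet. A directly connected Egg would use a minimum interval of 0 and send each packet as soon as it fills; a dial-up Egg with a one-hour interval would accumulate several packets and send them in a single session.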

ANALYTICAL TECHNIQUES

Even if the devices operate in perfect synchronization, differences in local site costs (primarily the connection type) or communication difficulties may require an analysis to "go without" data from some set of Eggs at any given time. Therefore, we will also require some flexibility in the analytical mechanism. The data matrix will often be incomplete at the instantaneous level, but more complete if the display is computed from older data; there will often be "holes" in the data array. Some of the calculations (the "complete" calculation using all active sites) will probably be delayed for hours. This should pose no problem with good analytical software design, but it reinforces the need for an accurate way of synchronizing data that arrives long after the fact.
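One way to tolerate these holes is to align every Egg's trials on the shared nominal timestamps and combine only the values present at each time. The Python sketch below uses Stouffer's method (the sum of the available Z-scores divided by the square root of their count) purely as an illustrative combining statistic. It is a swapped-in choice, not one agreed in this document, and the data layout and function names are hypothetical. Because the matrix is rebuilt from whatever data has arrived, recomputing it hours later automatically incorporates late data.

```python
import math
from typing import Dict, List, Optional

# trials[egg_id][timestamp] -> Z-score of that Egg's trial at that nominal time
Trials = Dict[str, Dict[int, float]]
Matrix = Dict[str, List[Optional[float]]]


def build_matrix(trials: Trials, times: List[int]) -> Matrix:
    """Align each Egg on the common timestamps; missing trials become None."""
    return {egg: [series.get(t) for t in times] for egg, series in trials.items()}


def stouffer_z(values: List[Optional[float]]) -> Optional[float]:
    """Combine the Z-scores that are present; None if every Egg is missing."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / math.sqrt(len(present))


def composite(matrix: Matrix, n_times: int) -> List[Optional[float]]:
    """One combined value per timestamp, using whichever Eggs reported."""
    return [stouffer_z([row[i] for row in matrix.values()]) for i in range(n_times)]
```

If the individual Z-scores are independent and standard normal, the combined value is also standard normal regardless of how many Eggs reported, which keeps results comparable across times with different numbers of holes.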

There has been some discussion of applying QEEG calculations as one analytical technique. At this point we are unsure of the time-scale of the consciousness under study compared with that of a human brain. Indeed, there is, of course, much uncertainty about whether we are likely to measure a "coating of animal consciousness" or the consciousness of Gaia as an entity, which would likely operate on very different time scales.

Further details of the analytical techniques remain to be developed.

DATA PRESENTATION

Most of the details of the data presentation remain to be developed. However, it has been agreed that it should be possible for an interested viewer to go back in time to review the data for specified past time periods, in addition to viewing the "present" state.
