Polygraphic Recording Data Exchange - PolyRex

Frequently Asked Questions

last updated: May 9, 2012

9-Aug-2004	1.	What is the problem of converting 24-bit data files to 16-bit data files?
	2.	Sometimes the conversion seems to be fine, but when converting large data files it appears that some data are lost during the conversion?
	3.	How do the settings for gains and offsets affect the conversion process?
	4.	What is the purpose of the A/D gain histogram and the integer recording level display?
	5.	How does re-referencing the recorded data affect the gain resolution?
12-Aug-2004	6.	How can I exclude channels with bad or artifactual data from the conversion?
	7.	How can I exclude artifactual data records from the conversion?
	8.	Is it preferable to use separate gain calculations for each channel, rather than using a fixed gain or a single gain calculation across all channels?
	9.	What is a reasonable gain?
23-Feb-2006	10.	Why do all target files have the same file length when multiple source files are converted? Furthermore, the target file(s) contain(s) empty data records at the end, or data records are missing.
	11.	In my hardware setup, stimulus and response events are coded on the same byte. How can I separate these codes?
12-Oct-2006	12.	It would be helpful to have a more detailed data display than the crude integer overview in minutes for the entire recording. Is there any way to display the data in seconds in order to estimate more precisely when recording artifacts occur?
	13.	Are there any procedures for pre-filtering the data prior to conversion?
9-May-2012	14.	Are there any known limitations to the conversion?

1. What is the problem of converting 24-bit data files to 16-bit data files?

In BioSemi's BDF data format, each analog value is converted to a digital 3-byte or 24-bit representation, meaning that integer numbers ranging from -8,388,608 to +8,388,607 can be used to represent any particular analog value. BioSemi's ActiveTwo hardware setup records a physical data range from -262,144 to +262,143 µV (524 mV; see BioSemi's ActiveTwo documentation on a 24-bit system and digital resolution), which means that each integer value corresponds to a gain of 0.03125 µV (524,288 µV / 16,777,216).

In contrast, systems using a 2-byte or 16-bit data storage format can represent any particular analog value only by integer numbers ranging from -32,768 to +32,767. A typical gain would be 0.5 µV per integer value, meaning that a physical data range from -16,384 to +16,384 µV would be covered. Data outside these range limits would saturate or "pin" at these integer minima and maxima.

The above graphic depicts to scale the (signed) 24-bit and 16-bit data recording ranges (512 pixels in green for 24-bit, 2 pixels in red for 16-bit). If one tried to record a physical data range from -262,144 to +262,143 µV with 16 bits, one would have to sacrifice gain resolution: one integer would correspond to a gain of 8 µV (524,288 µV / 65,536).

Alternatively, one could keep the original gain of 0.03125 µV per integer value, but in this case the 16-bit data range could only cover a physical data range from -1,024 to +1,024 µV. However, this would still correspond to a reasonable physical range for EEG data, in which blink amplitudes rarely exceed 500 µV.
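The arithmetic behind these numbers can be reproduced with a short Python sketch. The function and variable names are illustrative only and are not part of PolyRex; the physical span of 524,288 µV is taken from the text above.

```python
SPAN_UV = 524_288  # physical span of -262,144 .. +262,143 µV (from the text)

def microvolts_per_step(span_uv: float, bits: int) -> float:
    """Gain resolution (µV per integer value) when span_uv is spread over 2**bits values."""
    return span_uv / (2 ** bits)

def half_range_uv(resolution_uv: float, bits: int) -> float:
    """Largest magnitude (µV) a signed integer of `bits` bits covers at this resolution."""
    return resolution_uv * (2 ** (bits - 1))

res_24 = microvolts_per_step(SPAN_UV, 24)          # native 24-bit resolution
res_16 = microvolts_per_step(SPAN_UV, 16)          # same span squeezed into 16 bits
range_16_native = half_range_uv(res_24, 16)        # 16 bits at the native resolution
range_16_half_uv = half_range_uv(0.5, 16)          # 16 bits at 0.5 µV per integer

print(res_24, res_16, range_16_native, range_16_half_uv)
```

The four printed values correspond to the 0.03125 µV, 8 µV, ±1,024 µV, and ±16,384 µV figures quoted in this answer.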

The problem, of course, is that the acquired data can fluctuate anywhere within the 24-bit integer range, particularly with BioSemi's true DC recordings (no physical high pass filter), and artifactual data fluctuations may span the entire recording range (BioSemi uses only digital filters). The decision is then to either lose the data outside the A/D conversion range and keep a reasonable A/D data resolution, or keep all recorded data with low A/D data resolution. Alternatively, one could try to compromise between these two extremes.

2. Sometimes the conversion seems to be fine, but when converting large data files it appears that some data are lost during the conversion?

When the converted data are displayed (e.g., with NeuroScan's Edit program), they may look "crude" or "chopped", not "like EEG" data (cf. left panel below). This is a direct result of the 24- to 16-bit down-conversion using a low gain resolution (more µV per integer value). The presence of some kind of recording artifact is a frequent cause of a low gain resolution when using the PolyRex default settings. In contrast, using a high gain resolution (fewer µV per integer value) during conversion would result in a more reasonable representation (cf. right panel below; both panels have the same time and µV scales). The conversion settings of PolyRex can be changed to accomplish these results at the expense of losing (artifactual) data.

Both the specifics of the recorded data and the selected conversion settings of PolyRex will affect the gain that is used during data conversion. The gain that is used during the conversion is reported on the destination matrix form (below left), while the original recording gain is reported on the source matrix form (below right).

There are several ways to influence the gain selection of PolyRex. First, there are basic settings for gains and offsets, which should be chosen according to the conversion goal. Second, rereferencing the recorded data "on-the-fly" during conversion will also affect the conversion gain. Lastly, one can try to identify the source of a low gain conversion using the gain and level display, and exclude data records or flag bad channels accordingly.

3. How do the settings for gains and offsets affect the conversion process?

The whole purpose of PolyRex is to optimize the 24- to 16-bit conversion process, trying to keep the loss of gain resolution to a minimum. By checking the 'Remove recording offsets' option, PolyRex will analyze the integer range of the recorded data in a first data scan, and subtract its mean, an estimate for the overall recording offset level, before integers are written to the 16-bit data file. As a result, only the actual recorded integer range has to be squeezed into the 16-bit integer range. If the width of the recorded integer range is less than the target integer range (i.e., 65,536 or less), no loss in gain resolution will occur. Leaving this option unchecked will convert the data as recorded, and will result in data saturation for integers outside the 16-bit range.

If data get saturated during the conversion process, PolyRex will indicate these as occurrences of over- and underflow on the main form, and also in the log notes.
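As an illustration of why offset removal matters, the following Python sketch converts one channel at the original gain, with and without subtracting an offset, and counts over- and underflows. It is a simplified model, not PolyRex's code: it assumes the offset estimate is the rounded arithmetic mean of the samples, and the function name and sample values are made up for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def convert_channel(samples, remove_offset=True):
    """Fit one channel of 24-bit integers into 16 bits at the original gain.

    Returns (converted, overflows, underflows).
    """
    # Assumption: the recording offset is estimated as the rounded mean.
    offset = round(sum(samples) / len(samples)) if remove_offset else 0
    converted, overflows, underflows = [], 0, 0
    for s in samples:
        v = s - offset
        if v > INT16_MAX:        # value pins at the 16-bit maximum
            v, overflows = INT16_MAX, overflows + 1
        elif v < INT16_MIN:      # value pins at the 16-bit minimum
            v, underflows = INT16_MIN, underflows + 1
        converted.append(v)
    return converted, overflows, underflows

# Hypothetical DC recording: small fluctuations on top of a large offset.
recorded = [1_000_000, 1_000_100, 999_900, 1_000_000]
with_offset_removed = convert_channel(recorded, remove_offset=True)
as_recorded = convert_channel(recorded, remove_offset=False)
print(with_offset_removed)
print(as_recorded)
```

With the offset removed, the fluctuations of ±100 integers fit easily; converted as recorded, every sample saturates at the 16-bit maximum.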

Checking the option 'Adjust break offsets at epoch intersections (recording on/off)' will exploit the information about when data acquisition was turned off and on again within the same data file (BioSemi's ActiView acquisition software flags these events). Most likely, the overall EEG recording level has changed during the acquisition pause, resulting in sharp recording 'offsets' or 'jumps' within the recording. PolyRex will eliminate these 'break' offsets to further optimize data conversion.

For most practical EEG or ERP purposes, the overall recording level has no informational value. Nevertheless, these data can be stored in the header of the converted NeuroScan file, or written to an external ASCII file, if this information is needed.

Unless the 'Adjust 24-bit integer range' option is checked under 'Gains', no attempt will be made to change the original integer values. The integer range slider below this option, together with the edit fields 'Restrict range to:', will set the target integer range. Although using less than the total 16-bit range of 65,535 will further limit the gain resolution, it may be advisable to avoid using the full range, as follow-up data processing (e.g., blink reduction, filtering, baseline correction) may result in data saturation (PolyRex uses 50% as the default).

There are three modes to compute a new gain: 1) 'Separately for each channel' will optimize the gain resolution for each channel, but will most likely result in different gains across channels; 2) 'Across all included channels' will produce the same gain for each channel after determining the integer recording range across all channels; and 3) 'Apply a fixed gain value' determined by the user.

Whereas options 1 and 2 guarantee that the recorded data stay within the conversion target range (unless bad channels are flagged), option 3 may result in data saturation (these occurrences are reported in the log notes and indicated on the main form). [Note: Since version 1.1.3.7, PolyRex will compute and remove the median integer level from the converted data for option 3, as this will likely leave most of the 'good' recording periods within the target conversion range.]
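The three gain modes can be sketched in Python as follows. This is a simplified model under stated assumptions, not PolyRex's actual implementation: the target range is taken as 32,768 integer values (50% of the 16-bit range), the gain is never made finer than the original 0.03125 µV per integer, the common gain in mode 2 is taken as the coarsest per-channel gain, and all function names and channel ranges are hypothetical.

```python
from statistics import median

NATIVE_UV = 0.03125    # µV per integer in the 24-bit source (from the text)
TARGET_STEPS = 32768   # assumed target range: 50% of 65,536 values

def fit_gain(lo, hi):
    """µV per integer needed so the integer span lo..hi fits the target range."""
    width = hi - lo + 1
    return NATIVE_UV * max(1.0, width / TARGET_STEPS)

def gains_per_channel(ranges):
    """Mode 1: each channel gets the best gain its own range allows."""
    return {ch: fit_gain(lo, hi) for ch, (lo, hi) in ranges.items()}

def gain_across_channels(ranges):
    """Mode 2 (assumed): every channel gets the coarsest per-channel gain."""
    worst = max(gains_per_channel(ranges).values())
    return {ch: worst for ch in ranges}

def saturated_with_fixed_gain(samples, fixed_uv):
    """Mode 3: count samples outside the target range after median removal."""
    center = median(samples)
    scale = fixed_uv / NATIVE_UV
    half = TARGET_STEPS // 2
    scaled = [(s - center) / scale for s in samples]
    return sum(1 for v in scaled if v < -half or v > half - 1)

# Hypothetical observed integer ranges: a clean channel and one with an artifact.
ranges = {"Cz": (-10_000, 10_000), "T8": (-500_000, 548_575)}
print(gains_per_channel(ranges))
print(gain_across_channels(ranges))
print(saturated_with_fixed_gain([0, 10, -10, 1_000_000, 5], fixed_uv=0.03125))
```

In this example, mode 1 keeps the clean channel at the original gain while the artifact channel drops to 1 µV per integer; mode 2 forces both channels to 1 µV; mode 3 keeps the original gain and saturates only the single artifact sample.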

4. What is the purpose of the A/D gain histogram and the integer recording level display?

The 'Analyze' button on the main form helps to determine the best conversion settings by previewing the effects of the new gain calculation, because it forces PolyRex to analyze the integer range of the recorded data.

The gain histogram provides quick feedback on how the gain will differ between channels after conversion (cf. graphic below). However, only the 'Calculate new gain' option 'Separately for each channel' under the 'Offsets and Gains' settings can produce different gains for each channel; otherwise gains will be identical for all channels (gains may also be identical, for example, if no fit is necessary, thereby keeping the optimal recording gain). The lowest and highest gain values are listed in the upper left corner of the display.

A left mouse click on the gain histogram (or the right-mouse pop-up menu) will show the original integer recording levels for the entire recording.

As these values are computed as integer means for each data record (usually 1 s) and each channel, they allow a rough evaluation of how recording levels fluctuate across the entire recording. Artifacts, drifts, or any other recording problems can be recognized and addressed accordingly, for instance, by flagging a bad channel, or by excluding artifactual recording epochs.
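The idea of the level display, one mean per data record and channel, can be sketched in Python as follows. The helper names and the simple threshold rule for spotting unusual records are illustrative assumptions and not part of PolyRex.

```python
from statistics import median

def record_means(samples, samples_per_record):
    """Mean integer level of each complete data record for one channel."""
    n_records = len(samples) // samples_per_record
    return [
        sum(samples[i * samples_per_record:(i + 1) * samples_per_record])
        / samples_per_record
        for i in range(n_records)
    ]

def suspicious_records(means, threshold):
    """1-based record numbers whose level deviates from the median level
    by more than `threshold` integers (an assumed, simplistic criterion)."""
    center = median(means)
    return [i + 1 for i, m in enumerate(means) if abs(m - center) > threshold]

# Tiny made-up channel with two samples per record; the third record jumps.
channel = [0, 2, 10, 12, 1000, 1002]
means = record_means(channel, samples_per_record=2)
print(means, suspicious_records(means, threshold=100))
```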

The example below identifies a recording artifact in channel T8 (green line) between 8 and 9 min. Note that integer values for channel T8 are varying across the entire positive 24-bit integer range, but not for the other three EEG channels.

A closer look at the event table on the converted events reveals no stimulus or response activity between 340 and 652 s, suggesting that EEG data were collected during an intermission.

This problem could be solved by converting the file in two conversion runs, selecting only data records before (i.e., up to 340 s) and after the recording artifact (i.e., from 652 s to the end). Alternatively, a fixed gain may be determined based on the projected gains of the other channels, which will saturate channel T8 during the period of the recording artifact. Or, a bad channel flag may be set for T8.

5. How does re-referencing the recorded data affect the gain resolution?

By default, BioSemi stores all acquired data without a physical recording reference (see BioSemi's documentation for using an active reference with a passive electrode). While data may be converted in this original, "reference-free" state, it is more useful to convert the data to a physical reference using one (or more) of the electrodes included in the recording montage. This can be done "on-the-fly" during the conversion process by checking the 'Rereference' option under 'Channels and Reference' settings, and highlighting one (or more) channels in the reference montage box (use the Shift-key to select more than one channel). All data transformations are performed before determining an appropriate gain, thereby affecting the computation of the observed recording integer range that has to fit into the target integer range.
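The following Python sketch illustrates why rereferencing changes the integer range that has to be fitted: a drift shared by all channels disappears once a reference is subtracted. It assumes that multiple reference channels are averaged, which the text does not state explicitly; the function names and data are hypothetical.

```python
def rereference(data, ref_channels):
    """Subtract the mean of the reference channels from every channel.

    `data` maps channel labels to equally long lists of samples.
    Assumption: several reference channels are combined by averaging.
    """
    n = len(next(iter(data.values())))
    ref = [sum(data[ch][i] for ch in ref_channels) / len(ref_channels)
           for i in range(n)]
    return {ch: [x - r for x, r in zip(samples, ref)]
            for ch, samples in data.items()}

def range_width(samples):
    """Width of the observed value range of one channel."""
    return max(samples) - min(samples)

# Made-up data: all channels share a large drift of about 1000 per sample.
data = {
    "Cz": [1000, 2000, 3000],
    "A1": [990, 1995, 2990],
    "A2": [1010, 2005, 3010],
}
referenced = rereference(data, ["A1", "A2"])
print(range_width(data["Cz"]), range_width(referenced["Cz"]))
```

Here the range of Cz shrinks from 2000 integers to zero, so a much finer gain would be possible after rereferencing than before.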

6. How can I exclude channels with bad or artifactual data from the conversion?

If a channel has been identified as containing artifacts, it may be best to simply exclude this channel from the conversion by unchecking its 'Recorded Channels' enable box under the 'Channels and Reference' settings.

However, if the channel should be included in the montage of the converted file, it can also be flagged as a bad channel by using the right-mouse pop-up menu or 'Ctrl-B' as a short-cut.

In this case, all original data of the flagged channel are converted, but the new gain calculation for the flagged channel will be set to the lowest gain of all good channels included in the conversion (marked with red labels in the A/D gain display).

All flagged channels will also be flagged as bad in the header of the NeuroScan file. [Note: As of version 1.1.3.8, flagging channels as bad may result in data saturation, and, like using a fixed gain option, PolyRex will compute and remove the median rather than the mean integer levels from the converted data of flagged channels.]

The flagging of one or more channels as bad will also be indicated on the main form during the conversion process. [Note that bad channel flags are transient and will be reversed after selecting a different file, un- or reselecting the same file, or by re-entering the settings form.]

Any data saturation resulting from a limit imposed on the gain calculation (e.g., as a result of flagging a bad channel) is reported as over- or underflows (i.e., integer values are outside the target integer range after conversion).

7. How can I exclude artifactual data records from the conversion?

Sometimes, the original recording includes data recorded during unattended time intervals (e.g., at the beginning or end of a recording block, during session breaks, etc.), which frequently causes all kinds of EEG recording artifacts (movements, current induction, etc.; see an artifact example above). Because of the wide acquisition range for the analog signal, these artifacts will not easily saturate (as they do with a restricted 16-bit acquisition range), but rather will be recorded as they are (i.e., huge artifacts). By default, PolyRex will try to accommodate the 24-bit recording range and fit the observed signal into a 16-bit target range, which will result in an undesirably low gain resolution. With the aid of the A/D gain histogram and the integer level display, artifact time periods and the contributing channels can be identified.

The main window allows converting only a subrange of the original data records. The example to the right shows that the original file holds 1304 data records with a duration of 1 s/record. Checking the 'Range' option, and specifying the subrange of data records (e.g., from 1 to 527) will restrict the conversion (and all gain calculations) to this subrange (i.e., from the beginning of the recording up to recording time 8 min 47 s = 527 s). [Note that this option must be unchecked or adjusted for a different data file.]

8. Is it preferable to use separate gain calculations for each channel, rather than using a fixed gain or a single gain calculation across all channels?

There is no easy answer to this question: it depends on the data and the intended use. Calculating the same gain across all channels will limit the gain resolution for all channels to the gain of the "weakest" contributor; however, it has the advantage of having the same gain for all channels, which is most comparable to the acquisition scenario (usually, all EEG amplifiers are set to the same gain). Obviously, if the "weakest" contributing channel provides a very bad gain, all other (perfectly good) channels will have the same bad gain. Using a well-chosen fixed gain value may provide a reasonable compromise, as this will (hopefully) saturate the weakest channel during artifactual recording periods (see Offset and Gain).

A low gain does have a significant impact on the raw data, and therefore on all EEG measures generated directly from the raw data (e.g., spectral measures). The impact on secondary EEG measures, such as ERPs, which are generated by averaging (many) EEG epochs, will depend on the overall signal-to-noise ratio (SNR). With a sufficiently high SNR, stemming from either a large ERP signal (i.e., components), low background noise, or both, it is very likely that small differences in acquisition gain are negligible or non-existent in the averaged ERP waveform.

9. What is a reasonable gain?

As a rule of thumb for typical EEG or ERP studies, one would like to avoid the gain dropping below 1 µV/integer value. Any gain equal to or better than 0.5 µV/integer value would be comparable to the gain typically accomplished with a conventional 16-bit EEG acquisition system. Ultimately, the gain selection must be based on the objectives or priorities of the study. Ideally, PolyRex will convert the 24-bit data without any loss in the original recording gain of 0.03125 µV/integer value.

10. Why do all target files have the same file length when multiple source files are converted? Furthermore, the target file(s) contain(s) empty data records at the end, or data records are missing.

This bug was addressed with the release of PolyRex 1.2. Prior to this version, the number of data records to be included in the conversion for each target file was (incorrectly) determined by the subrange values of the currently (last) selected file (see BDF header display on the main form). If multiple files were selected to calculate identical conversion gains across the data range of these files, the number of data records of the last source file added to the list determined the number of data records included in all converted target files. Thus, if the number was smaller than that of a previous file in the list, some data records were missing in the conversion of the previous file(s). If the number was greater, the previous file(s) contained additional data records at the end of the target files consisting of zero values. The latter procedure (i.e., adding the largest source file with most data records last to the list) was therefore an easy work-around for this problem. Since version 1.2, data records are correctly converted. As a result of this correction, converting a subrange of data records has been limited to the conversion of single source files.

11. In my hardware setup, stimulus and response events are coded on the same byte. How can I separate these codes?

Before the release of PolyRex 1.2, this possibility was not directly addressed, and would therefore require a manual decoding of the integer numbers stored for an event (e.g., all stimulus codes greater than 15 denote a stimulus event, all codes equal to or less than 15 denote a response event). Since version 1.2, the Triggers and Events settings dialog allows the individual selection of any one of the three status bytes stored in the last channel of a BDF data file for any encoded event (StimType, KeyPad, FKey).

Triggers and Events

This example uses status byte #2 (corresponding to bits 8-15) for encoding of both stimulus events (StimType; bit range 12-15) and response events (KeyPad; bit range 8-11). Note that the integer number for both of these events will range from 0 to 15. The possibility of freely assigning the status byte to any event code (StimType, KeyPad, FKey), in combination with all other byte decoding options (byte inversion, decoding a numeral or line of triggers, identifying rising/falling edges, determining bit position and bit range), makes it possible to efficiently capture most hardware scenarios of recording stimulus and response events with a BioSemi system.
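For readers who want to check such a bit assignment by hand, the decoding in this example can be sketched in Python. The bit ranges (StimType in bits 12-15, KeyPad in bits 8-11) and the manual rule used before version 1.2 (codes greater than 15 are stimuli) come from the text; the function names and the sample status value are hypothetical, and bits are assumed to be numbered from 0 at the least significant bit.

```python
def extract_bits(status, low_bit, high_bit):
    """Return the integer stored in bits low_bit..high_bit (inclusive)."""
    width = high_bit - low_bit + 1
    return (status >> low_bit) & ((1 << width) - 1)

def decode_event(status):
    """Split status byte #2 into a stimulus code and a response code."""
    return {
        "StimType": extract_bits(status, 12, 15),
        "KeyPad": extract_bits(status, 8, 11),
    }

def classify_combined_code(code):
    """Manual decoding of a single combined event code, as described for
    versions before 1.2: codes above 15 are stimuli, all others responses."""
    return "stimulus" if code > 15 else "response"

# Hypothetical 24-bit status value with 0xA3 in byte #2 (bits 8-15).
status = 0x00A300
print(decode_event(status))
print(classify_combined_code(0xA0), classify_combined_code(3))
```

Both decoded values necessarily fall between 0 and 15, because each event type occupies only four bits.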

12. It would be helpful to have a more detailed data display than the crude integer overview in minutes for the entire recording. Is there any way to display the data in seconds in order to estimate more precisely when recording artifacts occur?

The main purpose of PolyRex is to (more efficiently) convert BioSemi data files (*.BDF) to the NeuroScan 3.x data format (*.CNT) in order to take advantage of NeuroScan's data processing and analysis software. The integer overview display was implemented in PolyRex because no such tool is available in NeuroScan's EDIT program for evaluating the overall time course of the general recording level. However, the issue can easily be addressed by first converting the entire data file with a low gain resolution and then using EDIT for a more detailed view of the data to determine exactly when artifacts occur. The artifact-contaminated data records (their duration of usually 1 s is determined by the properties of BioSemi's acquisition software) can then be excluded in a second conversion to attain an improved conversion gain.

Brief Recording Artifact: PolyRex Integer Display

The above example shows the integer recording level display of PolyRex, in which a brief, transient recording artifact is essentially unrecognizable due to the large time scale (minutes).

Brief Recording Artifact: EDIT Display

After converting this file at a very low gain, loading the converted file into EDIT, and scrolling through the recorded data on a second-by-second basis, the artifact can be detected in channel T8 approximately at recording time 3:51 [min:sec], therefore occurring in data record 232 (3 * 60 + 51 + 1).
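The conversion between recording time and data record numbers used here, and in the subrange example of question 7 (8 min 47 s = 527 records), amounts to the following small Python sketch. The function names are illustrative; a record duration of 1 s is assumed as the default, as is usual for BioSemi files.

```python
def records_up_to(minutes, seconds, record_duration_s=1.0):
    """Number of complete data records from the start up to the given time."""
    return int((minutes * 60 + seconds) // record_duration_s)

def record_containing(minutes, seconds, record_duration_s=1.0):
    """1-based number of the data record that contains the given time."""
    return records_up_to(minutes, seconds, record_duration_s) + 1

print(records_up_to(8, 47))       # last record of a subrange ending at 8:47
print(record_containing(3, 51))   # record holding the artifact at 3:51
```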

13. Are there any procedures for pre-filtering the data prior to conversion?

As a simple data conversion software, PolyRex is not intended to perform other typical data post-processing routines, although some functions (re-referencing, adding EOG channels) are provided. In fact, the conversion should allow the use of NeuroScan's EDIT program for routine data processing, including filtering. While applying a high pass filter to the data before conversion could help improve the gain, it will also remove the DC recording property of the BioSemi acquisition system, which may or may not interfere with the research objective. Pre-conversion filtering could probably be accomplished with MATLAB (there are links to free MATLAB code from third-party developers to read BioSemi data files on BioSemi's download page), but obviously filtered data need to be saved to *.BDF format for subsequent use with PolyRex.
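To illustrate how a high pass filter narrows the integer range that has to be converted, here is a minimal Python sketch of a first-order (single-pole) high pass filter. It is not a recommendation for a particular filter design, the cutoff and sampling rate are arbitrary example values, and reading or writing *.BDF files is not shown.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order high pass filter; removes the DC level and slow drifts."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [0.0]  # start at zero so that a constant input yields zero output
    for prev_x, x in zip(samples, samples[1:]):
        out.append(alpha * (out[-1] + x - prev_x))
    return out

# Made-up example: a constant DC offset, and a slow linear drift.
dc_only = high_pass([100_000] * 10, cutoff_hz=0.1, sample_rate_hz=256)
drift = [100_000 + i for i in range(5000)]
filtered = high_pass(drift, cutoff_hz=1.0, sample_rate_hz=256)
print(max(drift) - min(drift), max(filtered) - min(filtered))
```

The drifting signal spans 4,999 integers before filtering but only a few dozen afterwards, which is the kind of range reduction that would permit a finer conversion gain, at the cost of losing the DC information.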

14. Are there any known limitations to the conversion?

For the latest public release of PolyRex (version 1.2.1.2), data conversion is limited to 256 data channels due to the internal handling of individual channels (i.e., set of byte). [Note: PolyRex will not explicitly indicate this limitation to the user, which may result in failed conversions or program crashes (this bug was unknown until 5/2/2012 as our lab has only processed BioSemi data files with substantially fewer channels). This problem will likely be corrected in case of a future public release.]
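Since PolyRex does not warn about this limit, the channel count of a file can be checked beforehand. The Python sketch below assumes the standard BDF/EDF fixed header layout (256 bytes, with the number of channels stored as 4 ASCII characters at byte offset 252) and treats any count above 256 as exceeding the limit; the function names and the synthetic header are hypothetical.

```python
def bdf_channel_count(header: bytes) -> int:
    """Read the number of channels from the fixed 256-byte BDF header."""
    if len(header) < 256:
        raise ValueError("need at least the first 256 header bytes")
    # Assumed layout: channel count as ASCII text in bytes 252..255.
    return int(header[252:256].decode("ascii").strip())

def exceeds_polyrex_limit(header: bytes, limit: int = 256) -> bool:
    """True if the file holds more channels than PolyRex 1.2.1.2 can handle."""
    return bdf_channel_count(header) > limit

# Synthetic header for illustration: identifier, padding, then "280 " channels.
fake_header = b"\xffBIOSEMI" + b" " * 244 + b"280 "
print(bdf_channel_count(fake_header), exceeds_polyrex_limit(fake_header))
```

For a real file, the first 256 bytes could be obtained with open(path, "rb").read(256), where path is the location of the *.BDF file.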