The RC+S streams data in packets. Because the streaming backend uses a UDP-like protocol with no capacity for packet recovery, packet loss must be accounted for both during streaming and in data post-processing. Failing to account for packet loss will adversely affect the computation of PSDs and other event-related measures.
For most data science applications, it is useful to format data in a "tidy" matrix in which each row represents an instance of a time domain measurement and each column represents a feature associated with that particular measurement (i.e. the metadata). Our code efficiently transforms raw data from the RC+S (saved in the form of a JSON file) into a CSV file, albeit at the expense of increased storage utilization.
Below is a visual summary of what raw saved data looks like (in JSON form) vs. the derived data from our code:
- Matlab
  - Parses RC+S time domain data (in the form of a `RawDataTD.json` file) into a CSV output with samples along rows and channels in columns.
  - Creation of a CSV file is optional, which allows in-memory manipulation of the data without the overhead of saving it to disk.
- Python
  - Parses both RC+S time domain and accelerometry data (in the form of a `RawDataTD.json` or a `RawDataAccel.json` file) into a CSV output with samples along rows and channels in columns.
  - Command-line arguments are available for control of timestamp processing (timestamps may be left in seconds since March 1st, 2000 or converted into a human-readable datetime format).
  - The packet deserializer can be invoked from the command line via the `python` interpreter and run as a standalone program, or it can be imported as a module into a Python program of your own, giving you access to the processing functions it contains.
Samples of raw RC+S time domain data (in JSON format), together with the processed "tidy" output created from them, are available in the `sample_data` folder of this repo.
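As a rough illustration of the "tidy" layout described earlier, the sketch below flattens a few packets into one row per sample and writes them as CSV. The packet structure and the names `packets`, `channels`, `ch0`, `ch1`, and `to_tidy_rows` are hypothetical and chosen for illustration only; they do not reproduce the actual `RawDataTD.json` layout or the functions in this repo.

```python
import csv
import io

# Hypothetical, simplified packets. The real RawDataTD.json layout is NOT reproduced here.
packets = [
    {"packet": 1, "channels": {"ch0": [0.1, 0.2], "ch1": [1.1, 1.2]}},
    {"packet": 2, "channels": {"ch0": [0.3], "ch1": [1.3]}},
]


def to_tidy_rows(packets):
    """Flatten packets into one row per sample, repeating packet metadata on every row."""
    rows = []
    for p in packets:
        names = sorted(p["channels"])
        # zip(*...) walks the channels in lockstep, yielding one tuple per sample.
        for values in zip(*(p["channels"][n] for n in names)):
            row = {"packet": p["packet"]}
            row.update(zip(names, values))
            rows.append(row)
    return rows


rows = to_tidy_rows(packets)

# Write to an in-memory buffer; a real file handle would work the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["packet", "ch0", "ch1"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because the packet-level metadata is repeated on every row, the tidy file is larger than the raw packets, which is the storage trade-off mentioned earlier.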
The following packet headers are found in a `RawDataTD.json` file:
- `dataTypeSequence`: 8-bit packet number counter, rolls over.
- `systemTick`: 16-bit INS clock-driven tick counter, rolls over, LSB is 100 microseconds (high accuracy and resolution).
- `TdSampleRates`: Hexadecimal value corresponding to the sample rate used for a given recording: 0x00 is 250 Hz, 0x01 is 500 Hz, 0x02 is 1000 Hz, and 0xF0 is disabled.
- `timestamp`: Timezone-naive INS wall clock time, does not roll over, LSB is seconds (high accuracy, low resolution). Time is calculated in seconds since March 1st, 2000 at midnight.
To convert `timestamp` to the actual datetime in Matlab, use: `datetime(datevec(timestamp./86400 + datenum(2000,3,1,0,0,0)))`
To convert `timestamp` to the actual datetime with `pandas` in Python, use: `pd.to_datetime(timestamp, unit='s', origin=pd.Timestamp('2000-03-01'))`
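Where `pandas` is not available, the same conversion can be sketched with the Python standard library alone. The names `INS_EPOCH` and `ins_timestamp_to_datetime` are hypothetical helpers introduced here for illustration; they are not part of this repo.

```python
from datetime import datetime, timedelta

# Epoch of the INS wall clock: midnight on March 1st, 2000 (timezone-naive).
INS_EPOCH = datetime(2000, 3, 1)


def ins_timestamp_to_datetime(timestamp_s):
    """Convert seconds since the INS epoch into a timezone-naive datetime."""
    return INS_EPOCH + timedelta(seconds=timestamp_s)


# One day (86400 seconds) after the epoch.
print(ins_timestamp_to_datetime(86400))
```

The result is timezone-naive, like the INS clock itself, so any timezone handling has to be applied separately.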
1. For streaming time domain data at 1000, 500, or 250 Hz, the intersample interval is 10, 20, and 40 `systemTick` counts, respectively.
2. To guard against large-scale packet losses in which the `dataTypeSequence` counter rolls over (or the `systemTick` counter rolls over), we track the packet `timestamp`. A gap >= 6.5536 seconds (one full `systemTick` rollover: 65536 counts of 100 microseconds each) between the `timestamp` values of two successive packets implies a large packet loss and triggers a re-initialization of our `systemTick` linearization algorithm.
   - For example, in our schematic above a large packet loss occurred between packets 5 and 117.
3. The `systemTick` time associated with each packet is the INS time of the last data sample in that packet.
   - The `systemTick` time of the first data sample in the packet can be calculated as follows: Time First Sample = `systemTick` - (Intersample Interval in `systemTick` time * (`data points` - 1))
   - The times of the remaining data samples in the packet can be linearly interpolated by spacing them equally between the `systemTick` times of the first and last data samples in the packet.
4. Packet losses with < 6.5536 seconds between successive timestamps can be captured by tracking the `dataTypeSequence` counter (represented as `packet #` in our schematic above).
   - For example, in our schematic above a small packet loss occurred between packets 2 and 4.
   - The number of lost data samples for a small packet loss can be calculated as follows:
     - [`systemTick` of first data sample in next packet - (`systemTick` of last data sample in previously received packet + Intersample Interval in `systemTick` time)] / Intersample Interval in `systemTick` time
     - For the small packet loss shown in our schematic, we lost [((7537 - 240) - (6539 + 10)) / 10] = about 75 data samples.
5. When neither large nor small packet loss occurs (the default case), we can calculate the unique timestamp of each data value as follows:
   - The `timestamp` of the first packet in the session is recorded.
   - The `systemTick` time at which each data sample in a packet occurred is calculated by linearly interpolating between the `systemTick` times of the first and last data samples in that packet (details explained in Point #3 above).
   - A linearization algorithm uses modular arithmetic to convert the rolling-over `systemTick` counter into a monotonically increasing counter, such that the `systemTick` time of data samples in each packet is always greater than the `systemTick` time of data samples in the previous packet.
   - The `systemTick` time is converted to microseconds and added to the `timestamp` associated with the start of the recording session.
     - If a large packet loss occurred during recording, the `systemTick` times of data samples occurring after the packet loss are added to the `timestamp` associated with the first successfully received packet after the large packet loss.
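The first-sample and lost-sample formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this repo: the function names are hypothetical, the modulo used to tolerate a `systemTick` rollover between two packets is an added assumption, and the packet size of 25 samples is inferred from the 240-tick offset in the worked example (10 ticks * (25 - 1)).

```python
TICKS_PER_SECOND = 10_000  # one systemTick count is 100 microseconds
ROLLOVER = 2 ** 16         # systemTick is a 16-bit counter


def intersample_ticks(sample_rate_hz):
    """Intersample interval in systemTick counts (10 at 1000 Hz, 20 at 500 Hz, 40 at 250 Hz)."""
    return TICKS_PER_SECOND // sample_rate_hz


def sample_ticks(last_tick, n_samples, sample_rate_hz):
    """systemTick time of every sample in a packet, ordered first to last.

    Values are left unwrapped, so they can be negative if the counter
    rolled over inside the packet.
    """
    step = intersample_ticks(sample_rate_hz)
    first_tick = last_tick - step * (n_samples - 1)
    return [first_tick + step * i for i in range(n_samples)]


def lost_samples(prev_last_tick, next_last_tick, next_n_samples, sample_rate_hz):
    """Estimate the samples lost between two received packets (small packet loss only)."""
    step = intersample_ticks(sample_rate_hz)
    next_first_tick = next_last_tick - step * (next_n_samples - 1)
    # Assumption: the modulo absorbs at most one systemTick rollover between the packets.
    gap = (next_first_tick - (prev_last_tick + step)) % ROLLOVER
    return round(gap / step)


# Numbers from the worked example: (7537 - 240) - (6539 + 10) = 748 ticks, about 75 samples.
print(lost_samples(6539, 7537, 25, 1000))
```

Rounding is used because the tick gap is not always an exact multiple of the intersample interval, as the 74.8 in the worked example shows.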
A fully commented algorithm for deserializing RC+S time domain data can be found in the Python implementation by reading the comments (in sequential order) within the `extract_td_meta_data`, `code_micro_and_macro_packet_loss`, and `unpacker_td` functions.
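For readers who want the gist before opening those functions, here is an independent, simplified sketch of the linearization idea described above. It is not the repo's implementation: `packet_times_us` and its `(timestamp, systemTick)` tuple input are hypothetical, and for simplicity it pins the last sample of each segment's first packet to that packet's `timestamp`.

```python
ROLLOVER = 2 ** 16      # systemTick is a 16-bit counter
TICK_US = 100           # one systemTick count is 100 microseconds
LARGE_GAP_S = 6.5536    # one full systemTick rollover, in seconds


def packet_times_us(packets):
    """Time of each packet's last sample, in microseconds since the INS epoch.

    `packets` is a list of (timestamp_s, system_tick) tuples in arrival order.
    """
    times = []
    anchor_s = None   # timestamp of the packet that started the current segment
    elapsed = 0       # linearized ticks accumulated since that packet
    prev_ts = prev_tick = None
    for ts, tick in packets:
        if prev_ts is None or ts - prev_ts >= LARGE_GAP_S:
            # First packet, or a large packet loss: restart the linearization here.
            anchor_s, elapsed = ts, 0
        else:
            # The modular difference undoes a rollover, provided less than one
            # full rollover separates the two packets.
            elapsed += (tick - prev_tick) % ROLLOVER
        times.append(anchor_s * 1_000_000 + elapsed * TICK_US)
        prev_ts, prev_tick = ts, tick
    return times


# The third packet arrives after systemTick rolled over; the fourth follows a large loss.
example = [(100, 65000), (100, 65500), (101, 464), (120, 300)]
print(packet_times_us(example))
```

Integer arithmetic is used throughout so that the microsecond values stay exact over long recordings.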
A number of factors impact the fidelity with which the RC+S streams data back to the host computer. Several RC+S streaming parameters can be configured depending on the individual use case:
- The RC+S CTM can operate in two different modes: Mode 3 & Mode 4.
  - Mode 3 is optimal for streaming data across longer distances but at a slower rate.
  - Mode 4 is best for streaming data across shorter distances but at a faster rate.
- The CTM Ratio (the number of packets sent by the INS to the host computer vs. the number of packets sent by the host computer to the INS) can be varied programmatically at the start of a recording session.
  - In general, high CTM Ratios should be used for high-throughput sensing applications; lower CTM Ratios should be used when INS parameters must be updated rapidly (as would be the case in a distributed DBS application).
- A number of other factors impact streaming performance. These include, but are not limited to, the distances between the host computer, CTM, and INS; 60 Hz environmental noise; the number of channels being streamed; and the sampling frequency.
- Recordings found within the `sample_data` folder were acquired with CTM Mode = 3, CTM Ratio = 4, two time domain channels streaming at 1000 Hz, and accelerometry data streaming at 64 Hz.
- Python
  - Requirements:
    - Python >= v3.5
    - pandas
    - numpy
  - Usage:
    - TBD...
- Matlab
  - Folder with Matlab code to do the same. See the `MAIN` function to select a TD file.
- Add routines to process data folders.
- Consider implementing more efficient datetime storage (double rather than string) if human readability is not important.
- Backtrace first packet `timestamp` from a system rollover.
- Consider adding an option to use the data that exists in `TimeSync.json`.