TO DO: Replace this text with a very short description of the dataset.
TO DO: Replace this text by general info about the dataset, e.g. generic description of subjects and when the data was collected.
TO DO: Replace this text with a description of how subjects were recruited, possibly including links to the recruitment material used.
Inclusion criteria were:
- replace with inclusion criterion 1;
- replace with inclusion criterion 2;
- etc.
TO DO: Replace this text with (links to) data management plan, privacy policy and (if applicable), DPIA.
In the sections below, the data pre-processing and data formats used in the data files will be described.
TO DO: describe.
We used the following measurement device types to collect data. Some devices consisted of a main device and one or two satellite devices.
TO DO: Change the markdown table below as needed.
Source type | Category | Main device repo | Satellite device 1 repo | Satellite device 2 repo |
---|---|---|---|---|
OpenTherm-Monitor | comfort + installation + occupancy | twomes-opentherm-monitor-firmware | | |
DSMR-P1-gateway | energy | twomes-p1-gateway-firmware | | |
DSMR-P1-gateway-Tin | energy + comfort | twomes-p1-gateway-firmware | twomes-room-monitor-firmware | |
DSMR-P1-gateway-TinTsTr | energy + comfort + installation | twomes-p1-gateway-firmware | twomes-room-monitor-firmware | twomes-boiler-monitor-firmware |
DSMR-P1-gateway-TinTsTrCO2 | energy + comfort + installation + occupancy/ventilation | twomes-p1-gateway-firmware | twomes-room-monitor-firmware | twomes-boiler-monitor-firmware |
All timestamps were recorded as Unix time, using device clocks that were regularly synchronized with UTC via NTP. Synchronizing the device clock via NTP was one of the first steps each measurement device performed after it was connected to the internet via the subject's home Wi-Fi network; after that, each device resynchronized its clock via NTP every 6 hours. Each upload of measurement data (which could contain more than one measurement) was timestamped both by the measurement device, according to its local clock, and by the server. We have not yet checked for deviations between the last device timestamp of an upload and the server's upload timestamp.
Timestamps were converted to timezone-aware `pandas.Timestamp` values in the Europe/Amsterdam timezone. In the csv files, we use the ISO 8601 format with time offset: `YYYY-MM-DDThh:mm:ss±hhmm`.
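To illustrate the timestamp handling described above, the sketch below converts a Unix timestamp to a timezone-aware `pandas.Timestamp` in the Europe/Amsterdam timezone, and converts such a value to and from the ISO 8601 string format used in the csv files. The helper names `unix_to_local`, `to_csv_string` and `from_csv_string` are hypothetical; this is not the code that was used to produce the dataset.

```python
import pandas as pd

TZ = "Europe/Amsterdam"  # timezone used throughout the dataset


def unix_to_local(unix_seconds: int) -> pd.Timestamp:
    """Unix time (seconds since 1970-01-01 UTC) -> tz-aware Europe/Amsterdam Timestamp."""
    return pd.Timestamp(unix_seconds, unit="s", tz="UTC").tz_convert(TZ)


def to_csv_string(ts: pd.Timestamp) -> str:
    """Format as ISO 8601 with a ±hhmm offset, as used in the csv files."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S%z")


def from_csv_string(text: str) -> pd.Timestamp:
    """Parse an ISO 8601 string with offset back into Europe/Amsterdam time."""
    return pd.Timestamp(text).tz_convert(TZ)


print(to_csv_string(unix_to_local(1672531200)))  # 2023-01-01T01:00:00+0100 (winter, UTC+1)
print(to_csv_string(unix_to_local(1688169600)))  # 2023-07-01T02:00:00+0200 (summer, UTC+2)
```

Because the offset is stored with every timestamp, csv values remain unambiguous around daylight saving time transitions.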
Raw measurements will be available in the folder /raw-measurements/ in two formats:
- twomes_raw_measurements.parquet: a single parquet file with data for all subject ids;
- nnnnnn_raw_measurements.zip: zipped csv files, one for each subject id.
All measurement data is structured according to the table below. By importing the parquet variant using `pandas.read_parquet()`, you automatically get a DataFrame with the recommended indices and data types.
Alternatively, you can read the zipped csv files, but this typically takes much longer, and you then need to set the recommended indices and data types yourself.
Index/Column | Name | Type | Description |
---|---|---|---|
index | id | category | unique code of the home |
index | source_category | category | source category, e.g. device, cloud_feed, energy_query, batch-import |
index | source_type | category | device type name of the measurement device |
index | timestamp | Timestamp | start of the interval (timezone aware) |
index | property | category | property name of the measurement |
column | value | object | value of the measurement |
column | unit | category | unit of the measurement value |
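A minimal sketch for reading one of the zipped csv files and applying the indices and data types from the table above. It assumes that each archive contains one or more csv files with a header row whose column names match the `Name` column of the table; `read_raw_measurements_zip` is a hypothetical helper, not part of the published dataset tooling.

```python
import zipfile

import pandas as pd

# Index levels and categorical columns as listed in the raw-measurements table
INDEX_COLS = ["id", "source_category", "source_type", "timestamp", "property"]
CATEGORY_COLS = ["id", "source_category", "source_type", "property", "unit"]
TZ = "Europe/Amsterdam"


def read_raw_measurements_zip(path) -> pd.DataFrame:
    """Read every csv file inside one zip archive into a single DataFrame."""
    frames = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                with zf.open(name) as f:
                    # read everything as text first; types are set explicitly below
                    frames.append(pd.read_csv(f, dtype=str))
    df = pd.concat(frames, ignore_index=True)
    # ISO 8601 strings with ±hhmm offsets -> UTC -> Europe/Amsterdam
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True).dt.tz_convert(TZ)
    for col in CATEGORY_COLS:
        df[col] = df[col].astype("category")
    df["value"] = df["value"].astype(object)  # values of mixed properties stay generic
    return df.set_index(INDEX_COLS).sort_index()
```

For the parquet variant, `pd.read_parquet("twomes_raw_measurements.parquet")` should already return this structure. Reading the csv columns as strings keeps any leading zeros in subject ids, which would be lost if pandas parsed them as integers.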
In the folder /raw-properties/ we will make various measured properties available in an 'unstacked' format, with each property in its own column and with an appropriate data type. As with the raw measurements, we will make the data available in two formats:
- twomes_raw_properties.parquet: a single parquet file with data for all subject ids;
- nnnnnn_raw_properties.zip: zipped csv files, one for each subject id.
All property data is structured according to the table below. By importing the parquet variant using `pandas.read_parquet()`, you automatically get a DataFrame with the recommended indices and data types.
Alternatively, you can read the zipped csv files, but this typically takes much longer, and you then need to set the recommended indices and data types yourself.
Index/Column | Name | Type | Description |
---|---|---|---|
index | id | category | unique code of the home |
index | source_category | category | source category, e.g. device, cloud_feed, energy_query, batch-import |
index | source_type | category | device type name of the measurement device |
index | timestamp | Timestamp | start of the interval (timezone aware) |
column | property_1 (see the property table below) | data_type_1 | measured value of this property |
column | property_2 | data_type_2 | measured value of this property |
... | ... | ... | ... |
column | property_n | data_type_n | measured value of this property |
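A minimal sketch for reading one of the zipped raw-properties csv files, assuming each archive holds csv files whose header row contains the four index names from the table above followed by one column per property. The per-property data types are passed in as a dictionary (for example `{"co2__ppm": "float32"}`, taken from the property table); `read_raw_properties_zip` is a hypothetical helper, and only numeric property types are handled.

```python
import zipfile

import pandas as pd

INDEX_COLS = ["id", "source_category", "source_type", "timestamp"]
TZ = "Europe/Amsterdam"


def read_raw_properties_zip(path, property_dtypes: dict) -> pd.DataFrame:
    """Read the csv file(s) in one zipped raw-properties archive."""
    frames = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                with zf.open(name) as f:
                    frames.append(pd.read_csv(f, dtype=str))  # strings keep leading zeros in ids
    df = pd.concat(frames, ignore_index=True)
    # ISO 8601 strings with ±hhmm offsets -> UTC -> Europe/Amsterdam
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True).dt.tz_convert(TZ)
    for col in ["id", "source_category", "source_type"]:
        df[col] = df[col].astype("category")
    # Numeric property columns only; other types would need their own conversion
    for prop, dtype in property_dtypes.items():
        if prop in df.columns:
            df[prop] = pd.to_numeric(df[prop]).astype(dtype)
    return df.set_index(INDEX_COLS).sort_index()
```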
The table below lists all measured properties, with their data type in the raw-properties DataFrame, the measurement unit, the measurement interval, the source device and sensor that measured them, as well as the property name and value format as retrieved from the Twomes database.
TO DO: Change the markdown table below as needed.
Property | Type | Unit | Measurement interval [h:mm:ss] | Description | Source Type | Sensor | Database property | Database format |
---|---|---|---|---|---|---|---|---|
co2__ppm | float32 | ppm | 0:05:00 | CO₂ concentration | DSMR-P1-gateway-TinTsTrCO2 | SCD41 | CO2concentration | %d |
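The 'unstacked' raw-properties layout relates to the raw-measurements layout by moving the `property` index level into the columns and giving each column the type from the property table above. The sketch below assumes a raw-measurements DataFrame as described earlier and a hypothetical `PROPERTY_DTYPES` mapping filled from the `Type` column (only `co2__ppm` is listed so far). It handles numeric types only and is not necessarily how the published raw-properties files were produced.

```python
import pandas as pd

# Hypothetical mapping from property name to data type, taken from the 'Type'
# column of the property table; extend it with the other (numeric) properties.
PROPERTY_DTYPES = {"co2__ppm": "float32"}


def measurements_to_properties(measurements: pd.DataFrame,
                               dtypes: dict = PROPERTY_DTYPES) -> pd.DataFrame:
    """Move the 'property' index level into columns (one column per property)."""
    values = measurements["value"]
    # unstack() needs a unique index, so keep only the first of any duplicates
    values = values[~values.index.duplicated(keep="first")]
    wide = values.unstack("property")
    wide.columns = wide.columns.astype(str)  # plain labels instead of categories
    for prop, dtype in dtypes.items():
        if prop in wide.columns:
            # values arrive as generic objects (e.g. strings); convert numerically first
            wide[prop] = pd.to_numeric(wide[prop]).astype(dtype)
    return wide
```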
Weather data from the Royal Netherlands Meteorological Institute (KNMI) was collected and geospatially interpolated using HourlyHistoricWeather, based on average hourly values.
For all subject ids, we used the same location for geospatial interpolation of weather data: lat, lon = `52.xxxxx, 6.yyyyy`. Average values were converted from the source units to the units indicated in the table below.
Index/Column | Property | Type | Unit | Measurement interval [h:mm:ss] | Description | Source Type | Source property | Source value format | Source unit |
---|---|---|---|---|---|---|---|---|---|
index | timestamp | Timestamp | | | start of the measurement interval | KNMI | YYYYMMDD, H | H=1: 0:00:00 - 0:59:59; H=24: 23:00:00 - 23:59:59 | |
column | temp_out__degC | float32 | °C | 1:00:00 | outdoor temperature | KNMI | T | %d | 0.1 °C |
column | wind__m_s_1 | float32 | m/s | 1:00:00 | wind speed | KNMI | FH | %d | 0.1 m/s |
column | ghi__W_m_2 | float32 | W/m2 | 1:00:00 | global horizontal irradiance | KNMI | Q | %d | J/(h·cm2) |
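The unit conversions in the table above amount to simple scaling: T and FH are given in units of 0.1, so they are divided by 10, and 1 J/(h·cm²) equals 10 000 J/m² per 3 600 s, i.e. about 2.78 W/m². The sketch below applies these conversions and turns the date and hour fields into the start of each interval. It assumes input columns named `YYYYMMDD`, `H`, `T`, `FH` and `Q` and hours expressed in UTC; both are assumptions, and `convert_knmi_hourly` is a hypothetical helper, not part of HourlyHistoricWeather.

```python
import pandas as pd


def convert_knmi_hourly(raw: pd.DataFrame, tz: str = "Europe/Amsterdam") -> pd.DataFrame:
    """Convert hourly KNMI-style values (assumed column names) to the dataset units."""
    day = pd.to_datetime(raw["YYYYMMDD"].astype(str), format="%Y%m%d")
    # H=1 covers 0:00:00 - 0:59:59, so the interval starts at hour H - 1 (assumed UTC)
    start = (day + pd.to_timedelta(raw["H"] - 1, unit="h")).dt.tz_localize("UTC").dt.tz_convert(tz)
    out = pd.DataFrame({
        "temp_out__degC": (raw["T"] / 10).astype("float32"),           # 0.1 °C -> °C
        "wind__m_s_1": (raw["FH"] / 10).astype("float32"),             # 0.1 m/s -> m/s
        "ghi__W_m_2": (raw["Q"] * 10_000 / 3_600).astype("float32"),   # J/(h·cm2) -> W/m2
    })
    out.index = pd.DatetimeIndex(start, name="timestamp")
    return out
```

For example, Q = 36 J/(h·cm²) corresponds to 36 × 10 000 / 3 600 = 100 W/m².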
TO DO: change preprocessing description below.
Preprocessing of measurements from the measurement database was done using `get_preprocessed_homes_data()`. Preprocessing steps include:
- removal of duplicate measurements;
- calculation of derived properties as a combination of other properties, as indicated in the column `Calculation` in the table below;
- removal of absolute outliers, i.e. measurement values smaller than the value in the column `Min` or larger than the value in the column `Max` in the table below;
- removal of statistical outliers, i.e. measurement values with an absolute z-score higher than the value in the column `Sigma` in the table below;
- interpolation of measurements to intervals of 15 minutes (no interpolation between measurements that were 60 minutes or more apart).

All column values represent the average during the interval that starts at the timestamp indicated.
TO DO: Change the markdown table below.
Index/Column | Name | Type | Unit | Description | Calculation | Min | Max | Sigma |
---|---|---|---|---|---|---|---|---|
index | id | Int32 | | unique code of the home | | 000000 | 999999 | |
index | timestamp | Timestamp | | start of the interpolated interval (timezone aware) | | | | |
column | T_out__degC | float32 | °C | outdoor temperature | | -28 | 40 | |
column | wind__m_s_1 | float32 | m/s | wind speed | | 0 | 35 | |
column | ghi__W_m_2 | Int16 | W/m2 | global horizontal irradiance | | 0 | 1000 | |
column | T_in__degC | float32 | °C | indoor temperature | | 0 | 40 | 3 |
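To make the preprocessing steps above concrete, here is a minimal sketch for a single property of a single home. It is a hypothetical illustration, not the implementation of `get_preprocessed_homes_data()`: it treats measurements with the same timestamp as duplicates, skips the derived-property step, and approximates each 15-minute average by averaging a 1-minute linear interpolation, blanking out gaps of 60 minutes or more.

```python
import pandas as pd


def preprocess_property(s: pd.Series, vmin=None, vmax=None, sigma=None,
                        interval="15min", max_gap="60min") -> pd.Series:
    """Hypothetical sketch; NOT get_preprocessed_homes_data().

    `s` holds raw values of one property, indexed by a timezone-aware DatetimeIndex.
    """
    tz = s.index.tz
    s = s.tz_convert("UTC")  # work in UTC to avoid daylight saving time ambiguities
    # 1. remove duplicate measurements (same timestamp; keep the first)
    s = s[~s.index.duplicated(keep="first")].sort_index().astype("float64")
    # 2. remove absolute outliers outside [vmin, vmax]
    if vmin is not None:
        s = s[s >= vmin]
    if vmax is not None:
        s = s[s <= vmax]
    # 3. remove statistical outliers with |z-score| > sigma
    std = s.std()
    if sigma is not None and std > 0:
        s = s[((s - s.mean()) / std).abs() <= sigma]
    # 4. linear interpolation on a 1-minute grid, without extrapolation
    grid = pd.date_range(s.index.min().floor(interval),
                         s.index.max().ceil(interval), freq="1min")
    fine = (s.reindex(s.index.union(grid))
             .interpolate(method="time", limit_area="inside")
             .reindex(grid))
    # 5. blank out grid points lying in gaps of max_gap or more between measurements
    times = pd.Series(s.index, index=s.index)
    gap = times.reindex(grid, method="bfill") - times.reindex(grid, method="ffill")
    fine = fine.mask(gap >= pd.Timedelta(max_gap))
    # 6. average per interval, labelled by the start of the interval
    return fine.resample(interval).mean().tz_convert(tz)
```

For indoor temperature, for example, the table above would translate into `preprocess_property(t_in, vmin=0, vmax=40, sigma=3)`. Averaging a 1-minute grid only approximates the exact mean of the interpolated signal over each interval.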
Dataset status: collected; anonymization in progress.
This data is made available under the CC BY 4.0 license by the Research group Energy Transition, Windesheim University of Applied Sciences.
Data collection was a joint effort of:
- <contributor name 1> · @Github_handle_1 · Twitter @Twitter_handle_1
- <contributor name 2> · @Github_handle_2 · Twitter @Twitter_handle_2
- <contributor name 3> · @Github_handle_3 · Twitter @Twitter_handle_3
- etc.
Thanks go to those who are the ultimate source of this dataset:
- all anonymous subjects who volunteered to make their measurement data available
We use and gratefully acknowledge the efforts of the makers of the following source code and libraries:
- HourlyHistoricWeather, by @stephanpcpeters, licensed under an MIT-style licence