UserGuide

Richard Darst edited this page Jan 30, 2018 · 15 revisions

Koota is a data collection server. Data comes in, it gets linked to devices that are a part of user accounts, and users can extract it. Users can also be a part of groups which allow the sharing of data. There is a very strong emphasis on privacy.

This is the User Guide for the koota-server system. Use this guide to learn what the koota-server system is, what data it collects, how to make an account, and how to begin collecting your data. The server is under active development, so please report problems and send suggestions using the GitHub Issues tab.

There are two modes of operation. One is "independent", where a person manages their own data collection and uses their own data. The other is a study, where the user is part of a research study and researchers can access the data specified by the study. This page describes the independent case, but most principles also apply to centrally managed operation. Both modes can exist at the same time.

General idea

Koota is a server that allows you to collect data from different devices, such as Android phones, iPhones, digital surveys, and bed sensors. You can have as many devices as you want generating data. Koota gathers all the data in one place, which makes analysis easier.

Once you add a device, it sends data to the server, which stores it sorted by user and by device. You can then download it for analysis after you log in to your user account. However, raw data is rather messy and requires a lot of work to get into a more convenient form, so the server provides various converters (preprocessors) which convert raw data to an easy-to-analyze form, such as CSV. These converters depend on the type of device, and have to be written for each individual analysis. They also manage privacy, and serve to filter the data available in studies.
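
As a rough illustration of what a converter does, the following Python sketch flattens raw packets into CSV. It is not koota-server's actual converter code: the packet format (a JSON list of readings), the field names `time`, `probe`, and `value`, and the function name `convert_packets_to_csv` are all assumptions made for this example.

```python
import csv
import io
import json


def convert_packets_to_csv(raw_packets):
    """Flatten raw packets into CSV text.

    Assumption: each packet is a JSON list of readings with the
    hypothetical keys "time", "probe", and "value".
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", "probe", "value"])
    for packet in raw_packets:
        for record in json.loads(packet):
            writer.writerow([record["time"], record["probe"], record["value"]])
    return out.getvalue()


# Hypothetical raw packets, in the text form a device might have sent them.
raw_packets = [
    '[{"time": 1517300000, "probe": "screen", "value": "on"}]',
    '[{"time": 1517300060, "probe": "screen", "value": "off"}]',
]

csv_text = convert_packets_to_csv(raw_packets)
print(csv_text)
```

A real converter would also decide which fields to leave out, since converters are the place where the data available to a study is filtered.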

Privacy

The data is privacy-sensitive and is therefore treated with care. Each person decides what to share by setting up a device. Data is not shared with anyone unless you are part of a study. You can see the studies you are a part of.

If you are not part of a study group, the data can only be seen by you. However, if you are part of one or more study groups, the researchers of each group can access the anonymized version of the data. In this case, the data and its anonymization are defined by each group, and by signing up to the group you accept its terms and conditions. Each group has its own limits on the data that is shared. In addition, the converters also provide a way to limit the data that can be seen by others. For example, if you are in a group for a study on bed sensors, the researchers can only extract the data relevant to that study in a processed, anonymized form; they cannot extract information about other devices that you have set up.

The PrivacyPolicy defines the default service and privacy terms for the independent case.

In general,

  • You have the right to examine your own data on the server, download it, and do your own analysis. There is no secret data. The management of these data is your own responsibility.
  • You may be enrolled in different "research project groups". A group has its own research goals and privacy procedures, which you will approve separately when you enroll.
  • Researchers and the data controllers do not have the right to examine raw data, or any data until you have agreed separately.

Sign up

To begin, go to https://koota.cs.aalto.fi/ and register. Registration should be anonymous, which means you should not give any real information:

  • Use a username which means nothing (the page lists a suggestion).
  • Do not enter an email address - it's not used for anything.

The overriding principle is that you should not identify yourself to anyone else.

You should always use the anonymous account for collecting data, and if you take part in studies, use a separate real account.

Once you log in, explore the navigation a little bit. If you go to your device list, you can add devices and view device info/config instructions. The device list is the primary place to be.

Devices

A "device" represents one type of input data source, for example an app that records phone activity. Your account can have multiple devices of the same or different types linked to it. You add a device to the system, configure the source to send data, and then the server collects data. You can look at the device on the server, see how much data has been collected, and access that data. If you are involved in research studies, researchers will be able to access this data according to the study parameters.

In the device list page, there is a "create device" link. If you go there, you can create a new device and specify the following options:

  • Name: something for you to tell the devices apart (example: main phone, nexus 5)
  • Device type: The type of data it is. Must be set correctly.
  • Usage: Set to "primary personal device" if the data belongs to you as a person. It is OK to have multiple primary devices of the same type (for example, if you have a phone that is yours and a tablet that is yours, and the data of both fully represents you). "Secondary device" means that the data represents you, but you are not the only user of the device (a shared phone or tablet, for example) or the data is much less important. "Test device" is self-explanatory, and "other" can be used for anything else. The most important practical concern is this: in research, both primary and secondary devices are used in studies; the others are not.
  • Comment: any note to yourself or to researchers. This is not used at all now. Can be left blank.
  • Archived: Leave this blank. Normally you don't delete devices; instead, you archive them so they no longer clutter your list.
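
The usage rule above (primary and secondary devices are used in studies, the others are not) can be sketched as a simple filter. This is only an illustration: the `Device` class, the short usage labels, and the function `devices_for_study` are hypothetical names, not the server's actual data model.

```python
from dataclasses import dataclass

# Hypothetical short labels for the four usage options described above.
STUDY_USAGES = {"primary", "secondary"}


@dataclass
class Device:
    name: str
    device_type: str
    usage: str  # assumed to be one of: "primary", "secondary", "test", "other"


def devices_for_study(devices):
    """Keep only the devices whose usage makes them eligible for studies."""
    return [d for d in devices if d.usage in STUDY_USAGES]


devices = [
    Device("main phone", "PurpleRobot", "primary"),
    Device("shared tablet", "Aware", "secondary"),
    Device("trial setup", "MurataBSN", "test"),
]

for device in devices_for_study(devices):
    print(device.name, device.usage)
```

This is why setting the usage correctly matters: a device marked as "test" or "other" would simply not appear in study data.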

Right now, the following devices are available:

  • PurpleRobot - an Android application available on Google Play. Install it and follow the instructions on the config screen of the device. This app is a bit hard to configure, since everything must be done manually.
  • Aware - another Android app.
  • MurataBSN (bed sensor node) - a device which you attach to your bed, which can monitor the times you sleep.
  • FunfJournal - (Android) The application Funf Journal on the Google Play store. This is under development. Could be applicable to other Funf apps (it is a general framework).
  • WebSurveys - Surveys.

Devices we may add in the future:

  • Capture of data (via APIs) from other web services
  • Fitness trackers

An important note: one data source = one server device. Use different devices on the server to keep your data logical. There is no limit to how many devices you can use.

You may create as many devices as you would like. For example, if you have multiple Android devices, you should create multiple virtual devices on the server. You may also create extra server devices if you would like to logically separate the data from the same physical device - just re-configure it. For example, you can create an initial sleep sensor device while you are testing it, then another one once your primary research begins and you have decided on your final settings. If you move your sleep sensor to different positions on the bed, each position could also be a different device. Just think about what will make your analysis easiest in the future. If you are involved in a research study, the study will have stricter requirements for how devices should be configured on the server.

Once you link the device, you should see data points being collected. Verify this by checking the most recent packet arrivals.

There are two views of the device: data view and config view. The config view has information on how to set up the data collection, which usually points back to this wiki.

Viewing data

Go to the device info page to see some basic info, including the times of the first and last data points and the most recent raw data. The server saves all data raw, so the raw data is in whatever text format the device sent.

At the bottom of the page, there are various converters. Each of these converters reprocesses the raw data into tabular data, which is more useful for analysis. These exist for two reasons: (1) convenience, so that you don't have to do basic reprocessing yourself, and (2) to preserve privacy. Researchers don't get access to raw data, only "privacy-preserving" data that comes through converters that are specifically allowed for that study.

When you go to a converter, it lets you scroll through the data on web pages. This is not very useful for analysis, but lets you quickly see recent activity. You can use the buttons at the top to download the data in various formats: CSV, JSON, SQLite dumps, and more if requested.
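
Once you have downloaded a converter's output as CSV, you can load it with standard tools. The Python sketch below shows one way to do this. The column names (`time`, `probe`, `value`) and the use of Unix timestamps are assumptions made for the example; check the header of your own download, since each converter produces its own columns.

```python
import csv
import io
from datetime import datetime, timezone


def load_rows(fileobj):
    """Read CSV rows and parse the assumed Unix-time column into datetimes."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["time"] = datetime.fromtimestamp(float(row["time"]), tz=timezone.utc)
        rows.append(row)
    return rows


# Stand-in for a downloaded file; with a real download you would instead use
# open("your-download.csv", newline="") (the file name is hypothetical).
sample = io.StringIO(
    "time,probe,value\n"
    "1517300000,screen,on\n"
    "1517300060,screen,off\n"
)

rows = load_rows(sample)
for row in rows:
    print(row["time"].isoformat(), row["probe"], row["value"])
```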

All devices have a "Raw" converter which shows raw data. Other than that, converters depend on the device. There are usually some converters that produce meta-information, such as all the timestamps or data sizes. Then there are a bunch of converters that produce the data which is useful for analysis.
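
To give an idea of what a meta-information converter might produce, the following Python sketch lists the arrival time and size of each packet without looking at its contents. The input shape (pairs of arrival time and raw text) and the function name `packet_sizes` are assumptions for illustration, not the server's actual interface.

```python
def packet_sizes(packets):
    """Return (arrival_time, size_in_bytes) for each packet.

    Assumption: each packet is an (arrival_time, raw_text) pair, where
    arrival_time is a Unix timestamp and raw_text is what the device sent.
    """
    return [(arrival, len(raw.encode("utf-8"))) for arrival, raw in packets]


# Hypothetical packets as the server might have stored them.
packets = [
    (1517300000, '{"a": 1}'),
    (1517300060, '{"a": 1, "b": 2}'),
]

for arrival, size in packet_sizes(packets):
    print(arrival, size)
```

Output like this is useful for checking that a device is still sending data, even when the data itself is not yet being analyzed.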

It is easy to add converters - send feedback.

For the full information on downloading data, see the ResearcherGuide.

Modifying data

Data cannot be modified once it is uploaded. Your analysis needs to account for any data that may be wrong. This is why we separate the data into primary/test devices and so on.

Modifying data is a dangerous game, and can easily increase complexity and cause problems. With that being said, it is possible to modify data in exceptional cases. See the next section, though.

Deleting data

We would like a way for you to delete data yourself, but this is not yet implemented. Since it only makes sense to do this if it is irreversible, there are many possibilities for problems. We haven't yet figured out the best way to do this automatically. We would welcome suggestions. Contact us.

Feedback

If you have questions regarding the system, let us know. You can send email, use the GitHub issue tracker, or even directly improve these documents or submit a pull request.
