Spyglass is an open-source framework for managing and analyzing data in neuroscience research. The Export feature allows users to generate scripts[^1] to recreate both their conda environment and the database, as well as upload data to the DANDI Archive.
This repository is intended to be used with Docker to create and share a reproducible environment for replicating a paper's analyses.
- Pre-requisites: `make` and `docker`.
- Register for Docker Hub and run `docker login`.
- Clone this repository to your local machine.
- Copy `env.example` to `.env` and edit the values.
- Copy the paper's notebooks to `notebooks/`[^2].
- Remove items from the `environment.yml` that require GPU support, like `jax`.
- Run `make build` to build the docker image.
- Navigate to `http://localhost:8888/lab`, using the paper ID as the password.
- Test the notebooks.
- Run `make publish` to publish the image.
- Share the image with collaborators, who can run `make run` to start the container and visit the same URL. They will need:
  - The `.env` file you used.
  - The `docker-compose-collab.yml` file for building from the published images.
  - The `Makefile` for the command to run the container from the published images.
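
The prerequisites in the steps above can be checked before building. The following is a minimal sketch and not part of this repository: the function names `missing_tools` and `check_setup` are hypothetical, and only the tool names (`make`, `docker`) and the `.env` file come from the steps above.

```shell
# Print the name of each given tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Return 0 only if make and docker are installed and a .env file
# exists in the given directory (defaults to the current directory).
check_setup() {
    dir="${1:-.}"
    missing=$(missing_tools make docker)
    [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; return 1; }
    [ -f "$dir/.env" ] || { echo "No .env found; copy env.example to .env and edit it" >&2; return 1; }
}
```

Running `check_setup && make build` from the repository root would then only start the build once both checks pass.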
- `Makefile`: Contains commands for building and publishing the docker image.
  - `copy_files`: Copies the export `sql` and `yml` files to the `export_data/` directory.
  - `down`: Stops and removes existing docker containers.
  - `up`: Runs `down`, then starts the docker container.
  - `build`: Alias for `up`.
  - `enter`: Enters the running docker container for debugging.
- `docker-compose.yml`: Defines the docker containers and volumes.
  - `db`: Service. MySQL database container.
  - `hub`: Service. Jupyter notebook server container.
  - `conda`: Volume. Cache of the hub's conda environment.
  - `db_data`: Volume. Cache of the database's data.
- `docker-compose-collab.yml`: Similar to `docker-compose.yml`, but using the `hub` image from Docker Hub. This file is intended for collaborators.
- `Dockerfile`: Adds additional instructions to the `hub` container.
  - Copies in datajoint and jupyter configuration files.
  - Installs `git` for possible git installs in the conda environment. For a faster build time, remove this line if no such installs are needed.
  - Installs the paper's conda environment.
  - Runs `entrypoint.py` to configure the datajoint connection.
- `env.example`: Example environment variables for the `.env` file. Must be copied to `.env` and edited.
- `config`: Contains additional configuration files.
  - `.datajoint_config.py`: Default configuration for the datajoint connection.
  - `entrypoint.py`: Edits the datajoint config based on environment variables.
  - `entrypoint_db.sh`: Loads exported `sql` files. Run by the `db` service.
  - `jupyter_server_config.py`: Configures the jupyter notebook server.
    - Sets the default kernel to the paper's conda environment.
    - Sets the server password.
The first time you run `make build`, the docker image will be built from
scratch. This can take a while, depending on the size of the conda environment.
Subsequent builds will be faster, as docker will cache the layers.
To speed up the process, projects that do not use the position pipeline can
remove the line in `Docker_hub.Dockerfile` that installs `ffmpeg` and other
dependencies.
If your build is still slow, try removing unnecessary packages from your conda
`environment.yml` file. Note that running `make build` will copy the file from
its original location.
This repository is intended for use in a secure environment. It is not intended for use in a production environment.
By default, the jupyter notebook server password is the value of the paper ID variable.
If you encounter any issues, please check the status of the docker containers
with `docker ps -a`. This will show the status of the `db` and `hub` containers.
If either is 'restarting', you can check the logs with `docker logs <name>`.
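
The two checks above can be wrapped in a small helper. This is a sketch only: the function names are hypothetical, and the `--tail 50` limit is an arbitrary choice. `docker ps --filter`, `--format`, and `docker logs --tail` are standard Docker CLI options.

```shell
# List the names of containers currently in the "restarting" state.
restarting_containers() {
    docker ps -a --filter "status=restarting" --format '{{.Names}}'
}

# Print the last 50 log lines for each restarting container.
show_restart_logs() {
    for name in $(restarting_containers); do
        echo "=== $name ==="
        docker logs --tail 50 "$name"
    done
}
```

Calling `show_restart_logs` prints nothing when no container is restarting.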
If conda environment creation fails, you may need to remove items from
the `environment.yml` that require GPU support, like `jax`.
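
One way to drop such entries is to filter the file through `grep`. This is a minimal sketch, assuming each dependency sits on its own line; the `drop_gpu_packages` name and the sample file contents are hypothetical, and `jax` is the only GPU package named above. The case-insensitive match also removes `jaxlib`.

```shell
# Remove every line that mentions jax (case-insensitive).
drop_gpu_packages() {
    grep -v -i 'jax'
}

# Hypothetical environment.yml contents, for illustration only.
sample_env='name: paper_env
dependencies:
  - numpy
  - jax
  - pip:
    - jaxlib
    - matplotlib'

printf '%s\n' "$sample_env" | drop_gpu_packages
```

For a real file, write the filtered output to a temporary file and review it before replacing `environment.yml`, since the pattern would also drop any unrelated line containing "jax".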
By default, the `Makefile` will copy the `sql` files to the `export_data/`
directory and run the following commands on each file:

    sed -i 's/ DEFAULT CHARSET=[^ ]\w*//g' _Populate_YourPaper.sql
    sed -i 's/ DEFAULT COLLATE [^ ]\w*//g' _Populate_YourPaper.sql

This preempts (a) an `OperationalError` when trying to import a table or (b)
`SQL ERROR 3780 (HY000)` in the docker logs.
What does this do?
These `sed` commands remove encoding specifications from the `sql` file(s).

    CREATE TABLE your_table (
    ...
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE latin1_swedish_ci COMMENT='X';

Will become:

    CREATE TABLE your_table (
    ...
    ) ENGINE=InnoDB COMMENT='X';

The line with `ENGINE=InnoDB` should always end in `;`. It may or may not have
a `COMMENT` field.
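
To see the effect of these expressions without touching a real export, they can be run on a sample line. This is a minimal sketch that assumes GNU `sed` (for `\w`); the `strip_encoding` function name and the sample line are hypothetical, and the expressions are applied to standard input rather than in place.

```shell
# Hypothetical sample line, as it might appear in an exported .sql file.
sample=") ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='X';"

# Apply both expressions to stdin instead of editing a file with -i.
strip_encoding() {
    sed -e 's/ DEFAULT CHARSET=[^ ]\w*//g' \
        -e 's/ DEFAULT COLLATE [^ ]\w*//g'
}

printf '%s\n' "$sample" | strip_encoding
```

Note that the second expression only matches a clause written literally as ` DEFAULT COLLATE <name>`, so it is worth checking the output for any remaining `COLLATE` text.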
Spyglass instances declared before version 0.4.3 permit longer keys than MySQL
defaults would permit. This may cause the import of downstream tables to error,
reporting `Excessive key length`. By default, the `Makefile` will run the
following command on each file, mirroring the adjustments from PR #664.

    sed -i \
      -e 's/ `nwb_file_name` varchar(255)/ `nwb_file_name` varchar(64)/g' \
      -e 's/ `analysis_file_name` varchar(255)/ `analysis_file_name` varchar(64)/g' \
      -e 's/ `interval_list_name` varchar(200)/ `interval_list_name` varchar(170)/g' \
      -e 's/ `position_info_param_name` varchar(80)/ `position_info_param_name` varchar(32)/g' \
      -e 's/ `mark_param_name` varchar(80)/ `mark_param_name` varchar(32)/g' \
      -e 's/ `artifact_removed_interval_list_name` varchar(200)/ `artifact_removed_interval_list_name` varchar(128)/g' \
      -e 's/ `metric_params_name` varchar(200)/ `metric_params_name` varchar(64)/g' \
      -e 's/ `auto_curation_params_name` varchar(200)/ `auto_curation_params_name` varchar(36)/g' \
      -e 's/ `sort_interval_name` varchar(200)/ `sort_interval_name` varchar(64)/g' \
      -e 's/ `preproc_params_name` varchar(200)/ `preproc_params_name` varchar(32)/g' \
      -e 's/ `sorter` varchar(200)/ `sorter` varchar(32)/g' \
      -e 's/ `sorter_params_name` varchar(200)/ `sorter_params_name` varchar(64)/g' \
      _Populate_YourPaper.sql

As a result, keys longer than the new length may fail to import.
If you encounter this issue, you may need to adjust the `sed` commands in
the `Makefile` to match the keys in your `sql` files.
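
Adjusted substitutions can be tried on a single column definition before editing the `Makefile`. This is a minimal sketch using two of the expressions from the command above; the `shorten_keys` name and the sample column definition are hypothetical. Each expression is passed with its own `-e` flag, and input is read from standard input rather than edited in place.

```shell
# Apply two of the key-length substitutions to stdin.
shorten_keys() {
    sed -e 's/ `nwb_file_name` varchar(255)/ `nwb_file_name` varchar(64)/g' \
        -e 's/ `sorter` varchar(200)/ `sorter` varchar(32)/g'
}

# Hypothetical column definition from an exported .sql file.
printf '%s\n' '  `nwb_file_name` varchar(255) NOT NULL,' | shorten_keys
```

Lines that do not match any expression pass through unchanged, so the same function can be run over a whole file to preview the result.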
The default hub container does not have sudo access. If you need to install additional packages or debug within the container, you may wish to do the following:
Admin within the container
Add sudo for the default user and mysql credentials to the `Dockerfile`, and add
`mysql-client` to allow command line access to the database.

    USER root
    # Allow sudo
    RUN echo "jovyan:jovyanpassword" | chpasswd
    RUN echo "jovyan ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/jovyan
    # Add mysql credentials - Vars must also be added to docker-compose.yml
    ARG MYSQL_HOST
    ARG MYSQL_USER
    ARG MYSQL_ROOT_PASSWORD
    # Add default mysql credentials
    RUN echo -e "\
    [client]\n\
    host=${MYSQL_HOST}\n\
    user=${MYSQL_USER}\n\
    password=${MYSQL_ROOT_PASSWORD}\n\n\
    [mysqld]\n\
    character-set-server = latin1\n\
    collation-server = latin1_swedish_ci" > ${HOME}/.my.cnf
    RUN apt update && apt install mysql-client -y
    USER ${NB_UID}

Each `ARG` item must also be added to the `docker-compose.yml` file under the
`hub` service:

    build:
      context: .
      dockerfile: Dockerfile
      args:
        MYSQL_HOST: db
        MYSQL_USER: root
        MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}

Finally, add `GRANT_SUDO=yes` to the `.env` file.
Footnotes
[^1]: The `.sh` scripts generated by Spyglass must first be run by a database administrator to create the database and tables. The resulting `.sql` file will then be used to populate the Docker database.
[^2]: If your paper depends on a specific version of Spyglass or additional custom packages, please link to these in your notebooks, and ensure they are included in the `environment.yml` file in the export directory. You can find the version of Spyglass at the top of any `.sql` file, and find the link in the list of Spyglass tags.