
Contributing to datasetinsights

If you are interested in contributing to datasetinsights, your contributions will fall into two categories:

  1. You want to propose new models, datasets, or evaluation metrics and implement them.
  2. You want to implement a feature or bug-fix for an outstanding issue.

Developing datasetinsights

Follow these steps to set up a datasetinsights development environment on your machine:

  1. Install poetry, git, and pre-commit.

  2. Create a virtual environment. We recommend using miniconda:

conda create -n dins-dev python=3.7
conda activate dins-dev
  3. Clone a copy of datasetinsights from source:
git clone https://github.com/Unity-Technologies/datasetinsights.git
cd datasetinsights

Note: until the datasetinsights source is available on public GitHub, clone the repo from git@gitlab.internal.unity3d.com:machine-learning/thea.git instead.

  4. Install datasetinsights in develop mode:
poetry install

This links the Python files from your local source tree into the virtual environment, so your changes take effect without reinstalling. Develop mode also installs development packages such as pytest and black.

  5. Install the pre-commit hook into the .git folder:
pre-commit install
# pre-commit installed at .git/hooks/pre-commit

Add new dependencies

Add new Python dependencies to the datasetinsights environment with poetry, for example:

poetry add numpy@^1.18.4

Add only the packages you directly need rather than all of their dependencies, and let poetry resolve the rest. See poetry add for detailed instructions.

Codebase structure

The datasetinsights package contains the following modules:

datasetinsights

  • commands: This module contains the CLI commands.

  • configs: This module contains estimator configuration files.

  • datasets: This module contains different datasets. The dataset classes contain knowledge of how each dataset should be loaded into memory.

  • estimators: This module contains the estimators used for training and evaluating models on the datasets.

  • evaluation_metrics: This module contains metrics used by the different estimators. The metrics to use are specified in the estimator config file.

  • io: This module contains functionality for writing, downloading, and uploading data to and from different sources.

  • stats: This module contains code for visualizing and gathering statistics on the dataset.

Unit testing

We use pytest to run the tests located under tests/. Run the entire test suite with:

pytest

or run an individual test file, for example:

pytest tests/test_visual.py
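As a rough sketch of what a test file under tests/ might look like, the module below defines a hypothetical helper, xywh_to_xyxy, together with two test functions. By default pytest collects files named test_*.py and the functions in them whose names start with test. The file name and helper are illustrative assumptions, not part of datasetinsights.

```python
# tests/test_example.py (hypothetical file name)
# In a real test, the function under test would be imported from the
# datasetinsights package. It is defined inline here (hypothetical helper)
# so the sketch is self-contained.


def xywh_to_xyxy(box):
    """Convert (x, y, w, h) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def test_xywh_to_xyxy_converts_corners():
    # Plain assert statements are enough: pytest reports the compared
    # values when an assertion fails.
    assert xywh_to_xyxy((10, 20, 30, 40)) == (10, 20, 40, 60)


def test_xywh_to_xyxy_zero_size_box():
    # A box with zero width and height collapses to a single point.
    assert xywh_to_xyxy((5, 5, 0, 0)) == (5, 5, 5, 5)
```

To run a single test from such a file, pass its node ID, for example pytest tests/test_example.py::test_xywh_to_xyxy_converts_corners.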

Style Guide

This repository follows the Black code style, with the maximum line length set to 80. We enforce this style by formatting Python code with Black, and we use isort to sort Python imports.

Before submitting a pull request, run:

pre-commit run --all-files

Fix all issues reported by flake8. To skip an exception, such as a long URL line in a docstring, add # noqa: E501 <describe reason> to the specific line that violates the rule. See this to learn more about how to ignore flake8 errors.
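For example, a long URL that cannot be wrapped might be suppressed as in the sketch below. The constant name and URL are hypothetical placeholders. flake8 skips E501 (line too long) on a line that carries a # noqa: E501 comment, and the text after the code records the reason for reviewers.

```python
# Hypothetical download location: the URL is longer than 80 characters and
# splitting it would break it, so E501 is suppressed on this line only.
SAMPLE_DATASET_URL = "https://storage.example.com/datasetinsights/sample-datasets/synthetic/v1/archive.zip"  # noqa: E501 URL cannot be split
```

Suppressing only E501 on that one line, rather than using a bare # noqa, keeps flake8's other checks active for the line.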

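Some editors support automatic formatting on save. For example, VS Code can run Black on save when the editor.formatOnSave setting is enabled and Black is configured as the Python formatter.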

Writing documentation

datasetinsights uses the Google style for docstrings. Lines inside docstring blocks must be limited to 80 characters, except for content such as long URLs or tables.
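The sketch below shows the general shape of a Google-style docstring, with Args, Returns, Raises, and Example sections, every line kept within 80 characters. The scale_size function is a hypothetical illustration, not part of datasetinsights.

```python
def scale_size(size, factor):
    """Scale an image size by a constant factor.

    Args:
        size (tuple[int, int]): Image (width, height) in pixels.
        factor (float): Scale factor; must be positive.

    Returns:
        tuple[int, int]: Scaled (width, height), each rounded to the
        nearest integer.

    Raises:
        ValueError: If ``factor`` is not positive.

    Example:
        >>> scale_size((640, 480), 0.5)
        (320, 240)
    """
    # Reject zero or negative factors, as documented under Raises.
    if factor <= 0:
        raise ValueError(f"factor must be positive, got {factor}")
    width, height = size
    # round() on a float returns an int, matching the documented return type.
    return (round(width * factor), round(height * factor))
```

Each section header ends with a colon, and its entries are indented one level beneath it. A description that wraps continues on an indented line so it stays within the 80-character limit.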

Building documentation

Follow the instructions here.