Merge pull request #70 from worldbank/good-practices
Adding recommended data science workflows

Showing 4 changed files with 39 additions and 0 deletions.

@@ -0,0 +1,18 @@

# Folder Structure and Naming Conventions for Project Setup

Establishing a clear and consistent folder structure with standardized naming conventions is essential for efficient project setup and collaboration. This guide outlines best practices for organizing files and directories, helping team members quickly understand the layout, find key resources, and maintain consistency across projects.

Following these conventions will not only improve productivity but also ensure that new team members can easily navigate projects, reducing the learning curve and minimizing errors. These practices are designed to adapt to the needs of our data science workflows while maintaining flexibility for project-specific requirements.

## Data Science Project Folder Setup
Our data science project folder setup follows [this standardized GitHub template](https://github.com/worldbank/template), ensuring consistency and streamlined collaboration across projects. The template provides a well-organized structure with predefined folders for ```data```, ```notebooks```, ```src```, and ```docs```. Before starting a project, please read through the template and make sure you understand both its structure and its underlying philosophy.
- **data**. This folder holds all the data used in the project. When appropriate, we recommend distinguishing between ```raw-data```, which serves as input to the analysis, and ```derived-datasets```, which holds outputs generated by the analysis. Within the ```data``` folder, we also suggest organizing datasets into thematic subfolders, for example ```conflict```, ```admin-boundaries```, ```population```, and so forth.
- **notebooks**. This folder contains all Jupyter notebooks for the project. As with the ```data``` folder, create subfolders with descriptive thematic names, such as ```agriculture``` or ```conflict```, to keep it organized. If your notebooks output datasets, save them in the ```derived-datasets``` folder rather than in the ```notebooks``` folder. Note that scripts (e.g., Python ```.py``` files) belong in the ```src``` folder, not here.
- **docs**. This folder primarily holds the documentation used to build the project's Jupyter Book, but any other relevant project documentation can also be placed here.
- **src**. This folder holds data processing scripts, such as Python scripts. Please refer to the documentation of [the standardized GitHub template](https://github.com/worldbank/template) for further details.
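
The folder layout described above can also be expressed as paths, which helps when scripting the setup of a new repository. The sketch below is illustrative only and is not part of the worldbank/template repository: the helpers ```scaffold_project``` and ```derived_output_path``` and the example theme names are hypothetical, and starting from the template itself remains the recommended approach.

```python
import tempfile
from pathlib import Path

# Hypothetical sketch of the layout described in this guide; the GitHub
# template remains the authoritative reference for the actual structure.
TOP_LEVEL = ["data", "notebooks", "src", "docs"]
DATA_STAGES = ["raw-data", "derived-datasets"]


def scaffold_project(root: Path, themes: list[str]) -> list[Path]:
    """Create top-level folders plus thematic subfolders (hypothetical helper).

    Returns every directory it ensured exists, in creation order.
    """
    created = []
    # data/, notebooks/, src/, docs/
    for name in TOP_LEVEL:
        created.append(root / name)
    # Thematic subfolders in each data stage, e.g. data/raw-data/conflict
    for stage in DATA_STAGES:
        for theme in themes:
            created.append(root / "data" / stage / theme)
    # Matching thematic subfolders for notebooks, e.g. notebooks/conflict
    for theme in themes:
        created.append(root / "notebooks" / theme)
    for folder in created:
        folder.mkdir(parents=True, exist_ok=True)
    return created


def derived_output_path(root: Path, theme: str, filename: str) -> Path:
    """Return where a notebook should save a derived dataset (hypothetical helper).

    Outputs go to data/derived-datasets/<theme>/, never into notebooks/.
    """
    out_dir = root / "data" / "derived-datasets" / theme
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename


if __name__ == "__main__":
    # Use a throwaway directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my-project"
        scaffold_project(root, ["conflict", "admin-boundaries", "population"])
        out = derived_output_path(root, "conflict", "conflict-events.csv")
        print(out.relative_to(root).as_posix())
        # prints: data/derived-datasets/conflict/conflict-events.csv
```

Routing notebook outputs through a single helper like this is one way to keep derived datasets out of the ```notebooks``` folder, as recommended above.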

## Naming Conventions
Because many different things in a project can be named, we provide general guidelines here rather than exhaustive rules.
- **Lowercase vs. uppercase.** Use all lowercase letters rather than CamelCase or other variations.
- **Underscore vs. hyphen.** Separate words in all folder and file names with hyphens, except where hyphens are not allowed (e.g., names of Python modules that will be imported, which must be valid identifiers; use underscores there instead). For example, use ```damage-assessment``` rather than ```damage_assessment```, ```DamageAssessment```, or ```Damage-Assessment```.
- **Theme-based naming.** Wherever possible, make names informative and aligned with the relevant topic or theme. For example, the ```data``` folder might contain a directory named ```admin-boundaries```.
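
The naming rules above are simple enough to check mechanically. The following is a minimal sketch, assuming names are checked without their file extensions; the helpers ```follows_convention```, ```to_hyphen_case```, and ```to_module_name``` are hypothetical and not part of any existing World Bank tooling.

```python
import keyword
import re

# Lowercase letters/digits with single hyphens between words, e.g. "admin-boundaries".
HYPHEN_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def follows_convention(name: str) -> bool:
    """Check a folder name or file stem (no extension) against the convention."""
    return HYPHEN_NAME.fullmatch(name) is not None


def to_hyphen_case(name: str) -> str:
    """Normalize e.g. 'DamageAssessment' or 'damage_assessment' to 'damage-assessment'."""
    # Split CamelCase: insert a hyphen where a lowercase letter or digit
    # is directly followed by an uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Collapse underscores, whitespace, and repeated hyphens into one hyphen.
    name = re.sub(r"[\s_-]+", "-", name)
    return name.strip("-").lower()


def to_module_name(name: str) -> str:
    """Importable Python modules cannot contain hyphens, so use underscores."""
    module = to_hyphen_case(name).replace("-", "_")
    # Module names must be valid, non-keyword identifiers (e.g. no leading digit).
    if not module.isidentifier() or keyword.iskeyword(module):
        raise ValueError(f"{name!r} cannot be used as a Python module name")
    return module
```

For example, ```to_hyphen_case("DamageAssessment")``` returns ```"damage-assessment"```, while ```to_module_name("damage-assessment")``` returns ```"damage_assessment"``` for a script that needs to be imported.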

@@ -0,0 +1,8 @@

# Good Practices for Data Science Projects
In the following series of documents, we share what we consider best practices for executing data science projects. It's worth noting that these practices were designed with the work of the <span style="color:#3EACAD">Data Lab</span> in mind. As such, they do not fully apply to all data science projects, although we believe they remain broadly useful. The documents cover the following topics:

1. **GitHub and Git workflows and practices**
2. **Communicating with Clients/Audiences**
3. **Documenting Analytics and Notebook Styling**