Welcome to the ogs-data-exploration repository π
This repository is dedicated to the cartography and exploration of greenhouse gas (GHG) emissions data at various geographical scales. Two main goals are defined for this repository:
- Documentation: Build a knowledge base on GHG emissions data (collection methods, concepts, providers...)
- Data Exploration: Explore GHG emissions data at various scales in order ot identify relevant data sources, build a complete data catalog and define a data model enabling the integration of explored data.
The goal of documentation tasks consists of identifying relevant GHG data and documenting associated knowledge and concepts for understanding its content.
- List of data sources with respective links in this wiki data-source page
- Feeding wiki with references, knowledge and concepts related to GHG emissions data
The goal of data exploration tasks consists of analyzing the different data sources identified in the documentation tasks in order to provide insights on GHG emissions data.
The repository structure is splitted into 3 main folders:
- data: contains raw data to explore or links to raw data if the dataset size is bigger than 100MB.
- notebooks: contains data exploration notebooks (in R or Python)
- data-catalog: contains the json files describing data sources attributes. These json files will be used to build a crelevant data catalog.
We proposed to have a first split of each folder by topic (e.g. ghg-emissions, socio-economic, geo-referential...) and then a second split by data source short name (e.g. ademe, wri (World Resources Institute), eea (European Environmental Agency)...).
Every data source folder should contains a README file for describing the data source and data provider and files containing datasets.
project_name/
βββ README.md # overview of the project
βββ data/ # notebooks for exploring raw data
βββ ghg-emissions
βββ source 1 # folder for data source 1
βββ README.md # Notebook for exploring data source 1
βββ dataset # Notebook for exploring data source
βββ source 2 # folder for data source 2
βββ README.md1 # Notebook for exploring data source 1
βββ dataset # Notebook for exploring data source 2
βββ socio-economic
βββ geo-ref
βββ notebooks/ # notebooks for exploring raw data
βββ ghg-emissions
βββ source 1 # folder for data source 1
βββ Notebook 1 # Notebook for exploring data source 1
βββ Notebook 2 # Notebook for exploring data source
βββ source 2 # folder for data source 2
βββ Notebook 1 # Notebook for exploring data source 1
βββ Notebook 2 # Notebook for exploring data source 2
βββ data-catalog/ # notebooks for exploring raw data
βββ ghg-emissions
βββ source 1 # folder for data source 1
βββ data-desc.json # json file describing data source 1
βββ source 1 # folder for data source 2
βββ data-desc.json # json file describing data source 2
Every contributor select one or two data sources to work on. The list of datasets are provided in this wiki page. NB: Some data sources provide several GHG emissions datasets such as the World Resources Institute.
Once you have selected your own datasource, you should clone the repository in your local machine (more informations about git commands are available at the end of this tutorial).
We decided, to make a branch by datasource in order to facilitate independant work at the beginning of this project. So you should have a specific branch for your data source.
In order to work on your branch, do not forget to switch to your branch directly before working on your data (git checkout NameOfYourBranch
)
Once you are in your branch, you can start by adding information in the readme file of your datasource.
Now we can start the exploratory analysis, go to notebook/ghg-emissions/name of your data source
and create your first notebook there.
After making some analysis you can commit the changes and push it to the remote repository. Do not forget to put the issue number correponding to the data exploratory task in your commit message (git commit -m "your message #x"
).
After pushing your changes, you shoudl go to Github, select your branch and make a pull request to merge your chnages made in your branch with the main. When you make pull request, you can assign the pull request to yourself, define a reviewer, attribute the pull request to a project and milestones. In order to follow better the backlog evolutio, do not forget to link the pull request to the issue you are working on.
You will receive a message of acceptance when your pull request is merged.
We want to build a data catalog making the compaison between different data sources easy. Hence, we need some information of data sources you habe just explored!
We ask you to fill a json file containing main dataset atttributes. Once filled, the json file should be stored in data-catalog/ghg-emissions/data source/data-desc.json
Here is an exemple of data desription:
{
"Version": "2021-02-04",
"DatasourceAttributes": [
{
"DataSourceStorageName": "ademe",
"Topics": ["GHG emissions"],
"DataProvider": {
"DataProviderName": "Lou Dupont",
"DataProviderLink": "https://www.data.gouv.fr/fr/datasets/bilans-demissions-de-ges-publies-sur-le-site-de-lademe-1/",
"DataProviderDesc": "NA"
},
"DataSource": {
"DataSourceName": "BEGES",
"DataSourceLink": "https://www.bilans-ges.ademe.fr/fr/bilanenligne/bilans/index/siGras/0",
"DataSourceOrganism": "ADEME",
"DataSourceDesc": "NA",
"DataFormat": "csv",
"DataAccess": "download"
},
"Coverage": {
"SpatialCoverage":{
"MainCoverage": "France",
"NumberOfSpatialEntities": 238,
"ListOfSpatialEntities": []
},
"TemporalCoverage": {
"MinDate": 2009,
"MaxDate": 2020,
}
},
"Resolution": {
"SpatialResolution": ["CommunautΓ© Urbaine", "CommunautΓ© d'agglomΓ©ration","CommunautΓ© de Commune",
"Communes", "DΓ©partements","MΓ©tropole","RΓ©gions"],
"TemporalResolution": "year",
},
"Gases": {
"Included": "All GHG gases",
"FilterByGase": "False"
},
"Scopes": {
"Included": ["Scope 1", "Scope 2", "Scope 3"],
"FilterByScope": "True"
},
"Sectors": {
"Included": "All sectors",
"FilterBySector": "False"
},
"Protocol": {
"ProtocolName": "Base Carbone",
"ProtocolLink": "https://www.bilans-ges.ademe.fr/fr/accueil/contenu/index/page/decouverte/siGras/1"
},
"EstimationMethod": {
"MethodType": "Bottom-up",
"MethodDescription": "https://www.bilans-ges.ademe.fr/fr/accueil/contenu/index/page/decouverte/siGras/1"
}
}
]
}
Notebooks (Python or R) providing exploratory analysis of raw data
- Feeding a Data Catalog with information about analyzed data stored in json files.
- Selection of datasets to integrate in OGS platform
- Identification of potential transformation necessary for normalizing each data source
We use Github Project to manage backlog. Every task is defined as an issue. Every issue is assigned to one contributor and belong to a milestone and a project.
We have 4 stages:
- Backlog: tasks that are identified but we do not know when we will work on
- To do: Task integrated for the current sprint
- In progress: Tasks with contributors working on
- Done: Finished tasks
When we commit changes and make pull request, we should specify the issue number in order to automatize issue stage management.
- Install Git
- Using GIT CMD go to the folder where you want to clone the repository (cd file/file)
- Clone the repository
git clone https://github.com/OpenGeoScales/ogs-data-exploration.git
- Using GIT CMD go to the cloned repository (cd file/file)
cd ogs-data-exploration
- By default you are working in the main branch. Change the branch and select the branch associated with the dataset you are working on.
git branch 'name of the branch' #If the branch does not already exist, create a new branch named "name of the branch"
git checkout 'name of the branch' #Switch to "new-branch"
For example to switch to the branch 'ademe'
git checkout ademe
- Make changes: add notebooks, add data, change readme...
- Track all the changes
git add .
- Or Track specific files
git add data-catalog/ademe/ademe-data-desc.md notebooks/ademe/ademe-notebook.md
- Commit the changes by specifying a message and the issue number you are working on after #
git commit -m "my message #1"
- Push the changes in remote repo by specifying the branch name you want to push on
git push origin ademe
- When you want to merge changes, go Github, select the branch you changes, and make pull request by assigning a reviewer.
- When the reviewer accept your pull request, your changes will be merged with the main!