Outreachy Internship Project Discussion | December 2021 round #8
-
That's a good start - let's clearly separate two workstreams, for two mentees:
The first is building a framework that improves the developer/researcher experience; the second is using these tools for open source data science tasks. What do you think is a good way to maintain this topic? Shall we both edit the top-level post so that it always reflects the most up-to-date information, and use the comments for discussion? There are some good guidelines for mentors in the Outreachy FAQ https://www.outreachy.org/mentor/mentor-faq/ - you might like to read the sections on defining a project timeline and what to consider during the internship. It's a good perspective.
-
Hi @aornugent, for the last pointer, here is a talk at PyCon US that highlights the potential capabilities of Kedro. We could use it to standardize all of our research and data work while creating effective deployments.
-
Background
Outreachy provides paid, remote, three-month internships to under-represented contributors who would like to make a difference by contributing to open source projects. The aim of this internship is to support diversity in Free and Open Source Software (FOSS) and uplift the under-represented sections of society.
This term's Outreachy internship at moja global is focused on the development of a FLINT module to account for the dynamics of carbon moving from forest living biomass into dead organic matter pools. Interns will collaborate with moja global ecosystem modeling experts on forestry dynamics, data collection, parameter calibration, and more.
Project deliverables
Workstreams
1. Assist Reporting Tool maintainers to clean up the project and make it deployment-ready for further use cases
After a lot of development and corresponding documentation, we would like to make the first versioned release of the Reporting Tool. However, some scoped work still needs to be done before moving towards the release. The intern will work with the Reporting Tool maintainers to better understand the software, its architecture, and the corresponding implementation. This will also give them the opportunity to sketch the CI matrix that will be required in the next milestone.
After studying the project, the intern will clean up the project files, refactor the code, fix bugs discovered along the way, and write scripts to automate most of the manual work. The intern will also write a basic tutorial on working with the Reporting Tool and publish it on the community website with the help of the Documentation working group.
2. Implement Continuous Integration pipelines using GitHub Actions for the FLINT Reporting Tool.
The goal of this project is to design the CI/CD framework for the Reporting Tool and to ensure reproducible research. The build and test pipeline would be considered from two points of view:
This would allow the other intern to scope their work around running FLINT, getting the output from the Reporting Tool, and refining it to meet the UNFCCC standards before publishing it. The pipeline should also take into account the full support requirements: a full range of architectures, OSes, and packaging technologies, as well as development requirements like code coverage and documentation previews.
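To make the support requirements above more concrete, here is a rough Python sketch of how a build matrix could be enumerated before any GitHub Actions workflow is written. The operating system, architecture, and packaging values, as well as the excluded combination, are placeholders chosen for illustration; they are not a list agreed with the Reporting Tool maintainers.

```python
from itertools import product

# Hypothetical axes; the real values would come from the maintainers.
OPERATING_SYSTEMS = ["ubuntu-latest", "windows-latest", "macos-latest"]
ARCHITECTURES = ["x64", "arm64"]
PACKAGING = ["docker", "native"]

# Hypothetical exclusion: a combination assumed not worth building.
EXCLUDED = {("windows-latest", "arm64", "docker")}


def build_matrix(oses, archs, packagings, excluded=frozenset()):
    """Return every (os, arch, packaging) combination not listed in `excluded`."""
    return [
        {"os": o, "arch": a, "packaging": p}
        for o, a, p in product(oses, archs, packagings)
        if (o, a, p) not in excluded
    ]


matrix = build_matrix(OPERATING_SYSTEMS, ARCHITECTURES, PACKAGING, EXCLUDED)
for job in matrix:
    print(job)
```

Listing the combinations this way makes it easy to see how quickly the number of jobs grows with each new axis, which can help when deciding what the first version of the pipeline should cover.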
The CI will be required to build and test the following components of the Reporting Tool:
The CD part of the milestone will focus on packaging the Reporting Tool so that it is ready for deployment through Docker Compose (similar to FLINT-UI). This milestone will involve close interaction with the Reporting Tool maintainers to understand the caveats of the project structure and to build the pipelines needed for building and shipping with confidence.
The process itself should be documented clearly to provide a roadmap for future projects under the moja global umbrella wishing to make similar improvements. This does not have to be exhaustive (or exhausting). It should just be a high-level overview covering key decisions made and relevant links/documentation.
3. Strategize and centralize Docker images across the community for standardization and uniformity
FLINT provides Docker images to deploy and test on any container orchestrator supporting either Linux or Windows/macOS. They can be found on Docker Hub and are easy to pull. In 2020, however, Docker Hub announced changes to its image retention and rate limiting. The paid plan also restricts us from centralizing all the Docker images under a common organization/team. We will therefore work on centralizing and migrating the Docker images across the community to GitHub Container Registry (GHCR).
The Docker images are also currently scattered. Some of them are available on the individual accounts of @Tlazypanda, @arnav-t, and @shubhamkarande13. We also lack proper documentation on how these Docker images are maintained and updated. This milestone would require the intern to plan the migration of Docker images from Docker Hub to GHCR and to collaborate with the TSC and the CI/CD working group while documenting the effort.
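As a starting point for planning the migration, the Python sketch below maps a Docker Hub image reference to a GitHub Container Registry reference (served from ghcr.io) and lists the standard `docker pull`, `docker tag`, and `docker push` commands for it. The image name `some-user/example-image` and the target organization `moja-global` are assumptions for illustration, and the helper only handles simple `namespace/image:tag` references (no digests or registry ports). It prints the commands rather than running them.

```python
def to_ghcr_ref(docker_hub_ref, org):
    """Map 'namespace/image:tag' on Docker Hub to 'ghcr.io/<org>/image:tag'."""
    # Split off the tag; fall back to 'latest' when none is given.
    name, _, tag = docker_hub_ref.partition(":")
    tag = tag or "latest"
    # Drop the personal Docker Hub namespace and keep only the image name.
    image = name.rsplit("/", 1)[-1]
    # Lowercase the organization and image name for the target path.
    return f"ghcr.io/{org.lower()}/{image.lower()}:{tag}"


def migration_commands(docker_hub_ref, org):
    """Return the docker CLI commands that would copy one image to GHCR."""
    target = to_ghcr_ref(docker_hub_ref, org)
    return [
        f"docker pull {docker_hub_ref}",
        f"docker tag {docker_hub_ref} {target}",
        f"docker push {target}",
    ]


# Hypothetical image reference and organization, for illustration only.
for command in migration_commands("some-user/example-image:1.0", "moja-global"):
    print(command)
```

Generating the plan as text first gives the TSC and the CI/CD working group something to review before any image is actually moved, and the resulting list can double as migration documentation.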
4. Implement a framework for reproducible research
This milestone concerns the development of a workflow to ensure long-term reproducibility of the Python-based data analysis on moja global datasets being implemented by the first intern. The workflow should build on established tools and practices from software engineering (examples: Binder, Colab, Kedro, Data Version Control) and use tools like Docker, Jupyter Notebooks, and Markdown to integrate version management and dynamic report generation with cross-platform reproducibility.
The main goals of the framework should be:
During this milestone, the intern would work with their co-intern to demonstrate practical examples from moja global datasets and combine containerization, dependency management, version control, and document generation to increase scientific productivity and make reuse of code and data possible.
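One small building block for such a workflow could be a manifest that records which inputs and which environment produced a result. The Python sketch below uses only the standard library and is an assumption about what such a manifest might contain, not a design decided for this project; tools like Data Version Control cover the same ground far more thoroughly. The sample file and its contents are made up for the demonstration.

```python
import hashlib
import json
import platform
import sys
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files):
    """Record the interpreter version, the platform, and a checksum per input file."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "files": {str(p): sha256_of(p) for p in map(Path, data_files)},
    }


# Demonstration on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("plot_id,biomass\n1,42.0\n")
    print(json.dumps(build_manifest([sample]), indent=2))
```

Committing a manifest like this next to a notebook or report lets a later reader check that they are rerunning the analysis on the same data and a comparable environment.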
Skills required
The skills required for the project encompass but are not limited to:
Timeline
Mentors
Notes
This is a rough proposal of the plan for the expected interns. We would love to gather feedback from the community and opinions from prospective interns, and fine-tune the plan before starting on the expected deliverables.