Implementing biodiversity data checks for the bdchecks package
bdchecks is an infrastructure for performing, filtering and managing various biodiversity data checks using R. Data checks are key to promoting biodiversity data quality. bdchecks offers features for different types of R users:
- An interactive and user-friendly Shiny app for inexperienced R users.
- Full command line functionality for more experienced R users.
- Advanced R users can easily edit, add and manage their own collection of data checks, using one single YAML file and only two supporting R functions.
bdchecks (available on CRAN) is part of the bdverse infrastructure and is a dependency of another bdverse package, bdclean.
Our main mission is to implement the full core suite of tests and assertions being developed by TDWG’s Biodiversity Data Quality ‘Task Group 2: Data Quality Tests and Assertions’. Although the bdchecks core is designed to match that test structure, achieving and maintaining complete synchronization will be challenging.
Your coding project key points:
- Get familiar with the bdchecks package and its data checks infrastructure (YAML file incorporation, dataCheck class)
- Construct and test as many data checks as possible
- Implement a report that lists unsuccessful data checks and describes the errors
- Implement analysis reproducibility
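As a rough illustration of the reporting key point, the sketch below filters a table of check results down to the unsuccessful checks and describes each one. It is only a sketch under stated assumptions: the `results` data frame, its column names, its values and the `report_failed_checks()` helper are all hypothetical and do not reflect the actual bdchecks output format.

```r
# Hypothetical check results: one row per data check, with the number of
# records tested and the number of records that failed. The column names
# and values are illustrative only, not the bdchecks output format.
results <- data.frame(
  check   = c("latitude_in_range", "longitude_in_range", "event_date_not_empty"),
  tested  = c(100L, 100L, 100L),
  failed  = c(0L, 7L, 12L),
  message = c("", "decimalLongitude outside [-180, 180]", "eventDate is missing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only unsuccessful checks, add the share of
# failing records, and list the worst checks first.
report_failed_checks <- function(results) {
  failed <- results[results$failed > 0, , drop = FALSE]
  failed$failure_rate <- failed$failed / failed$tested
  failed <- failed[order(-failed$failure_rate), , drop = FALSE]
  rownames(failed) <- NULL
  failed
}

report <- report_failed_checks(results)
print(report)
```

Keeping the report as a plain data frame makes it easy to write to a file alongside the input data, which is one possible step toward a reproducible analysis.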
Skills required: R and Shiny.
Advantage: experience in working with biodiversity big-data.
Improving the quality of biodiversity research depends, in part, on improving user-level data cleaning tools and skills. Adopting a more comprehensive approach that incorporates data cleaning into data analysis will not only improve the quality of biodiversity data, but will also encourage more appropriate use of such data.
Students, please contact mentors below after completing at least one of the tests below.
- Tomer Gueta (tomer.gu@gmail.com) is leading the bdverse project. He is a postdoctoral fellow at the Faculty of Civil and Environmental Engineering at the Technion, working with Prof. Yohay Carmel. His research deals with developing tools and methodologies for data-intensive biodiversity research. During the last two years, Tomer served as a GSoC mentor with the R project organization.
- Thiloshon Nagarajah (thiloshon@gmail.com) is a key member of the bdverse development team. He is a past GSoC and GCI student for the Fedora Project, Sahana Foundation and R Language.
- Vijay Barve (vijay.barve@gmail.com) is the author and maintainer of bdvis and a key member of the bdverse development team. Vijay is a biodiversity data scientist who has been a GSoC student and mentor since 2012 with the R project organization. Vijay has contributed to several packages on CRAN.
Students, please do one or more of the following tests before contacting the mentors.
- Medium: Re-implement an existing data check (i.e., improve an existing data check function) and import the check into R using the bdchecks YAML file. Provide benchmarks for performance improvements.
- Hard: Implement a data check that does not yet exist, create an entry for it in dataChecks.yaml and import it into R.
- Hard: Implement code tests using the testthat package for any data check (or multiple data checks).
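For the testthat task, a minimal sketch of what such tests could look like is shown here. The function `check_latitude_in_range()` is a hypothetical stand-alone check written for this example, not a function from bdchecks, and it is not wired into dataChecks.yaml; only `test_that()`, `expect_true()`, `expect_false()` and `expect_length()` come from testthat.

```r
library(testthat)

# Hypothetical data check (not part of bdchecks): returns TRUE for each
# latitude value that is a number within the valid range [-90, 90],
# and FALSE for out-of-range, missing or non-numeric values.
check_latitude_in_range <- function(latitude) {
  lat <- suppressWarnings(as.numeric(latitude))
  !is.na(lat) & lat >= -90 & lat <= 90
}

test_that("valid latitudes pass", {
  expect_true(all(check_latitude_in_range(c(-90, 0, 45.5, 90))))
})

test_that("out-of-range, missing and non-numeric latitudes fail", {
  expect_false(any(check_latitude_in_range(c(-90.01, 120, NA, "abc"))))
})

test_that("the result has one value per input record", {
  expect_length(check_latitude_in_range(c(10, 200, NA)), 3)
})
```

Covering boundary values, missing values and malformed input separately makes it easier to see which case a failing check mishandles.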
Students, please post a link to your test results here in the format: Name - Email - University - Link to solutions