rprocval


Introduction

Package currently in development

The goal of rprocval is to provide a series of queries to help identify potential errors in fisheries data. Most functions are designed to operate on relational databases with a structure consistent with the standard FishNet2 data model.

Installation

You can install rprocval like so:

devtools::install_github("HoldenJe/rprocval")

Bugs and Feature Requests

Please file a GitHub issue if you encounter a bug or have an idea for a new feature. This notifies the code maintainers and allows commits to be tracked against specific tasks.

How to Use

The package currently provides two types of functions:

  1. Data structure queries (e.g. fn121_column_check, fn123_error_queries)
  2. Data quality queries (e.g. fn125_fl_gt_tl, fn125_vonb_check)

Data structure queries check for orphaned records, inconsistencies between tables, and breaches of convention, such as failing to convert the total catch weight (CATWT) of a net to kg. Data quality queries focus on individual fish attributes and generally use fitted models to identify outliers, flagging records where the log ratio of the observed to the predicted value exceeds a threshold. By convention, every function name starts with the primary table that the function checks. Note, however, that a function may need to access data from other tables to perform its tests.
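To make the log ratio idea concrete, here is a minimal base R sketch of the general technique. It is not the package's implementation: the simulated data, the log-log linear model, the use of log10, and the `fail_criteria` value of 0.5 are all assumptions chosen for illustration.

```r
# Simulated length-weight data for one species (illustrative values only)
set.seed(42)
fish <- data.frame(TLEN = seq(200, 500, length.out = 50))
fish$RWT <- 1e-5 * fish$TLEN^3 * exp(rnorm(50, sd = 0.05))
fish$RWT[10] <- fish$RWT[10] / 1000  # simulate a weight recorded in kg instead of g

# Fit a log-log linear model of weight on length
fit <- lm(log10(RWT) ~ log10(TLEN), data = fish)

# Log ratio of the observed weight to the predicted weight
fish$pred_RWT <- 10^as.numeric(predict(fit, newdata = fish))
fish$log_ratio <- log10(fish$RWT / fish$pred_RWT)

# Flag records whose absolute log ratio exceeds the threshold
fail_criteria <- 0.5  # hypothetical value, not a package default
fish$flagged <- abs(fish$log_ratio) > fail_criteria

which(fish$flagged)  # row numbers of suspect records
```

Working on the ratio scale means the same threshold applies to small and large fish alike, which is one reason a log ratio is a convenient outlier criterion for length and weight data.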

The functions are designed to be run interactively rather than as a batch process. Tests are generally species-specific, so it is necessary either to run them on a single-species subset of the data or, preferably, to split the data by species into a list and use purrr::map to apply the function to every species.

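library(purrr)    # map(); also re-exports %>%
library(rprocval)

# Split FN125 by species and apply each check to every species subset
myoutput <- FN125 %>%
  split(.$SPC) %>%
  map(fn125_tl_rwt_lm, makeplot = TRUE) %>%
  map(fn125_fl_rwt_lm, makeplot = TRUE)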

This interactive approach allows users to correct errors in the data and then easily re-run the functions. For instance, a single extreme value may prevent one of the nls models from converging, in which case the function cannot identify any outliers. Such a value should be immediately apparent when the associated plot is viewed. Once it has been corrected or removed, successive tests are likely to provide meaningful results. Similarly, users can test a candidate correction within the R environment before changing the data record, and then check whether the corrected records pass the necessary checks. This works well for unit errors (e.g. a weight recorded in g where kg is expected) or a transcription error that placed FLEN values in the TLEN column (and thus FLEN > TLEN).
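The workflow of trying a correction in R before touching the source record can be sketched as follows. This is an illustrative base R example only: the toy data frame and the `flen_gt_tl()` helper are hypothetical stand-ins, not the package's data or its fn125_fl_gt_tl function.

```r
# Toy FN125-style records; FISH 3 has FLEN and TLEN transposed (illustrative values)
fn125 <- data.frame(
  FISH = 1:4,
  FLEN = c(410, 385, 452, 398),
  TLEN = c(440, 412, 421, 430)
)

# Hypothetical stand-in for a FLEN > TLEN check (not the package function)
flen_gt_tl <- function(df) {
  bad <- !is.na(df$FLEN) & !is.na(df$TLEN) & df$FLEN > df$TLEN
  df$FISH[bad]
}

flen_gt_tl(fn125)  # FISH numbers failing before the correction

# Apply the candidate correction to a copy, leaving the original untouched
fixed <- fn125
swap <- fixed$FLEN > fixed$TLEN
fixed[swap, c("FLEN", "TLEN")] <- fixed[swap, c("TLEN", "FLEN")]

flen_gt_tl(fixed)  # FISH numbers failing after the correction
```

If the corrected copy passes, the same change can then be made to the data record with more confidence.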

Most of the fn125_* functions have two optional arguments: makeplot and fail_criteria. The makeplot argument determines whether plots are produced when the function runs. Viewing these plots is useful but not always necessary, and producing them can slow processing considerably. The fail_criteria argument has a default value set for each specific test. If records that are suspected errors are not being flagged, a more restrictive (smaller) criterion can be supplied through this argument. Details of fail_criteria can be found on the help page for each function.
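The effect of tightening the criterion can be illustrated with a few made-up numbers. The log ratios and thresholds below are hypothetical and do not correspond to any package default; the sketch only shows that a smaller value flags more records.

```r
# Hypothetical log ratios (observed vs. predicted) for six records
log_ratio <- c(0.02, -0.05, 0.12, -0.31, 0.08, 0.45)

# Count flagged records at progressively smaller (more restrictive) thresholds
thresholds <- c(0.5, 0.3, 0.1)
n_flagged <- sapply(thresholds, function(k) sum(abs(log_ratio) > k))

data.frame(fail_criteria = thresholds, n_flagged = n_flagged)
```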

FN125 Error Checking Cheatsheet

qid1_error = FLEN > TLEN
qid2_error = TLEN and FLEN don’t fit regression model
qid3_error = FLEN and RWT don’t fit regression model
qid4_error = TLEN and RWT don’t fit regression model
qid5_error = TLEN and AGE don’t fit VonB model
qid6_error = FLEN and AGE don’t fit VonB model
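As a rough illustration of what a check like qid5_error involves, the sketch below fits a von Bertalanffy growth curve with stats::nls and flags records that depart strongly from it. It is not the package's code: the simulated data, the starting values, the log10 ratio, and the `fail_criteria` value are all assumptions.

```r
# Simulated age-length data for one species (illustrative values only)
set.seed(1)
fish <- data.frame(AGE = rep(1:12, each = 5))
fish$TLEN <- 600 * (1 - exp(-0.25 * (fish$AGE + 0.5))) * exp(rnorm(60, sd = 0.05))
fish$TLEN[8] <- 600  # an age-2 fish with an implausibly large length

# Fit the von Bertalanffy curve TLEN = Linf * (1 - exp(-K * (AGE - t0)))
fit <- nls(
  TLEN ~ Linf * (1 - exp(-K * (AGE - t0))),
  data = fish,
  start = list(Linf = max(fish$TLEN), K = 0.2, t0 = 0)
)

# Flag records whose observed length departs strongly from the fitted curve
fish$pred_TLEN <- as.numeric(predict(fit, newdata = fish))
fail_criteria <- 0.2  # hypothetical threshold on the log10 scale
fish$qid5_error <- abs(log10(fish$TLEN / fish$pred_TLEN)) > fail_criteria

fish[fish$qid5_error, c("AGE", "TLEN", "pred_TLEN")]
```

With real data, nls may fail to converge when an extreme value or poor starting values are present, so it helps to inspect a plot of length against age before trusting the flags.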