potential data biases #9

auremoser · 2016-05-29T04:43:51Z

An interesting added feature could be functionality for exploring potential data biases in analytical datasets:

- taxonomic biases (i.e. calculate the taxonomic distinctness of subsets of complete-case species for different variable combinations)
- data gap biases
- basic covariance structure between variables, which could be related to data gaps to understand how missing values might affect results
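As a rough illustration of the complete-case idea above, here is a minimal Python sketch that lists which species remain usable for each combination of variables. The trait table, the trait names and the helper names (`complete_cases`, `coverage_by_combination`) are all made up for the example; nothing here is an existing package function.

```python
from itertools import combinations

# Hypothetical toy trait table: species -> trait values, None marks a data gap.
traits = {
    "sp_a": {"mass": 12.0, "clutch": 4, "lifespan": 9.0},
    "sp_b": {"mass": 30.5, "clutch": None, "lifespan": 14.0},
    "sp_c": {"mass": None, "clutch": 2, "lifespan": None},
    "sp_d": {"mass": 8.2, "clutch": 5, "lifespan": None},
}


def complete_cases(table, variables):
    """Return the species that have a value for every listed variable."""
    return sorted(
        species
        for species, row in table.items()
        if all(row.get(var) is not None for var in variables)
    )


def coverage_by_combination(table, variables, size):
    """Map each variable combination of the given size to its complete cases."""
    return {
        combo: complete_cases(table, combo)
        for combo in combinations(variables, size)
    }


variables = ["mass", "clutch", "lifespan"]
pairs = coverage_by_combination(traits, variables, 2)
for combo, species in pairs.items():
    print(combo, len(species), species)
```

The species sets returned for each combination could then be passed to a taxonomic distinctness measure, to check whether the complete-case subset is taxonomically narrower than the full species pool.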

annakrystalli · 2016-09-18T08:47:04Z

What features, tools or functionality would be most helpful in probing your datasets and assessing for biases?

Missing data in trait databases is a persistent problem affecting analyses. The most common approach is to delete incomplete cases, but this can introduce additional biases, reduce the statistical power of analyses, and affect model selection and inference.

In these cases imputation might be more appropriate. A number of approaches have been suggested, making use of both relationships between traits and taxonomic relationships.

proposed tools

taxonomic biases:

detecting taxonomic bias (data not missing taxonomically at random) in trait data availability:
- calculating taxonomic distinctness of trait data subsamples? (using vegan::taxondive())
- calculating phylogenetic representativeness?
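To make the taxonomic distinctness idea concrete, here is a simplified Python sketch of average taxonomic distinctness: the mean taxonomic path length over all species pairs, with equal steps between ranks and the longest path scaled to 100. It is not a reimplementation of vegan::taxondive(). The choice of ranks, the equal step lengths and the function names are assumptions made for illustration, and the "species with trait data" subset is invented.

```python
from itertools import combinations

# Taxonomy as (genus, family, order), finest rank first.
taxonomy = {
    "Parus major": ("Parus", "Paridae", "Passeriformes"),
    "Parus minor": ("Parus", "Paridae", "Passeriformes"),
    "Cyanistes caeruleus": ("Cyanistes", "Paridae", "Passeriformes"),
    "Corvus corax": ("Corvus", "Corvidae", "Passeriformes"),
    "Anas crecca": ("Anas", "Anatidae", "Anseriformes"),
}


def path_weight(a, b, taxonomy):
    """Distance between two species on a 0-100 scale with equal rank steps."""
    ranks_a, ranks_b = taxonomy[a], taxonomy[b]
    n_steps = len(ranks_a) + 1  # one extra step when no rank is shared
    for step, (rank_a, rank_b) in enumerate(zip(ranks_a, ranks_b), start=1):
        if rank_a == rank_b:
            return 100.0 * step / n_steps
    return 100.0


def avg_taxonomic_distinctness(species, taxonomy):
    """Mean pairwise taxonomic distance among the given species."""
    pairs = list(combinations(species, 2))
    if not pairs:
        raise ValueError("need at least two species")
    return sum(path_weight(a, b, taxonomy) for a, b in pairs) / len(pairs)


full_pool = list(taxonomy)
# Hypothetical subset: suppose only the tits have data for some trait.
with_trait_data = ["Parus major", "Parus minor", "Cyanistes caeruleus"]

print(avg_taxonomic_distinctness(full_pool, taxonomy))
print(avg_taxonomic_distinctness(with_trait_data, taxonomy))
```

A subsample value well below that of the full pool would suggest that trait data are concentrated in a few closely related taxa. Judging whether the difference is larger than expected by chance would need a comparison against random subsamples of the same size, which this sketch does not do.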

imputation:

Currently exploring the use of missForest and Rphylopars to impute missing data. A framework for testing out different imputation approaches would probably work best.

Cross-validated imputation error can also be used to assess the contribution of individual traits to overall imputation error.
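One way such a testing framework could look, sketched in Python: hide known values of one trait fold by fold, impute them, and compare against the truth to get a per-trait error. The imputer is passed in as a function so different approaches can be swapped in. The `mean_impute` stand-in below is deliberately naive and is not missForest or Rphylopars; the data, the function names and the choice of normalised RMSE are assumptions for illustration.

```python
import math
import random

# Hypothetical trait table (rows = species); None marks a genuine gap.
rows = [
    {"mass": 10.0, "wing": 5.0},
    {"mass": 12.0, "wing": 5.5},
    {"mass": 30.0, "wing": None},
    {"mass": 8.0, "wing": 4.8},
    {"mass": 25.0, "wing": 9.0},
    {"mass": None, "wing": 7.5},
    {"mass": 15.0, "wing": 6.1},
]


def mean_impute(table):
    """Stand-in imputer: fill each trait's gaps with that trait's observed mean."""
    filled = [dict(row) for row in table]
    for trait in table[0]:
        observed = [row[trait] for row in table if row[trait] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[trait] is None:
                row[trait] = mean
    return filled


def cv_imputation_error(table, trait, impute, n_folds=3, seed=1):
    """Mask known values of one trait fold by fold; return RMSE / SD of the trait."""
    known = [i for i, row in enumerate(table) if row[trait] is not None]
    random.Random(seed).shuffle(known)
    squared_errors = []
    for fold in range(n_folds):
        held_out = known[fold::n_folds]
        masked = [dict(row) for row in table]
        for i in held_out:
            masked[i][trait] = None
        imputed = impute(masked)
        squared_errors += [
            (imputed[i][trait] - table[i][trait]) ** 2 for i in held_out
        ]
    values = [table[i][trait] for i in known]
    mean = sum(values) / len(values)
    # Assumes the trait is not constant, otherwise sd is zero.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return math.sqrt(sum(squared_errors) / len(squared_errors)) / sd


errors = {trait: cv_imputation_error(rows, trait, mean_impute) for trait in rows[0]}
for trait, err in errors.items():
    print(trait, round(err, 3))
```

Comparing the per-trait errors across imputers, or before and after dropping a trait, would give a first indication of which traits are hardest to recover.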

Model/trait selection:

Can we develop a framework for assessing trait usability in analyses and for guiding variable selection? I.e. can we establish a rationale for excluding traits on the grounds of biases in data availability?

Your input is needed!

Feel free to leave suggestions on formalising such a process or on useful tools and approaches, or get in touch if you have an idea for a feature to add.

auremoser · 2016-09-27T20:42:04Z

Cool. I'd also add these resources:

Lots of these are for newsrooms but I thought they might be useful for everything.

annakrystalli · 2016-09-28T09:16:01Z

Thanks for these! Going to also add them to #10, as a lot of them refer to basic data quality checks.

annakrystalli added enhancement consultation labels Sep 18, 2016

annakrystalli added the help wanted label Sep 28, 2016
