Misuse of table function #27

Open
valentinitnelav opened this issue Sep 3, 2020 · 1 comment

@valentinitnelav
Owner

I should give a more realistic example in the documentation showing that boot_specieslevel and boot_networklevel expect a list of one or more data frames of interactions. Each interaction (a row in the data frame) must be repeated as many times as it was observed. E.g., if the interaction species_1 x species_2 was observed 5 times, that row must appear 5 times in the data frame.

One misleading workflow in data preparation right now is to build a web matrix from the raw data with the table function, when the raw data most likely contains only an enumeration of interactions. The table function counts rows, so it does not account for the abundance of an interaction unless its row is repeated in the raw data. So there is a high risk of losing data. Then, once the web is built with table, the user moves on to web_matrix_to_df, which completes a vicious data-processing circle.
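To illustrate the risk, here is a minimal base R sketch. The data frame raw and its columns plant, insect and n_obs are hypothetical names invented for this example; they are not part of the package.

```r
# Hypothetical raw data: one row per distinct interaction, plus a count column.
raw <- data.frame(
  plant  = c("plant_1", "plant_1", "plant_2"),
  insect = c("insect_1", "insect_2", "insect_1"),
  n_obs  = c(5, 2, 3),
  stringsAsFactors = FALSE
)

# table() counts rows, so each listed pair gets a 1 and n_obs is ignored.
web_from_table <- table(raw$plant, raw$insect)
sum(web_from_table)  # 3: only the number of rows

# The number of interactions actually observed is larger.
sum(raw$n_obs)       # 10
```

Seven of the ten observations would silently disappear if this web were passed on as is.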

So the user tends to build the web matrix from a data frame and then transform the matrix back into an expanded data frame. This is an unnecessary detour, and I guess it was inspired by how I constructed the example with Safariland from bipartite. A more realistic case is to take the raw data with the interactions and enumerate/explode the rows based on some abundance column; the resulting data can then be used directly with boot_specieslevel or boot_networklevel. There is no need to use table, especially since it creates a misleading path towards data loss.
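A possible way to explode the rows in base R is sketched below. The column names and the list element name my_web are assumptions made up for illustration; the sketch stops at building the list of data frames and does not call boot_specieslevel or boot_networklevel, whose arguments should be checked in the package documentation.

```r
# Hypothetical raw data, e.g. as imported from Excel: one row per pair plus a count.
raw <- data.frame(
  plant  = c("plant_1", "plant_1", "plant_2"),
  insect = c("insect_1", "insect_2", "insect_1"),
  n_obs  = c(5, 2, 3),
  stringsAsFactors = FALSE
)

# Repeat each row index n_obs times, then keep only the two species columns.
row_idx  <- rep(seq_len(nrow(raw)), times = raw$n_obs)
expanded <- raw[row_idx, c("plant", "insect")]
rownames(expanded) <- NULL

nrow(expanded)  # 10: one row per observed interaction

# Wrap the expanded data frame in a (named) list of one or more data frames.
interaction_list <- list(my_web = expanded)
```

With the rows repeated like this, counting rows per species pair recovers the original abundances (5, 2 and 3), so no intermediate web matrix is needed.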

So, try to make a more realistic, simple usage example of boot_specieslevel and boot_networklevel that does not need web_matrix_to_df, which seems to be needed only in rare cases. Users tend to have their raw data as a data frame from Excel rather than as a web matrix/community matrix.

@valentinitnelav valentinitnelav added the documentation Improvements or additions to documentation label Sep 3, 2020
@valentinitnelav valentinitnelav added this to the Documentation milestone Sep 3, 2020
@valentinitnelav valentinitnelav self-assigned this Sep 3, 2020
@valentinitnelav
Owner Author

Also, this seems related to #7
