
Implementing test statistics based on bootstrapnet confidence intervals #35

Open
danielsandacz opened this issue Oct 8, 2021 · 6 comments
Labels
stats Questions regarding your analysis

Comments

@danielsandacz

danielsandacz commented Oct 8, 2021

The example below is code for comparing robustness between network plots in a nature preserve and network plots on adjacent land of neighbors. In this situation the confidence intervals provide evidence that the neighbor plots are more robust than the preserve plots, particularly when considering the higher level (pollinators). I am curious if I could take this a step further and implement a test statistic based on the confidence intervals generated or another type of statistical test. Sample size is a limitation for the specific example I provide, but I am interested in any recommendations based on the example provided or future analysis where I compare the preserve plots in 2021 (n=32) to preserve plots from 2016 (n=32).

Where "preserve" (n=32) and "neighbor" (n=8) are matrices of plant-pollinator interactions across different plots

library(bootstrapnet)
library(magrittr)  # for the %>% pipe

# "preserve" (n = 32 plots) and "neighbor" (n = 8 plots) are interaction matrices
robust.compare <- list(Preserve.Plots = preserve, Neighbor.Plots = neighbor) %>%
  lapply(web_matrix_to_df) %>%
  boot_networklevel(col_lower = "lower",
                    col_higher = "higher",
                    index = "robustness",
                    level = "both",
                    start = 50,
                    step = 20,
                    n_boot = 50,
                    n_cpu = 3)

robust_graph <- gg_networklevel(robust.compare)
robust_graph

[Figures: bootstrapped robustness with confidence intervals for the higher level (HL) and the lower level (LL)]

@valentinitnelav added the stats (Questions regarding your analysis) label and removed the question (Further information is requested) label on Oct 11, 2021
@valentinitnelav
Owner

valentinitnelav commented Oct 11, 2021

Hi Dan,

Thanks for opening the issue here. To my understanding, you want to compare the preserve plots in 2021 (n=32) to preserve plots from 2016 (n=32).

If you manage to get network metrics for each plot (I understand that you have a network for each plot) then you can try a t-test or a permutation test to check for significant differences between the two distributions of network metrics (each of n = 32, one for 2021, and the other for 2016).

I can give you some R code for running a permutation test applicable to two distributions if you need it (I can post it in a comment below). Note that the non-parametric approach (permutation test) is a bit more conservative (it doesn't detect a difference as "easily" as a t-test), but it doesn't require the normality assumption that the t-test does.
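To make the idea concrete, here is a minimal sketch of such a two-sample permutation test in base R. The vectors `x_2016` and `x_2021` are hypothetical placeholders for the per-plot robustness values (here filled with simulated numbers, since the real data are not in this thread):

```r
# Minimal two-sample permutation test sketch (base R).
# x_2016 and x_2021 stand in for per-plot network metrics; the values
# below are simulated placeholders, not real data.
set.seed(42)
x_2016 <- rnorm(32, mean = 0.60, sd = 0.05)
x_2021 <- rnorm(32, mean = 0.65, sd = 0.05)

obs_diff <- mean(x_2021) - mean(x_2016)   # observed difference in means
pooled   <- c(x_2016, x_2021)
n1       <- length(x_2016)

# Repeatedly relabel the plots at random and recompute the difference
n_perm <- 10000
perm_diff <- replicate(n_perm, {
  idx <- sample(length(pooled), n1)
  mean(pooled[-idx]) - mean(pooled[idx])
})

# Two-sided p-value: proportion of permuted differences at least as
# extreme as the observed one
p_value <- mean(abs(perm_diff) >= abs(obs_diff))
p_value
```

The null hypothesis is that the group labels (2016 vs 2021) are exchangeable; a small `p_value` indicates the observed difference is unlikely under random relabelling.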

One risk in your design is pseudoreplication, and in your case that depends on how close together the plots are. For example, would you or other experts in ecology consider a distance of, say, 100 m between plots (I don't know your actual distances, so I am picking a round number) big enough to avoid pseudoreplication? Is 100 m far enough that the insect species from one plot do not create interactions with the plants of another plot? In other words, far enough that you do not risk sampling the same species again simply because of the short distance? This applies to the plants as well.

The plots need to be far enough apart that you sample the full diversity of the ecosystem/landscape you analyze or compare with another (or across time, as in your case). Your sample of plots needs to be as representative as possible, capturing the natural diversity without being too dense spatially. If it is too dense, then you need to pool plots together, which unavoidably reduces the sample size (n = 32 can become n = 10 or so). I do not know what an optimal distance is; for example, some of my colleagues with more ecological expertise told me that bumblebees can fly up to 5 km during a pollination day. Either way, I think you can make a nice discussion about this in your analysis.

If pseudoreplication is indeed an issue (most likely yes), then you need to pool all 32 networks into one big network for each time frame, one for 2021 and one for 2016, as you did in the figures above (I presume). Those graphs show that you reach saturation, that is, you sampled enough to capture the network diversity of each period. If the confidence intervals overlap (as for LL, the lower-level species, i.e. the plants), then you do not have a significant difference, which suggests the plant community didn't change much over time. The opposite applies to the insect species (the higher trophic level), where you see significant differences in robustness. You might also want to check other network- and species-level metrics from the bipartite package.
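Pooling the per-plot networks can be sketched as follows in base R, assuming (hypothetically) that each plot's interactions are stored as a data frame with columns `lower` (plant), `higher` (pollinator), and `counts`:

```r
# Sketch: pooling per-plot interaction records into one network matrix.
# plot_webs is a hypothetical list of data frames, one per plot; the two
# small data frames below are illustrative only.
plot_webs <- list(
  data.frame(lower = c("plantA", "plantB"),
             higher = c("beeX", "beeY"),
             counts = c(3, 1)),
  data.frame(lower = c("plantA", "plantC"),
             higher = c("beeX", "flyZ"),
             counts = c(2, 4))
)

# Stack all plots, then sum counts into one lower x higher matrix
pooled_df  <- do.call(rbind, plot_webs)
pooled_web <- xtabs(counts ~ lower + higher, data = pooled_df)
pooled_web
```

`xtabs()` sums the counts over all plots for each plant-pollinator pair, so repeated pairs (here plantA-beeX) are aggregated into a single cell of the pooled web.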

Hope that what I wrote here makes some sense. Feel free to continue with your thoughts as I also like to understand better these ecological aspects.

Best,
Valentin

@danielsandacz
Author

Thank you Valentin, your commentary is very helpful! I would appreciate the code for permutation tests in the case that my data neither fit assumptions of normality nor are improved by transformation. If not for this exact application, it could prove useful for other analyses within my project.

The spatial scale is a concern, as my study site is only approximately 100 m x 175 m. I was expecting pseudoreplication with this design, but I am reviewing the literature for any indication that treating my plots as true replicates could be appropriate. If my question were about microsite heterogeneity and how it influences plant-pollinator interactions, I could see an argument supported by previous work. For my question and purposes I don't think that is an appropriate application, but please let me know if you come across any work that contradicts this assumption.

@valentinitnelav
Owner

Here is an example with R code regarding the permutation test - link.

I do think that pseudoreplication is an issue at such a small scale, and you pointed out well that changing the scale changes the question.

@valentinitnelav
Owner

Hi @danielsandacz ,

I am curious how you progressed with the analysis. I want to collect some case studies where the bootstrapnet package was used.

Best,
Valentin

@danielsandacz
Author

danielsandacz commented Feb 22, 2022 via email

@valentinitnelav
Owner

valentinitnelav commented Feb 22, 2022

@danielsandacz, sounds cool! When you are done with your thesis, or whenever you have the time, it would be great if you could write up your analysis as a case study of using bootstrapnet in combination with other, more popular methods (null models for bipartite networks). I am curious to see how the community of pollination ecologists makes use of it. And if you publish any scientific paper using bootstrapnet, feel free to let me know and I can list it in the readme.

If you want, you can write the case study as a reproducible RMarkdown document. You could also post it on RPubs, as I did for the permutation example above. It is a convenient way of sharing analyses as compiled html documents.

I'll leave this issue open because in our group we are still thinking about an analysis pipeline based on the bootstrapped values. You might also want to check the work of our colleague Elena, where we tried to tackle this: Motivans Švara, Elena, et al. "Effects of different types of low-intensity management on plant-pollinator interactions in Estonian grasslands." Ecology and Evolution 11.23 (2021): 16909-16926. Link
