Submission: BaseSet #359
Comments
Hi @llrs! 👋 I'll be the editor handling your review. I'm starting the initial editor's checks shortly. Will check back in when they are done. 😊 |
Hello again @llrs! Thanks for your patience with the editor's checks. I've been traveling, but I've also been having some issues with the checks, for which I'm having to seek some help. Will be with you shortly, if not with the full checks, at least with those that I am able to perform successfully. |
Editor checks:
Editor comments
Installation issues
When I try to do a full install and build vignettes, some suggested Bioconductor dependencies are not installed, which causes errors when attempting to build the vignettes.
devtools::install_github("llrs/BaseSet", dependencies = T, build_vignettes = T)
Reviewers as well as potential future collaborators that will need to run eg:
BiocManager::install("BiocStyle")
BiocManager::install("org.Hs.eg.db", type = "source")
BiocManager::install("GO.db", type = "source")
BiocManager::install("reactome.db", type = "source")
# test that package install and docs build successfully
devtools::install_github("llrs/BaseSet", dependencies = T, build_vignettes = T, force = T)
You can read more about the adventures I had troubleshooting in this issue comment. If you have some extra insight to add to the discussion (as I imagine you are more familiar with Bioconductor), it would be more than welcome!
|
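For anyone reproducing the install, here is a minimal sketch of checking which of the suggested Bioconductor packages listed above are still missing before building the vignettes. The helper name `missing_packages` is hypothetical (it is not part of BaseSet or BiocManager), and the install call is left commented out so the snippet does nothing beyond trying to load namespaces.

```r
# Hypothetical helper: report which of the given packages are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Suggested Bioconductor dependencies named in the comment above
suggested <- c("BiocStyle", "org.Hs.eg.db", "GO.db", "reactome.db")
to_install <- missing_packages(suggested)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them (requires the BiocManager package):
  # BiocManager::install(to_install)
}
```

Running this before `devtools::install_github(..., build_vignettes = T)` should make a "there is no package called ..." failure easier to anticipate.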
Many thanks for your review @annakrystalli
I'll modify the package to address your points; I should have them addressed by the beginning of next week. I'll let you know when I have updated the package. |
Hello again @llrs, Aha, it seems I have been trying to install the package the wrong way all this time, by using However, when I try to install using I'm really intrigued by the fact that the package builds successfully for you on Travis. Looking at your |
Some of these errors might be due to a failed build on Bioconductor; they recently made some changes to the checks performed on packages. Could you please post the error you got? On Travis I am only testing on bioc-devel, but on my computer I am using R 3.6.1 with Bioconductor 3.10 and it builds successfully. I'm not sure this difference is the source of the problems you found. However, I'll add the build and test for R release. |
So the errors I am getting are the same as I was getting previously, this time using:
BiocManager::install("llrs/BaseSet", dependencies = TRUE, build_vignettes = TRUE)
E creating vignettes (8.6s)
--- re-building ‘advanced.Rmd’ using rmarkdown
Attaching package: 'dplyr'
.
.
.
Quitting from lines 49-56 (advanced.Rmd)
Error: processing vignette 'advanced.Rmd' failed with diagnostics:
there is no package called 'GO.db'
--- failed re-building ‘advanced.Rmd’
--- re-building ‘basic.Rmd’ using rmarkdown
--- finished re-building ‘basic.Rmd’
--- re-building ‘fuzzy.Rmd’ using rmarkdown
--- finished re-building ‘fuzzy.Rmd’
SUMMARY: processing the following file failed:
‘advanced.Rmd’
Error: Vignette re-building failed.
Execution halted
Error: Failed to install 'BaseSet' from GitHub:
System command error, exit status: 1, stdout + stderr (last 10 lines):
E> --- finished re-building ‘basic.Rmd’
E>
E> --- re-building ‘fuzzy.Rmd’ using rmarkdown
E> --- finished re-building ‘fuzzy.Rmd’
E>
E> SUMMARY: processing the following file failed:
E> ‘advanced.Rmd’
E>
E> Error: Vignette re-building failed.
E> Execution halted
The reason this would not fail on your system is that you already have all the necessary packages installed, but reviewers or new contributors won't necessarily. The reason I think the Travis tests are passing is because you are testing on Out of interest, you could try removing
> session_info()
─ Session info ───────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.0 (2019-04-26)
os macOS Mojave 10.14.3
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_GB.UTF-8
ctype en_GB.UTF-8
tz America/Los_Angeles
date 2020-01-30
─ Packages ───────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
BiocManager 1.30.10 2019-11-16 [1] CRAN (R 3.6.0)
callr 3.4.1 2020-01-24 [1] CRAN (R 3.6.0)
cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.0)
clipr 0.7.0 2019-07-23 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
curl 4.3 2019-12-02 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools * 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.0)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
knitr 1.27 2020-01-16 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.0)
pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
reprex * 0.3.0 2019-05-16 [1] CRAN (R 3.6.0)
rlang 0.4.3 2020-01-24 [1] CRAN (R 3.6.0)
rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10.0-9000 2019-05-30 [1] Github (rstudio/rstudioapi@31d1afa)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
testthat * 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
usethis * 1.5.1.9000 2020-01-29 [1] Github (r-lib/usethis@4194fd6)
whisker 0.4 2019-08-28 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
xfun 0.12 2020-01-13 [1] CRAN (R 3.6.0)
[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library |
I've updated the package as requested in your initial review. However, I think that the configuration for .travis.yml that best matches CRAN is using bioc-release, because otherwise it doesn't install the Bioconductor packages. I hope that with the new instructions to install the Bioconductor packages, reviewers and users will be able to install all of its dependencies. |
Hi @llrs, Sorry, I probably didn't make myself clear enough. For sure you should be testing on Does that make sense? I've opened a PR to your repo with my attempt at implementing this. Let's see if it's successful! |
Well, my impression was that CI should mimic the steps done on CRAN, not those done by users. The steps are different, given that CRAN installs Bioconductor packages to be able to build packages which depend on Bioconductor. I will try to modify the GitHub Actions to also test on macOS, Windows and Linux in both conditions to be sure. |
Hello again @llrs 👋 I'm still looking for reviewers currently. I had a potential reviewer respond recently that sadly they did not have time at the moment but they also made the following comment:
I generally agree with the comment and was imagining it would be picked up during review but I now think it's probably best to deal with it now as it might be making it difficult to find reviewers. So could you add a little more detail to the README as well as the description in the |
Hi @annakrystalli, Thanks for sending the feedback, I appreciate it. I've edited both the DESCRIPTION and the README. Hope this way it will be easier to get reviewers. |
@annakrystalli I've been experimenting with macOS builders and using |
In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates. In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent. Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one another. The rOpenSci Editorial Board |
About the installation problems: there has been a discussion about problems installing packages from source and from binaries. See these messages from the Bioc-devel mailing list: https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016689.html. In summary it seems that there is a bug in the logic of Just for future reference |
In this period new submissions will not be handled, nor new reviewers assigned.
Reviews and responses to reviews will be handled on a 'best effort' basis, but
no follow-up reminders will be sent. Please check back here again after 25 May when we will be announcing plans to slowly start back up. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one another. The rOpenSci Editorial Board |
@annakrystalli It's been a week since the last message, and I haven't seen any plan announced here, on the website or on Twitter. Is there a plan to start looking for reviewers again? |
Hi @llrs, Apologies for the slow motion on here. I had a reviewer agree to review before lockdown but I'm still having trouble locating the second one. I should have communicated that here last week. Indeed, if you have any suggestions for potential reviewers (that would not have a conflict of interest) they would be gratefully received. Regarding CRAN, I think it is fine to submit to CRAN independently of the rOpenSci review. You could always just push a new release to CRAN once the rOpenSci review is done. As for publicising our return to reviewing, we are currently just gauging submission rates before we make a full announcement next week. |
Hi @annakrystalli Thanks for the quick reply, hope that everything is fine at your place. I don't know how important it is to wait for the second reviewer; maybe the first one can start without waiting for the second one? There are some authors of related packages whose feedback I would appreciate even if they have some conflict of interest against the package (I pasted their GitHub usernames): RaphaelS1, kevinrue. |
The conflict of interest is for us really (rOpenSci) to ensure we get a fair and unbiased review. In any case I could probably start the review with just a single reviewer. Let me just run all this past the rest of the editors and get back to you. |
The following statement in the README is not right: "On fuzzy sets, elements have a probability to belong to a set". In a probabilistic framework, every element is in a set or not in it. We are uncertain which sets an element is in, but the reality is still binary. In a fuzzy framework, the reality is vague. For example, say we are interested in which genes are highly expressed in a tissue. We could transform RNA-seq data for expression levels to a 0-1 scale, where 1 indicates the gene is very highly expressed, 0.6 is fairly highly expressed, 0.3 is weakly expressed, etc. The reality here is not binary. If you use fuzzy methods on probabilistic data, as you do in the README and the fuzzy vignette, the results will be wrong except in very special cases. The probability that a particular element is in A or B is As @j23414 mentioned, the probability function (assuming independence) can be defined to work over fuzzy values (Smithson, 2006). Take the statement, "I am tall OR strong". Now say my tall stat is 0.3 and my strong stat is 0.6. The truth value of this statement is 0.6, using the max operator. This intuitively seems reasonable. Similarly, it makes intuitive sense that the statement, "I am tall AND strong" has a truth value of 0.3. We can do similar calculations using the probability function where, assuming independence, we get However, we don't get to define our own laws of probability. If we are given probabilistic data, then we have to use probabilistic operators. We can't freely change the data from membership probabilities to fuzzy truth values; these mean different things. These distinctions need to be VERY clear in the README and vignette. If they aren't clear, a careless user could easily assume the operators are probabilistic and get completely wrong results. |
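The tall/strong example can be sketched in base R. This is only an illustration of the two families of operators being contrasted, not BaseSet's implementation, and the variable names are made up for the example. The second pair of operators is only a valid probability calculation if the two values really are probabilities of independent events.

```r
# Truth values from the "tall OR strong" example
tall <- 0.3
strong <- 0.6

# Standard fuzzy operators: max for OR, min for AND
fuzzy_or  <- max(tall, strong)   # 0.6
fuzzy_and <- min(tall, strong)   # 0.3

# Probability of independent events: a different operator, a different meaning
prob_or  <- tall + strong - tall * strong   # 0.72
prob_and <- tall * strong                   # 0.18

print(c(fuzzy_or = fuzzy_or, prob_or = prob_or))
```

The same inputs give 0.6 under one operator and 0.72 under the other, which is why the documentation needs to say which interpretation the values carry.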
@arendsee Perhaps that sentence on the README is not suited for some logics or types of fuzzy sets. I am struggling to introduce the topic for people not familiar with fuzzy sets and at the same time support and explain what the package can do to more expert users. I'll carefully review the wording around the subject. I agree that it is different to work with membership functions (Gene A 0.3 RNA levels means low, 0.6 means highly expressed) than with probabilities (Gene A is highly expressed in 60% of cases, whatever its value is). However, assigning a set from a measure is, for now, out of the scope of the package; this is either provided by the source of data or something the user should do. About probabilities of A or B, it is still unclear to me over what you calculate that probability or when you would like to do this. When I post a longer reply I'll provide some examples and questions about this, as I tend to understand something better with a practical question or exercise. On your example While users should not use different logics on the same data, the package aims to allow them to use whatever logic they want according to the knowledge they have about the data. But I can't guide them or provide feedback. Fuzzy set theory extends the probability framework as explained in a source @j23414 provided:
I hope to correct the points raised by @j23414 and post about the set theory and fuzzy-set operators that both reviewers had questions about by the end of the week. |
Hmm, that quoted source is new to me (Bezdek, 1993)... I provided links to the set theory chapter (Smithson 2006), membership value definition, and confidence value definition. No matter : ) thanks for sharing the source! There definitely seems to be some murkiness about when and where it is appropriate to use fuzzy sets. For @arendsee's Case 1) if Case 2) if **You can also think of this as a The union of probabilities (Case 2) is usually explained and illustrated as a Venn diagram (link with explanation here). |
The first sentence in the README does not apply to the specific logic you are using in the package. That operator you are using will not return membership probabilities. Using that operator to infer union membership probability will give you a meaningless result.
I'm referring to the union operator as it is used in your fuzzy vignette:
set.seed(4567) # To be able to have exact replicates
relations <- data.frame(sets = c(rep("A", 5), "B", "C"),
elements = c(letters[seq_len(6)], letters[6]),
fuzzy = runif(7))
fuzzy_set <- tidySet(relations)
The union function is intended to calculate the probability that f is in C or B.
Now, from the laws of probability: Your union function calculates the wrong probability. If you want to use the current logic, then you need to strongly state that the fuzzy values are NOT probabilities.
@j23414 I think I didn't explain my point here clearly. The tall or strong example was purely fuzzy. The data is fuzzy and probabilistic methods do not apply. I was describing two options for operators over fuzzy data. The first is the "standard" fuzzy operator that is currently used in BaseSet. The second is another fuzzy operator that happens to be the same as the probability of independent events, but the interpretation is different. Specifically, 0.72 is not the probability that the statement, "I am tall or strong", is true, rather it is a measure of how true the statement is. These two cases show two artificial metrics for fuzziness. My point was that there are different fuzzy union operators. One of the fuzzy operators looks like the probabilistic operator for independent events. But just because this one operator can be used for set membership (assuming independence), does not mean any fuzzy operator can be used. There are many fuzzy metrics, but only one correct probability. We don't get to choose whether data we are given is probabilities or fuzzy truth values.
The purpose of the README and vignettes is to guide them. But your examples, both in the README and vignettes, incorrectly use fuzzy operators to infer the probabilities of memberships in a set union.
Yes, fuzzy set theory is MORE general than probability. This means that probability is a special case of fuzzy logic. Thus a specific logic within fuzzy set theory applies to probabilistic systems. This does not mean that probability theory equals fuzzy theory. And it emphatically does not mean that any fuzzy logic applies to probability. Specifically, the fuzzy operator you are using for unions is not the same as the probability operator. in probability theory: P(A or B) is not equal to F(A or B), therefore you simply can't use that operator to calculate the probability of membership in a set union. Maybe we should invite a statistician and/or a fuzzy set theorist to comment. |
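As a hedged aside on why the two operators disagree, the relationship can be written out for two events. This assumes, as the thread describes, that the default fuzzy union is the max operator, written here as $F$; the probability identities are the standard ones.

```latex
% Standard fuzzy union (max operator), as described in the thread
F(A \text{ or } B) = \max\{F(A),\, F(B)\}

% Probability of a union, for any two events
P(A \cup B) = P(A) + P(B) - P(A \cap B)

% Special case of independent events
P(A \cup B) = P(A) + P(B) - P(A)\,P(B)

% Bounds that hold for any two events
\max\{P(A),\, P(B)\} \;\le\; P(A \cup B) \;\le\; \min\{1,\; P(A) + P(B)\}
```

The lower bound is reached only when $P(A \cap B) = \min\{P(A), P(B)\}$, that is, when one event is (almost surely) contained in the other. In every other case, applying the max operator to probabilities underestimates the probability of the union.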
Hello all. It looks like some guidance from a statistician might indeed be a good idea. This is technically not part of package review but I'll see what other editors think and let you know what we can do. |
Review 2: I'll only reply to comments or points that weren't checked or there were some substantial comments:
ecoli_sets %>%
mutate(
fuzzy = case_when(sets == "GL" ~ 0.2, # Add fuzzy values
sets == "CF" ~ 0.8)) %>%
set_size()
Yes, GL has a 0.04 probability of having two genes, so the glycolysis pathway probably doesn't have any genes. That is, if we assume that the fuzzy values are probabilities; if they were something else it wouldn't be accurate. I think this is different from cardinality, as cardinality is just a single number for a set, while here I return several values for a single set. I added to the documentation how this is calculated (via I provided a new method
Changed the automatic names to the full names, and also added some links to references for the data and how to interpret it. I tried to use some fuzzy values in the advanced vignette, but it would take too long to calculate for many sets and would make the package fail the checks. Fuzzy logic & explanation Many thanks for all the feedback on this; I realized that I assumed too many things. I modified or added explanations about the fuzzy values around the package. I modified the README section about fuzzy sets, which now reads: "Fuzzy sets are similar to classical sets but there is some vagueness on the relationship between the element and the set.". Similarly, I've added a new section to the README about why this package was developed and why it could be useful to others. I also introduced the topic in the fuzzy vignette, explaining what fuzzy sets are and adding some links. I do not want to force an interpretation of the fuzzy theory, which sometimes translates to using different semantics: membership, probabilities, truth values, and others. I think that the name of the fuzzy column is the most neutral and best understood. There seems to be a movement in fuzzy research to differentiate it from probabilities, but at the same time to expand them (probabilities). One can use any logic according to the specific framework one wishes to work in. This is encouraged throughout the package by being able to use any logic. To show the flexibility of the package using the same data, I've added some examples with a different logic, explaining when this logic could be applied. I hope it is now clear that the package does not force the use of a certain logic. The default parameters of the package are meant to provide sensible results with fuzzy sets but not to give surprises on classical sets.
On the specific case of union, I added an example to the documentation that I hope makes it clear how to use other logics outside the default for the specific case of probabilities. Given all the discussion around how to interpret fuzzy sets, I would appreciate your thoughts on when fuzzy sets arise in practical cases. How do you imagine a case where there are multiple cases which might be related or not to a set? For instance, imagine I want to record some cars at an intersection; the first blue car goes right, the second blue car goes right, the third blue car goes left. Would you record this as a fuzzy/probability set (the sets here being right/left) where a blue car has a 0.66 relationship with going right and 0.33 with going left? Or as a classical set with all three cars, two of them in the right set and one in the left set? If it is a fuzzy set, then which logic would you apply? |
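The set size figure quoted in this reply (a 0.04 probability of two genes) can be reproduced with a short base R sketch. It assumes the set has two elements, each with fuzzy value 0.2, and treats those values as independent membership probabilities. The function name `size_distribution` is hypothetical and this is not necessarily how BaseSet's `set_size()` is implemented.

```r
# Hypothetical helper (not part of BaseSet): distribution of the number of
# elements in a set, treating each fuzzy value as an independent probability
size_distribution <- function(p) {
  dist <- 1  # with no elements considered yet, the size is 0 with certainty
  for (prob in p) {
    # Each element is either absent (size unchanged) or present (size + 1)
    dist <- c(dist * (1 - prob), 0) + c(0, dist * prob)
  }
  names(dist) <- seq(0, length(p))
  dist
}

# Assumed input: two genes, each with fuzzy value 0.2
gl <- size_distribution(c(0.2, 0.2))
print(gl)  # probabilities of size 0, 1 and 2: 0.64, 0.32, 0.04
```

Under these assumptions the most likely size is zero, which matches the reading in the reply that the pathway "probably" has no genes. If the fuzzy values are truth values rather than probabilities, this calculation does not apply.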
The term Think about the case of three sets with probability of membership for each being 1. The probability of the union would also be 1. But I don't think it is necessary for initial acceptance into rOpenSci, but you should eventually consider how to relax the independence assumption. I'd definitely recommend talking to a statistician (statisticians are great). I think fuzzy sets have lots of applications within bioinformatics. For example, someone might want to know whether metabolite A and metabolite B both have high concentrations in a given tissue. You could scale the molar concentrations to values between 0 and 1 where 0.5 is the "normal" value. |
@arendsee I added a helper function The validation of the TidySet class includes a test that checks that fuzzy values are between 0 and 1 (both included). There is no need to add a unit test as it is automatically checked with each exported method (via As previously said, I don't assume independence or dependence of sets in the package; the package doesn't even assume that the fuzzy values are probabilities! This is and will be the responsibility of the user, mainly because from the data stored the package can't make any educated guesses about the type of the relationships between sets. I'll ask a statistician about relaxing independence between sets. I agree that they have lots of applications within bioinformatics; that's why I wrote the package: to make it easier to use them. Thank you for all the feedback to make it even easier to use. What do you think of the questions in the last paragraph of my previous comment? |
Hi @annakrystalli, @j23414, @arendsee. |
Yes, during the review I asked several questions to which I haven't yet received an answer. Also, I don't know what was decided about reaching out to a statistician (from above):
|
Plan to take a look again this week, thanks for the changes. And thanks @annakrystalli for organizing |
I'm double-checking some of the math. I'll get back with you in a bit. |
OK, the math checks out. Also, the extended vignette clarifies the fuzziness relationship. I think this package will have a good future. There are a lot of neat directions you could take it. I think it will be good to share it with the community and get feedback. The package may incite a lot of debate, which is good. Once you and the community have hammered out the wrinkles and chosen the features and interface that is "best", maybe you can roll out a non-backwards compatible BaseSet2 package. So, my final thoughts: the package is useful as is and will get much better as an active community grows around it. |
The new version passes checks so far... took a while to build the advanced vignette but I can see that fuzzy values were added (good). Glad to see more explanations and links in README, Fuzzy, and Advanced. For In summary, can see how the package may be useful. Agree that it can continue to be refined with an active community. |
Approved! Thanks @llrs for submitting and @arendsee & @j23414 for your reviews! 🎉 🚀 To-dos:
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions. We've put together an online book with our best practices and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here. |
Thanks for transferring @llrs ! I've made you admin again 👍 |
Many thanks @annakrystalli for approving the package. I already transferred the package and hope to finish all the other steps either today or tomorrow. Soon I'll begin my holidays and won't be able to contribute a post for the blog, but I'd like to write one after September @stefaniebutland. But I'm also hoping to present at the RStudio conference in January... not sure when I'll manage to write it up. |
No problem. Just ping when you are done. And enjoy your holidays! Well deserved 😎🏝 |
@llrs I just saw your tweet about your RStudio talk! This is wonderful. I'll mark my calendar to contact you in late September about a post. Enjoy your holidays! |
@annakrystalli I think I finished with all the required changes. Let me know if I missed anything. (I haven't submitted to CRAN yet. I'll wait until another submitted package is accepted to submit BaseSet to CRAN) |
Hey @llrs, everything looks good to me. A couple of things to note:
Otherwise, I'm happy to conclude the review and close the issue. Now it's time for my holiday too! 😎🏝🍹 |
Submitting Author: Lluís (@llrs)
Repository: llrs/BaseSet
Version submitted: 0.0.10
Editor: @annakrystalli
Reviewer 1: @arendsee
Reviewer 2: @j23414
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package implements methods to work on sets, doing intersection, union, complement and other set operations in a "tidy" way. It also allows importing from several formats used in the life science world, like the GMT, the GAF or the OBO file format for ontologies.
The idea is to use the package for working with sets and signatures of genes in scRNAseq or in pathways and ontologies, but it might work in other fields.
There is the sets package, which implements a more generalized approach that can store functions or lists as an element of a set (while mine only allows storing a character or factor), but it is harder to operate in a tidy/long way. Also, the operations of intersection and union need to happen between two different objects, while TidySet objects (the class implemented in BaseSet) can store a single set or thousands of them.
In BaseSet it is easier to operate and implement new fuzzy logic operations. It is developed openly on GitHub, compared to sets, for which I couldn't track how it is being developed.
GSEABase partially implements this, but it doesn't allow storing fuzzy sets and it is also quite slow as it creates several classes for annotating each set. Neither does the BiocSets package, which doesn't use the fuzzy set logic.
There is also the hierarchicalSets package, which is focused on clustering of sets that are inside other sets and on visualizations. However, BaseSet is focused on storing and manipulating sets, including hierarchical sets.
Most of the replies are copied from #339, handled by @melvidoni.
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
MEE Options
Code of conduct