Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vignette #12

Open
lightning-auriga opened this issue Mar 12, 2022 · 3 comments
Open

Vignette #12

lightning-auriga opened this issue Mar 12, 2022 · 3 comments

Comments

@lightning-auriga
Copy link

The vignette has two major issues:

  1. the first half of the document is basically an entirely separate white paper that goes into some presentation of the stats methodology of the package. however, it's a bit messy (markdown formatting off, variables and functions not always clearly defined, etc.), and also it seems like a strange place to put a ton of exposition about the package. it's not really a proof either, so it's a bit of a strange fit. if it's a vignette, then it should really be integrated with example content, so then for example it might make it easier for the reader to understand, for example, exactly where in the package they can see a representation of rule coverage being computed. standing alone, I'm not sure it exactly has its intended effect.

  2. in the second half, the worked examples, things are challenging in a different way. I think there is some desync between the example code and the text surrounding it. obviously, as mentioned in Non reproducible results #9, the numbers change, but additionally there are some more straightforward mismatches: e.g. text says use 2fold but function call says n_folds=5. I also get a ton of warnings on my system when rendering the vignette, and that may just be something off about my setup, but without it prerendered on CRAN that's all I have to work with. Warning text was, for example, Warning in private$.train(processed_task, trained_sublearners): Lrnr_gam_NULL_NULL_GCV.Cp failed with message: Error in private$.train(processed_task): Specified outcome type is unsupported by Lrnr_gam. It will be removed from the stack for the section Run CVtreeMLE

I'll add additional specific comments to the issue.

@lightning-auriga
Copy link
Author

Additional specific comments:

  • the link to the personal webpage at the top leads to a webpage with generic placeholder content
  • the vignette heavily cites van der Laan; this is perfectly reasonable, but at some point here or in the paper, additional citations from other sources would be appropriate to add, to help contextualize the work in the broader field
  • g(W) appears in Iterative Backfitting of Decision Trees without definition, and is probably a typo. I'd recommend doing a pass over all the variable definitions and formulae to make sure they're clear and typo-free, and rendered correctly in markdown
  • h(A,W) and f(A,W) appear without definition, after h and f had been defined marginally. I think I understand what you're going for with that, but given that, if it's what I think, it's really important for understanding, I think making sure really that whole section is clear is essential
  • the document makes several assertions of asymptotic performance, but doesn't define what asymptotes are being mentioned. I think being as clear as possible with this in particular will help the user better understand if, for example, they have any hope of consistent behavior with their particular dataset and model space
  • similarly, defining how exactly someone would figure out they need to increase CV folds would be very helpful
  • citations in text and reference content at the end of the document are poorly formatted
  • there are some sections that look like they maybe haven't been finished. an example would be the text block including formally established (citation)
  • when I ran the section Pooled TMLE Mixture Results, I ended up with a CI range of [2.593, 2.735], which doesn't cover 3. obviously that's possible even if things are well-behaved, but please check to be sure things are all working as expected with your example
  • folds: the n_folds parameter 5 and the folds in, for example, plot_mixture_results, are clearly referring to different things, as they do not match each other; but it's not clear from anything I saw in the package what the latter fold is referring to. please clarify

@blind-contours
Copy link
Owner

I have redone the entire vignette to match better practices. All the background has been moved to an Arxiv paper which I will publish once (hopefully) this package has been accepted with JOSS. All these issues should be resolved now. I still have some additions to make including better explanations of the marginal rules and plots. The vignette now focuses more on explaining the output and how it can be interpreted. Additionally the vignette uses synthetic data from the NIEHS that represents a real mixed exposure with known ground-truth. So we can simultaneously examine the output and compare interactions found to interactions built into the data.

@blind-contours
Copy link
Owner

I need to add some description of defining how exactly someone would figure out they need to increase CV folds which can be included still given changes to the vignette.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants