Vignette #12

lightning-auriga · 2022-03-12T20:46:02Z

The vignette has two major issues:

the first half of the document is basically an entirely separate white paper that goes into some presentation of the stats methodology of the package. however, it's a bit messy (markdown formatting off, variables and functions not always clearly defined, etc.), and also it seems like a strange place to put a ton of exposition about the package. it's not really a proof either, so it's a bit of a strange fit. if it's a vignette, then it should really be integrated with example content, so then for example it might make it easier for the reader to understand, for example, exactly where in the package they can see a representation of rule coverage being computed. standing alone, I'm not sure it exactly has its intended effect.
in the second half, the worked examples, things are challenging in a different way. I think there is some desync between the example code and the text surrounding it. obviously, as mentioned in Non reproducible results #9, the numbers change, but additionally there are some more straightforward mismatches: e.g. text says use 2fold but function call says n_folds=5. I also get a ton of warnings on my system when rendering the vignette, and that may just be something off about my setup, but without it prerendered on CRAN that's all I have to work with. Warning text was, for example, Warning in private$.train(processed_task, trained_sublearners): Lrnr_gam_NULL_NULL_GCV.Cp failed with message: Error in private$.train(processed_task): Specified outcome type is unsupported by Lrnr_gam. It will be removed from the stack for the section Run CVtreeMLE

I'll add additional specific comments to the issue.

The text was updated successfully, but these errors were encountered:

lightning-auriga · 2022-03-12T20:56:18Z

Additional specific comments:

the link to the personal webpage at the top leads to a webpage with generic placeholder content
the vignette heavily cites van der Laan; this is perfectly reasonable, but at some point here or in the paper, additional citations from other sources would be appropriate to add, to help contextualize the work in the broader field
g(W) appears in Iterative Backfitting of Decision Trees without definition, and is probably a typo. I'd recommend doing a pass over all the variable definitions and formulae to make sure they're clear and typo-free, and rendered correctly in markdown
h(A,W) and f(A,W) appear without definition, after h and f had been defined marginally. I think I understand what you're going for with that, but given that, if it's what I think, it's really important for understanding, I think making sure really that whole section is clear is essential
the document makes several assertions of asymptotic performance, but doesn't define what asymptotes are being mentioned. I think being as clear as possible with this in particular will help the user better understand if, for example, they have any hope of consistent behavior with their particular dataset and model space
similarly, defining how exactly someone would figure out they need to increase CV folds would be very helpful
citations in text and reference content at the end of the document are poorly formatted
there are some sections that look like they maybe haven't been finished. an example would be the text block including formally established (citation)
when I ran the section Pooled TMLE Mixture Results, I ended up with a CI range of [2.593, 2.735], which doesn't cover 3. obviously that's possible even if things are well-behaved, but please check to be sure things are all working as expected with your example
folds: the n_folds parameter 5 and the folds in, for example, plot_mixture_results, are clearly referring to different things, as they do not match each other; but it's not clear from anything I saw in the package what the latter fold is referring to. please clarify

blind-contours · 2022-07-28T00:02:24Z

I have redone the entire vignette to match better practices. All the background has been moved to an Arxiv paper which I will publish once (hopefully) this package has been accepted with JOSS. All these issues should be resolved now. I still have some additions to make including better explanations of the marginal rules and plots. The vignette now focuses more on explaining the output and how it can be interpreted. Additionally the vignette uses synthetic data from the NIEHS that represents a real mixed exposure with known ground-truth. So we can simultaneously examine the output and compare interactions found to interactions built into the data.

blind-contours · 2022-07-28T20:44:45Z

I need to add some description of defining how exactly someone would figure out they need to increase CV folds which can be included still given changes to the vignette.

lightning-auriga mentioned this issue Mar 12, 2022

[REVIEW]: CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R openjournals/joss-reviews#4181

Closed

40 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vignette #12

Vignette #12

lightning-auriga commented Mar 12, 2022

lightning-auriga commented Mar 12, 2022

blind-contours commented Jul 28, 2022

blind-contours commented Jul 28, 2022

Vignette #12

Vignette #12

Comments

lightning-auriga commented Mar 12, 2022

lightning-auriga commented Mar 12, 2022

blind-contours commented Jul 28, 2022

blind-contours commented Jul 28, 2022