DataPackageR submission #230
Thanks for your submission @gfinak! 😺 Below are the results of my editorial checks and comments. Please tackle the changes, I'll then look for and assign reviewers. 🙏 Editor checks:
Editor comments
It is good practice to
✖ omit "Date" in DESCRIPTION. It is not
required and it gets invalid quite often. A build
date will be added to the package when you perform
`R CMD build` on it.
✖ add a "URL" field to DESCRIPTION. It
helps users find information about your package
online. If your package does not have a homepage,
add a URL to GitHub, or the CRAN package
page.
✖ add a "BugReports" field to DESCRIPTION,
and point it to a bug tracker. Many online code
hosting services provide bug trackers for free,
https://github.com, https://gitlab.com, etc.
Just run
✖ use '<-' for assignment instead of '='.
'<-' is the standard, and R users and developers
are used to it and it is easier to read your code for
them if you use '<-'.
R\processData.R:241:31
R\processData.R:242:48
tests\testthat\test-skeleton.R:274:10
tests\testthat\test-skeleton.R:275:10
tests\testthat\test-skeleton.R:288:10
For that I recommend using
✖ avoid long code lines, it is bad for
readability. Also, many people prefer editor
windows that are about 80 characters wide. Try
to make your lines shorter than 80 characters
R\processData.R:64:1
R\processData.R:142:1
R\processData.R:145:1
R\processData.R:236:1
R\processData.R:242:1
... and 10 more lines
I think
✖ avoid calling setwd(), it changes the
global environment. If you need it, consider using
on.exit() to restore the working directory.
R\processData.R:53:20
R\processData.R:62:9
R\processData.R:77:26
R\processData.R:88:26
R\processData.R:97:26
... and 7 more lines
Well unless it's needed for the specific context of your package's use case!
✖ avoid sapply(), it is not type safe. It
might return a vector, or a list, depending on the
input data. Consider using vapply() instead.
R\digests.R:10:13
R\digests.R:11:13
R\load_save.R:25:5
R\processData.R:317:17
R\processData.R:330:13
✖ avoid 1:length(...), 1:nrow(...),
1:ncol(...), 1:NROW(...) and 1:NCOL(...)
expressions. They are error prone and result 1:0
if the expression on the right hand side is zero.
Use seq_len() or seq_along() instead.
R\parseDocumentation.R:16:20
✖ not import packages as a whole, as this
can cause name clashes between the imported
packages. Instead, import only the specific
functions you need.
See http://r-pkgs.had.co.nz/namespace.html#imports
Ask me any question you might have! |
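For readers following along, three of the suggestions above (restoring the working directory with on.exit(), vapply() instead of sapply(), and seq_along() instead of 1:length()) can be sketched in base R. This is a minimal illustration, not code from DataPackageR; the helper name run_in_dir is hypothetical.

```r
# Illustrative sketch only; run_in_dir is a hypothetical helper name.

# 1. If setwd() is needed, restore the old directory with on.exit(),
#    so the caller's working directory survives even if `code` errors.
run_in_dir <- function(dir, code) {
  old <- setwd(dir)                 # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)
  force(code)                       # `code` is evaluated lazily, inside `dir`
}

# 2. vapply() declares the expected type and length of each result,
#    so the return value is always an integer vector here.
lens <- vapply(list(a = 1:3, b = letters), length, integer(1))

# 3. seq_along() gives an empty sequence for zero-length input,
#    whereas 1:length(x) would give c(1, 0) and loop twice.
empty <- character(0)
n_iter <- 0L
for (i in seq_along(empty)) {
  n_iter <- n_iter + 1L
}
```

For example, run_in_dir(tempdir(), getwd()) reports the temporary directory while leaving the session's working directory unchanged afterwards.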
@maelle , thanks for the package review. I believe I've addressed all the issues with the latest commits. Greg |
Thanks @gfinak! I see much progress indeed! Here are a few more points from
It is good practice to
✖ write unit tests for all functions, and all
package code in general. 78% of code lines are
covered by test cases.
R/build.R:27:NA
R/build.R:29:NA
R/build.R:30:NA
R/build.R:32:NA
R/build.R:33:NA
... and 250 more lines
Please increase the coverage a bit by testing
Please shorten the lines unless they contain URLs or so. In which case add
Can you confirm they're needed?
Please replace the use of
Unnecessary dependency?
✖ fix this R CMD check WARNING: LaTeX errors
when creating PDF version. This typically indicates
Rd problems. LaTeX errors found:
! LaTeX Error: File `inconsolata.sty' not found.
Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)
! Emergency stop.
<read *>
l.276 ^^M
! ==> Fatal error occurred, no output PDF file produced!
Not sure what that is, can you please investigate? |
@maelle |
yes it might be since I use tinytex? Will start looking for reviewers now. Thanks for all your efforts! |
Package Review
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 4
Review Comments
Relative to the potential impact, I believe the documentation is a bit modest and detail-oriented. It would be great to take a step back, explain the general concept of a data package, and sell
|
Dear @wlandau |
@gfinak I am glad this process helped. That has certainly been my experience with rOpenSci reviews. You have indeed addressed all the issues referenced in my review, including my later follow-up requests in the various threads. Pending approval from @karawoo and @maelle, I recommend the acceptance of |
@karawoo 👋 friendly reminder that your review is due on 2018-07-23 😺 |
Package Review
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 5.5
Review Comments
DataPackageR generates R package templates for processed datasets, and includes data processing code as vignettes to support reproducibility. Packaging raw data, processing code, and tidied results into an R package is one way to provide a reproducible record of data processing. DataPackageR is also designed to provide a convenient method of distributing smaller processed data when the original files are too large. Distributing processed data without the original runs somewhat counter to the goal of reproducibility; from that perspective I'd be hesitant to use DataPackageR for larger datasets unless they were publicly accessible outside of the package.
I have unfortunately not been able to test many of the finer details of this package, because I've been unable to get some of the fundamental elements working locally (see more details in the
In addition to resolving some of the issues described below, one of the biggest things I would like to see DataPackageR do is give guidance to the user on what they need to do next after using DataPackageR to set up their package. DataPackageR creates a lot of placeholder text in many different locations within the package, and for R users who have never created a package before (part of DataPackageR's target audience, according to the README) there is a lack of information about what steps come next.
Vignettes
I can't build vignettes for DataPackageR locally:
devtools::build_vignettes("~/projects/forks/DataPackageR/")
#> Building DataPackageR vignettes
#> * checking for file ‘/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2zWYal/mtcars20/DESCRIPTION’ ... OK
#> * preparing ‘mtcars20’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * looking to see if a ‘data/datalist’ file should be added
#> * building ‘mtcars20_1.0.tar.gz’
#>
#> * installing *source* package ‘mtcars20’ ...
#> ** R
#> ** data
#> ** inst
#> ** byte-compile and prepare package for lazy loading
#> ** help
#> *** installing help indices
#> ** building package indices
#> ** installing vignettes
#> ‘subsetCars.Rmd’ using ‘UTF-8’
#> ** testing if installed package can be loaded
#> * DONE (mtcars20)
#> Quitting from lines 205-215 (usingDataPackageR.Rmd)
#> Error: processing vignette 'usingDataPackageR.Rmd' failed with diagnostics:
#> namespace is already attached
I get the same error with
Examples
There are no examples for several of the exported functions in
Package functionality
I've not been able to generate vignettes very successfully in the packages generated by DataPackageR. Below is a reprex (behind
Created on 2018-07-16 by the reprex package (v0.2.0).
I also find that when using an R script rather than an Rmd in
The version in
I have tried adding a title to the Rmd files in both
Automated tests
Other comments
|
Thanks!
I'm out of town until the end of the month. I'll address the second review
when I return.
On Tue, Jul 17, 2018, 01:07 Maëlle Salmon wrote:
Thanks a lot for your thorough review @karawoo! 😸
@gfinak now both reviews are in! 🙂
|
Thanks for the update @gfinak! |
@karawoo
with a conditional attach of the
I've added missing examples for the yaml functions.
Regarding the failure to build vignettes for packages:
I can reproduce this, but find it's an issue that is resolved by restarting the R session after installing the package. I'm addressing this by reloading the built package in the current session after a successful call to
I'm working on the remaining issues and will post as I make progress. |
A few more updates here for the sake of documentation.
I'm opening issues at http://github.com/RGLab/DataPackageR/issues for some of the remaining changes. |
@karawoo
As I understand there is a package maxygen that supports this, but it registers some callbacks with roxygen so that it can only support markdown roxygen2 markup or the regular roxygen2 markup. Switching between the two is not possible without an R session restart. I don't think it's something I'm going to support at the moment, but I'll keep the issue open since it would be simpler for the user to be able to document their data objects in their R scripts and Rmd files. |
I think you can enable Markdown in |
Oh, cool, thanks! |
Here is a summary of all the changes:
Thanks for taking the time to review, and let me know if there are any further changes or improvements. |
Thanks @gfinak! I'm on vacation at the moment but will try to look at this in more detail in a few days. |
No worries, no hurry! |
This is looking really great! Everything is working for me now. My only last comment is that it is difficult to run the test suite in an interactive session because of how many times it prompts me for description for the NEWS file. This might become irksome during development but as it doesn't really affect the package functionality and isn't user-facing, I've gone ahead and checked all the remaining checkboxes in my review. I recommend DataPackageR gets accepted into rOpenSci 👍 |
@karawoo that's great news. It's a good idea re: test suite, I'll see about having a package option to turn off interactive testing. |
add .onLoad to set `DataPackageR_interact` option to `interactive()`. Option set to FALSE in tests and back to default after tests complete. Addresses the final comments in ropensci/software-review#230
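As a rough illustration of the commit described above, an option defaulting to `interactive()` could be set up as follows. This is a sketch under assumptions, not the package's actual source: only the option name `DataPackageR_interact` comes from the commit message, while the `is.null()` guard and the helper `without_prompts` are hypothetical.

```r
# Sketch only; not DataPackageR's actual code.
# Set a default for the option when the package namespace is loaded.
.onLoad <- function(libname, pkgname) {
  # Assumption: respect a value the user has already set.
  if (is.null(getOption("DataPackageR_interact"))) {
    options(DataPackageR_interact = interactive())
  }
  invisible(NULL)
}

# Hypothetical test helper: switch prompting off while `code` runs,
# then restore whatever value was set before.
without_prompts <- function(code) {
  old <- options(DataPackageR_interact = FALSE)  # returns the old value(s)
  on.exit(options(old), add = TRUE)
  force(code)
}
```

Package code that would otherwise prompt for input can then check `getOption("DataPackageR_interact")` before doing so.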
Approved! Thanks @gfinak for all your work and @karawoo @wlandau for your reviews! 😺 To-dos:
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.
We've started putting together a gitbook with our best practice and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here. |
Fine by me! Thanks! |
Add ropensci footer. Update references. ropensci/software-review#230
… appveyor according to instructions. ropensci/software-review#230
Run codemetar ropensci/software-review#230
That would be great, Greg. Thanks for all your hard work on the package. |
@maelle I've addressed the following:
|
Perfect! I had forgotten to recommend you also add this badge at the top of the README
Thanks! |
Summary
A package to reproducibly process raw data into packaged, analysis-ready data sets.
https://github.com/RGLab/DataPackageR
[e.g., "data extraction, because the package parses a scientific data file format"]
reproducibility, because the package provides a framework for reproducibly processing raw data into analysis-ready data sets in R data packages.
The target audience is data analysts, data scientists, and any users working with diverse, large, raw data sets that need significant preprocessing to transform them into analysis-ready data sets. This processing may be time consuming and the raw data too large to include in a package. DataPackageR simplifies the process of ensuring that this data processing is done reproducibly: it ensures that vignettes are constructed that track how data is processed, ensures that data set objects are documented, verifies checksums of individual objects, bumps data set versions automatically, and decouples the data transformation from the usual build and installation process. The latter is particularly useful when raw data cannot be shared with the package or if processing such data is too time consuming to be re-run each time the package is built and installed using the usual R CMD build process. The tool is useful for preparing analysis-ready data for publication with manuscripts, or sharing it for collaborative data analysis.
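To make the checksum and version-bump idea concrete, here is a minimal base-R sketch. It is not DataPackageR's implementation or API: the function names (`object_md5`, `bump_version`, `update_version`) are hypothetical, and the choice to fingerprint an object by serializing it to a temporary file and calling `tools::md5sum()` is an assumption made to keep the example dependency-free.

```r
# Hypothetical helpers for illustration; not DataPackageR's API.

# Fingerprint an R object: write its serialized bytes to a temporary
# file and take the file's MD5 checksum with base R's tools::md5sum().
object_md5 <- function(obj) {
  path <- tempfile()
  on.exit(unlink(path), add = TRUE)
  writeBin(serialize(obj, NULL), path)
  unname(tools::md5sum(path))
}

# Increment the last component of a dotted version string.
bump_version <- function(version) {
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  parts[length(parts)] <- parts[length(parts)] + 1L
  paste(parts, collapse = ".")
}

# Compare the stored checksum with the current one and bump the
# version only when the data object has actually changed.
update_version <- function(version, stored_md5, obj) {
  current_md5 <- object_md5(obj)
  if (!identical(current_md5, stored_md5)) {
    version <- bump_version(version)
  }
  list(version = version, md5 = current_md5)
}
```

With this sketch, re-running the processing on unchanged data leaves the version alone, while any change in the resulting object produces a new checksum and a version increment.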
yours differ or meet our criteria for best-in-category?
The drake and workflowr packages are similar, in that they allow one to build reproducible workflows. DataPackageR is different in that its aim is to provide a tool to help users implement the ideas found in ropensci/rrrpkg, cboettig/template, and elsewhere, using their existing code with minimal effort. That code may leverage tools like workflowr and drake, but does not have to. DataPackageR provides the infrastructure to automate building, documentation, and tracking of data provenance. It does so by automatically constructing vignettes that document the transformation of raw data sets into R data objects ready for analysis, and by packaging those objects into R data packages that can be shared.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
Detail
Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
The package name uses camel case as it has been around for several years, used internally by our research group.
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Suggested reviewers
Jenny Bryan (jennybc)
Carl Boettiger (cboettig)
Ted Laderas (laderast)