Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore renv files and folders during checks #107

Closed
wants to merge 8 commits into from

Conversation

kevinrue
Copy link


name: Ignore renv files and folders during checks
about: Submit a pull request that resolves an BiocCheck issue #106
title: "[PR] Your pull request"
labels: ''
assignees: ''

If you are submitting a pull request to BiocCheck please follow the instructions outlined in this presentation. This presentation includes steps for forking, creating a branch for you to work on, and useful related information.

Prior to sending the pull request, please verify that R CMD build and R CMD check run without warnings or errors on the latest Bioconductor-devel (currently in May 2020 that would be Bioconductor 3.12) and complete the following steps:

  • Update the NEWS file (required)
  • Update the vignette file (required)
  • Add unit tests (optional)

The easiest way to obtain a working environment with Bioconductor-devel in your computer is to use the Bioconductor devel docker image as described in detail in the Bioconductor website.

For more questions, please get in touch with the Bioconductor core team through the Bioconductor Slack workspace.

@kevinrue
Copy link
Author

Hold on before consider a merge. I want to add an option to toggle this functionality on/off, as we probably want to detect (and complain) about "renv" files being pushed to <git.bioconductor.org>, while allowing users to disable the functionality on their local machine (when it crashes BiocCheck as for me).

I'll ping back when it's fully ready. The current version is functional in that it always ignores "renv" files.

@kevinrue
Copy link
Author

It looks ready to me now. Happy to hear feedback.

Example usages and outputs:

> BiocCheck::BiocCheck()
This is BiocCheck version 1.25.14. BiocCheck is a work in progress.
Output and severity of issues may change. Installing package...
* Checking for renv files
    * ERROR: Detected project library files generated by `renv`. On
      your computer, you can use `BiocCheck(`no-check-renv`=TRUE)`. If
      this error appears on the Bioconductor Build System, use `git rm
      --cached` to stop tracking the following directories and files:
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/renv
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/.Rprofile
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/renv.lock
> BiocCheck(`no-check-renv` = TRUE)
[... the usual ...]

Note that renv files are always ignored during the following other checks, as those checks try to parse R / Rmd / ... files also within the renv subdirectory, causing BiocCheck to crash (see #106):

  • checkLogicalUseFiles
  • checkSystemCall

@lshep
Copy link
Contributor

lshep commented Oct 20, 2020

I'm not really in favor of this pull request as it stands.

A few things :

I'm curious as to why this chokes on the coding practices section of code. The coding practice should be limited to the files in the R, man and vignette directory. The renv should not be included but it seems to have gotten the subdirectory. This would be a simple more elegant solution to fixing the initial issue with renv. renv I would imagine is not in or writing anything to those directories? so debugging to not have the subdirectory checked would be the first suggested change.

The renv files and functionality should be .gitignored or utilized on clean branches like we recommend for other additional outside functionality (github actions, pagedown files, etc) . I could see them being problematic if they ended up on a users system, no? wouldn't this be specific to an individual not portable to other systems?

The .gitignore will be especially important regarding the large file size. You added a section that skips renv for file size; which itself is fine for getting over the check however still problematic for Bioconductor. We impose a 5MB file size because of git requirements not as an arbitrary file size. Users utilizing this and not .gitignore renv will then still have issues when adding the repository officially to the Bioconductor server. As it stands any package that has files over 5MB files requires exceptions and additional help/process of a core team member -- and will likely cause issues for pushing to the git server for maintenance updates.

@kevinrue
Copy link
Author

Hi @lshep

I agree that this PR should be considered a draft, and I'm happy to edit as needed, following a discussion of what standard BiocCheck should ultimately enforce.

Starting with this point:

I'm curious as to why this chokes on the coding practices section of code.

The coding practices checks are given the package root directory, which lets some checks parse all subdirectories. Other checks are explicitly checking specific subdirectories (e.g. checkClassEqUsage checks only R/, vignettes/ subdirectories).
Conversely, for instance, checkLogicalUseFiles looks for any R script in any subdirectory under the package root directory. Clearly, this is intended to capture R source code, as well as unit tests, and any other R script stored anywhere in the package, e.g. in the inst subdirectory.

While it is technically possible to run separate commands to search for R script in an explicit list of target locations (R/, tests/, inst/), this is higher maintenance and would be blind to unexpected locations where users may save R scripts. Instead, I find my approach more explicit, if not more elegant: search everywhere in the package, and then remove all matches that start with renv/, which can be generalised to exclude more subdirectories if needed. Fair enough, the trawling does extra work looking in locations that will be later excluded, but ultimately, this has exactly the intended effect that you described:

The renv should not be included

From a more technical perspective, BiocCheck is choking because of some files in the renv/ subdirectory, that do not end with a newline character, see https://stat.ethz.ch/pipermail/r-help/2011-December/298222.html

Then,

renv I would imagine is not in or writing anything to those directories

Correct. renv is only writing:

  • .Rprofile and renv.lock at the root of the directory
  • all other files edited by renv are within the renv/ subdirectory

See https://rstudio.github.io/renv/articles/renv.html

Next,

so debugging to not have the subdirectory checked would be the first suggested change.

Not having the subdirectory checked is what this PR is about, at least on the developer's computer.
Then not having the renv subdirectory and files on the BBS is exactly the motivation for the ERROR message that I designed, specifically the part saying

If this error appears on the Bioconductor Build System, use `git rm --cached` to stop tracking the following directories and files:

I tried to keep the message reasonably short to fit in the BiocCheck report, but I'm open to rephrasing to make the ERROR clearer or more explicit.
In particular, I agree that git rm --cached is only a good first step to get those files out of the BBS if they were pushed by mistake (the original motivation for this PR), while adding those files to .gitignore is a more permanent solution to protect future commits.

So the ERROR message could be updated to something like:

    * ERROR: Detected project library files generated by `renv`. On
      your computer, you can use `BiocCheck(`no-check-renv`=TRUE)`. If
      this error appears on the Bioconductor Build System, use `git rm
      --cached` to stop tracking the following directories and files:
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/renv
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/.Rprofile
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/renv.lock
     then add the following entries to the package `.gitignore` file:
        .Rprofile
        renv
        renv.lock

Next,

I could see them being problematic if they ended up on a users system, no? wouldn't this be specific to an individual not portable to other systems?

Absolutely agreed.
Again, the whole point of this PR is to throw an ERROR if renv files ever make it into the Bioconductor git, while allowing developers to use an renv project library when working on their package locally (i.e. having renv files in their local package directory, but ignored by git).

Finally,

The .gitignore will be especially important regarding the large file size. You added a section that skips renv for file size; which itself is fine for getting over the check however still problematic for Bioconductor.

Again, the renv files should never make it to the Bioconductor repository. So those files should never contribute to the package size, or the individual file limit, on the BBS.
I thought that it would be OK to ignore renv files during the individual file size checks, since I have already added an explicit ERROR if renv files are detected at all. It seemed redundant to have both.
The "package total size" check still considers everything within the package directory, which would include renv if present.
Perhaps I can re-include "renv" files in the individual file size checks, so that the total package size and the individual file checks are working on the same set of files.
That said, size checks should still excluded "renv" files on local developer computers, as "renv" files can be expected there, even if they are gitignored.

Let me know what you think!

@kevinrue
Copy link
Author

Using version from commit f653c9f
Using https://github.com/kevinrue/BiocPackageSofwareTemplate to run BiocCheck (another work in progress).

Full check (intended for the BBS):

  • explicitly checking for "renv" files, and throwing an ERROR if they are detected + suggest adding to .gitignore
  • including "renv" files in the check for individual file size, throwing a WARNING like for any other file
  • skipping "renv" files for system() calls
> BiocCheck::BiocCheck()
This is BiocCheck version 1.25.14. BiocCheck is a work in progress. Output and severity of
issues may change. Installing package...
* Checking for renv files
    * ERROR: Detected project library files generated by `renv`. On your computer, you can
      use `BiocCheck(`no-check-renv`=TRUE)`. If this error appears on the Bioconductor Build
      System, use `git rm --cached` to stop tracking the following directories and files in
      the Bioconductor repository:
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/renv
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/.Rprofile
        /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/renv.lock
    Then add the following entries to the package `.gitignore` file:
        .Rprofile
        renv
        renv.lock
* Checking Package Dependencies...
* Checking if other packages can import this one...
* Checking to see if we understand object initialization...
* Checking for deprecated package usage...
* Checking for remote package usage...
* Checking version number...
* Checking version number validity...
    Package version 0.0.1; pre-release
* Checking R Version dependency...
* Checking package size...
        Skipped... only checked on source tarball
* Checking individual file sizes...
    * WARNING: The following files are over 5MB in size:
      'renv/library/R-4.0/x86_64-apple-darwin17.0/stringi/libs/icudt61l.dat
      renv/library/R-4.0/x86_64-apple-darwin17.0/stringi/libs/stringi.so'
* Checking biocViews...
* Checking that biocViews are present...
    * ERROR: No biocViews terms found.
See http://bioconductor.org/developers/how-to/biocViews/
* Checking build system compatibility...
* Checking for blank lines in DESCRIPTION...
* Checking if DESCRIPTION is well formatted...
* Checking for proper Description: field...
    * NOTE: The Description field in the DESCRIPTION is made up by less than 3 sentences.
      Please consider expanding this field, and structure it as a full paragraph
* Checking for whitespace in DESCRIPTION field names...
* Checking that Package field matches directory/tarball name...
* Checking for Version field...
* Checking for valid maintainer...
* Checking DESCRIPTION/NAMESPACE consistency...
* Checking vignette directory...
    This is an unknown type of package
    * ERROR: No 'vignettes' directory.
* Checking library calls...
* Checking for library/require of BiocPackageSofwareTemplate...
* Checking coding practice...
grep: /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/R/*: No such file or directory
* Checking parsed R code in R directory, examples, vignettes...
* Checking function lengths
* Checking man page documentation...
* Checking package NEWS...
    * NOTE: Consider adding a NEWS file, so your package news will be included in
      Bioconductor release announcements.
* Checking unit tests...
    * NOTE: Consider adding unit tests. We strongly encourage them. See
      http://bioconductor.org/developers/how-to/unitTesting-guidelines/.
* Checking skip_on_bioc() in tests...
* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source, and vignette source...
* Checking if package already exists in CRAN...
* Checking for bioc-devel mailing list subscription...
    * NOTE: Cannot determine whether maintainer is subscribed to the bioc-devel mailing list
      (requires admin credentials).  Subscribe here:
      https://stat.ethz.ch/mailman/listinfo/bioc-devel
* Checking for support site registration...
    Maintainer is registered at support site.


Summary:
ERROR count: 3
WARNING count: 1
NOTE count: 4
For detailed information about these checks, see the BiocCheck vignette, available at
https://bioconductor.org/packages/3.11/bioc/vignettes/BiocCheck/inst/doc/BiocCheck.html#interpreting-bioccheck-output
BiocCheck FAILED.
$error
[1] "Detected project library files generated by `renv`. On your computer, you can use `BiocCheck(`no-check-renv`=TRUE)`. If this error appears on the Bioconductor Build System, use `git rm --cached` to stop tracking the following directories and files in the Bioconductor repository:"
[2] "No biocViews terms found."                                                                                                                                                                                                                                                              
[3] "No 'vignettes' directory."                                                                                                                                                                                                                                                              

$warning
[1] "The following files are over 5MB in size: 'renv/library/R-4.0/x86_64-apple-darwin17.0/stringi/libs/icudt61l.dat renv/library/R-4.0/x86_64-apple-darwin17.0/stringi/libs/stringi.so'"

$note
[1] "The Description field in the DESCRIPTION is made up by less than 3 sentences. Please consider\nexpanding this field, and structure it as a full paragraph"                        
[2] "Consider adding a NEWS file, so your package news will be included in Bioconductor release announcements."                                                                        
[3] "Consider adding unit tests. We strongly encourage them. See\n  http://bioconductor.org/developers/how-to/unitTesting-guidelines/."                                                
[4] "Cannot determine whether maintainer is subscribed to the bioc-devel mailing list (requires\nadmin credentials).  Subscribe here: https://stat.ethz.ch/mailman/listinfo/bioc-devel"

Check ignoring "renv" files (intended for developer's computer):

  • ignore for "renv" files, i.e. do not throw an error if they are present
  • exclude "renv" files in the check for individual file size, do not throw any warning about their size
  • skip "renv" files for system() calls
> BiocCheck::BiocCheck(`no-check-renv`=TRUE)
This is BiocCheck version 1.25.14. BiocCheck is a work in progress. Output and severity of
issues may change. Installing package...
* Checking Package Dependencies...
* Checking if other packages can import this one...
* Checking to see if we understand object initialization...
* Checking for deprecated package usage...
* Checking for remote package usage...
* Checking version number...
* Checking version number validity...
    Package version 0.0.1; pre-release
* Checking R Version dependency...
* Checking package size...
        Skipped... only checked on source tarball
* Checking individual file sizes...
* Checking biocViews...
* Checking that biocViews are present...
    * ERROR: No biocViews terms found.
See http://bioconductor.org/developers/how-to/biocViews/
* Checking build system compatibility...
* Checking for blank lines in DESCRIPTION...
* Checking if DESCRIPTION is well formatted...
* Checking for proper Description: field...
    * NOTE: The Description field in the DESCRIPTION is made up by less than 3 sentences.
      Please consider expanding this field, and structure it as a full paragraph
* Checking for whitespace in DESCRIPTION field names...
* Checking that Package field matches directory/tarball name...
* Checking for Version field...
* Checking for valid maintainer...
* Checking DESCRIPTION/NAMESPACE consistency...
* Checking vignette directory...
    This is an unknown type of package
    * ERROR: No 'vignettes' directory.
* Checking library calls...
* Checking for library/require of BiocPackageSofwareTemplate...
* Checking coding practice...
grep: /Users/kevin/github/kevinrue/BiocPackageSofwareTemplate/R/*: No such file or directory
* Checking parsed R code in R directory, examples, vignettes...
* Checking function lengths
* Checking man page documentation...
* Checking package NEWS...
    * NOTE: Consider adding a NEWS file, so your package news will be included in
      Bioconductor release announcements.
* Checking unit tests...
    * NOTE: Consider adding unit tests. We strongly encourage them. See
      http://bioconductor.org/developers/how-to/unitTesting-guidelines/.
* Checking skip_on_bioc() in tests...
* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source, and vignette source...
* Checking if package already exists in CRAN...
* Checking for bioc-devel mailing list subscription...
    * NOTE: Cannot determine whether maintainer is subscribed to the bioc-devel mailing list
      (requires admin credentials).  Subscribe here:
      https://stat.ethz.ch/mailman/listinfo/bioc-devel
* Checking for support site registration...
    Maintainer is registered at support site.


Summary:
ERROR count: 2
WARNING count: 0
NOTE count: 4
For detailed information about these checks, see the BiocCheck vignette, available at
https://bioconductor.org/packages/3.11/bioc/vignettes/BiocCheck/inst/doc/BiocCheck.html#interpreting-bioccheck-output
BiocCheck FAILED.
$error
[1] "No biocViews terms found." "No 'vignettes' directory."

$warning
character(0)

$note
[1] "The Description field in the DESCRIPTION is made up by less than 3 sentences. Please consider\nexpanding this field, and structure it as a full paragraph"                        
[2] "Consider adding a NEWS file, so your package news will be included in Bioconductor release announcements."                                                                        
[3] "Consider adding unit tests. We strongly encourage them. See\n  http://bioconductor.org/developers/how-to/unitTesting-guidelines/."                                                
[4] "Cannot determine whether maintainer is subscribed to the bioc-devel mailing list (requires\nadmin credentials).  Subscribe here: https://stat.ethz.ch/mailman/listinfo/bioc-devel"

@mtmorgan
Copy link
Collaborator

A good place for more extended instructions is in the vignette, with the error message saying to look there for a more complete explanation.

@lshep
Copy link
Contributor

lshep commented Oct 21, 2020

The only coding practice check that doesn't check only R man or vignette seems to be the checkSystemCall which for consistency probably should be updated to the limited set of directories. If the checkSystemCall was updated for that it seems your issues would be resolved?
I agree then that maybe we should add a check for the renv directory or files but I think it should be in the location where we check for other system files and bad files? here

@kevinrue
Copy link
Author

checkLogicalUseFiles() also trawls the entire package directory pkgdir, searching for files with a whole lot of extensions that can contain R code.
This was actually my original motivation for the PR, as this function ultimately sources all the R code in the package in the findLogicalFile() function.
At the moment, this function does break down R scripts into those under "R/" and "other", which includes R scripts under "renv/".
But it also picks up all other extensions (manFiles, RNWFiles, RMDFiles) anywhere in the package, which again typically includes a number of Rmd files under "renv/").
This crashes BiocCheck, as some Rmd files under "renv" cannot be evaluated because they depend on a range of packages (e.g. emo in my issue #106)

See https://github.com/Bioconductor/BiocCheck/blob/master/R/util.R#L721

My PR ignores all files under the "renv" subdirectory, to avoid sourcing code that is not part of the package.

Again, this is all only to allow BiocCheck to run on the developer's computer, where "renv" files can be expected, using the no-check-renv=TRUE option

I agree that "renv" files should never be git tracked, (neither GitHub nor Bioconductor git), and that the vignette should educate developers about ignoring those files from the start, or untracking those files if they were added by accident.

@LiNk-NY LiNk-NY deleted the branch Bioconductor:master August 11, 2022 14:50
@LiNk-NY LiNk-NY closed this Aug 11, 2022
@kevinrue
Copy link
Author

I take it that's a "no" then :)

@hpages
Copy link
Contributor

hpages commented Aug 11, 2022

I also wonder what happened. A few words would have been nice.

@LiNk-NY
Copy link
Contributor

LiNk-NY commented Aug 12, 2022

Hi Kevin! @kevinrue

I was doing clean up of branches and deleted the master branch (in favor of the devel branch).
I guess that automatically closes the PR. Sorry!
The BiocCheck codebase has changed significantly since your PR. It will likely not merge without
conflicts.

Regarding the aims of the PR, I'm open to implementing checking of stray renv files in an
existing check.
There are about two packages that have an renv.lock file in them :
https://code.bioconductor.org/search/search?q=f%3Arenv%5C.lock%24

From what I gather, the issue is more related to BiocCheck not having a consistent way of checking files
and folders so that when renv is used, it crashes. It seems to me that updating that mechanism is the way
to go rather than making exceptions in further code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants