Write batch correct results to scratch folder and delete pilot modules #309

Merged: 5 commits, merged Jan 26, 2023

ewafula commented Jan 24, 2023

Purpose/implementation Section

What scientific question is your analysis addressing?

Module result files are too large to keep in the repository, so they are now all written locally to the OpenPedCan-analysis repository scratch directory (OpenPedCan-analysis/scratch/).

What was your approach?

Updated the code to write results to the scratch/ folder.
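A minimal sketch of the "write to scratch/, read back from scratch/" pattern, assuming the repo root is the nearest parent directory containing `.git` and that scratch/ is git-ignored. The module's actual scripts are in R; this Python version only illustrates the path logic, and the helper names (`find_repo_root`, `scratch_path`), the `batch-correction` subfolder, and the `corrected_counts.tsv` file name are hypothetical.

```python
import tempfile
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk up from `start` until a directory containing `.git` is found."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / ".git").is_dir():
            return candidate
    raise FileNotFoundError(f"no .git directory found above {start}")


def scratch_path(module_dir: Path, subdir: str, filename: str) -> Path:
    """Return <repo root>/scratch/<subdir>/<filename>, creating the folder.

    Assumes scratch/ is git-ignored, so large results written here are
    never committed (and never pushed to GitHub LFS).
    """
    out_dir = find_repo_root(module_dir) / "scratch" / subdir
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename


if __name__ == "__main__":
    # Demo in a throwaway directory that mimics the repo layout.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / ".git").mkdir()
        module_dir = repo / "analyses" / "batch-correction"  # hypothetical module dir
        module_dir.mkdir(parents=True)

        # Upstream script: write results to scratch/ (hypothetical file name).
        out = scratch_path(module_dir, "batch-correction", "corrected_counts.tsv")
        out.write_text("gene\tsample_1\nGENE1\t10\n")

        # Downstream script: read them back from the same scratch/ path.
        again = scratch_path(module_dir, "batch-correction", "corrected_counts.tsv")
        print(again.read_text())
```

Resolving every path from the repo root, rather than relative to the current working directory, keeps upstream and downstream scripts pointing at the same scratch/ file regardless of where they are launched from.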

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

NA

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

NA

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Results are written locally to the repo scratch/ folder.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

ewafula (Author) commented Jan 24, 2023

@aadamk, @jharenza suggested that we update the module to avoid writing large files inside the module directory, since they subsequently get uploaded to GitHub. Module outputs have been problematic to clone or sync locally because they are stored on GitHub LFS. I have updated the two main scripts to write output locally to the repo scratch/ folder, which is not committed to origin/dev. The code runs OK on my EC2 instance, but it fails the GitHub Actions (GA) checks. I have uncommented the tumor-only analysis commands for TARGET NBL in the batch script so that CI tests NBL batch correction; those commands are currently commented out in the merged code on the repo. Any suggestions for fixing the following error? Considering that, in the long term, the module will be run on Cavatica once the CWL workflow is developed, should we even be concerned with GA errors?

Reading in histologies file
Reading RSEM expected counts file
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
Calls: model.matrix -> model.matrix.default -> contrasts<-
Execution halted

jharenza (Member) commented

Just a hunch: now that you write to scratch/, maybe you need to read the files back in from scratch/ in the downstream scripts?

ewafula (Author) commented Jan 25, 2023

@jharenza, I am reading them back in from scratch/, and it works OK when run on the full datasets on EC2. The GA error from the contrasts function seems to suggest that a factor predictor in the model matrix has fewer than two levels:
https://www.statology.org/contrasts-applied-to-factors-with-2-or-more-levels/
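A minimal sketch of a pre-flight check for this error, assuming the covariates passed to `model.matrix()` are categorical (character) columns. R converts such columns to factors and, with the default `na.action` (`na.omit`), drops rows with missing values before building the matrix, so a covariate that collapses to one observed value in the CI subset triggers the contrasts error. This is written in Python only to show the logic; the function name, the `batch`/`subtype` column names, and the toy values are hypothetical, not taken from the module.

```python
from typing import Mapping, Optional, Sequence


def single_level_covariates(
    columns: Mapping[str, Sequence[Optional[str]]],
) -> list[str]:
    """Return categorical covariates with fewer than two observed levels.

    Rows with a missing value (None, standing in for NA) in any covariate
    are dropped first, mirroring na.omit, then distinct values are counted
    per column. Any name returned would make model.matrix() fail with
    "contrasts can be applied only to factors with 2 or more levels".
    """
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    complete_rows = [
        i for i in range(n_rows) if all(columns[c][i] is not None for c in names)
    ]
    return [c for c in names if len({columns[c][i] for i in complete_rows}) < 2]


# Toy CI-style subset (hypothetical values): two batches, but only one subtype.
ci_subset = {
    "batch": ["polyA", "stranded", "polyA"],
    "subtype": ["subtype_A", "subtype_A", None],
}
print(single_level_covariates(ci_subset))  # ['subtype']
```

Running a check like this on the CI subset before fitting would turn the opaque `contrasts<-` failure into a message naming the offending covariate.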

ewafula (Author) commented Jan 25, 2023

@aadamk, @jharenza clarified that you both had agreed to exclude NBL because v11 does not currently have NBL subtyping. Maybe the contrasts error in both the CI subset and the complete datasets will go away in v12 once subtyping is back. I'll exclude NBL from the GA checks with an if statement on a CI environment variable, as we do in other modules. After v12, we can run without setting the CI environment variable.
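A minimal sketch of the environment-variable gate described above. The batch script itself is shell; this Python version only illustrates the logic. The flag name `OPENPEDCAN_CI`, the value `"1"`, and the `cohort_B` placeholder are hypothetical; the modules that already use this pattern may name and test the variable differently.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical flag name; existing modules may use a different variable.
CI_FLAG = "OPENPEDCAN_CI"
# TARGET NBL lacks subtyping in v11, so it is skipped when running in CI.
EXCLUDED_IN_CI = {"TARGET NBL"}


def cohorts_to_run(
    cohorts: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Drop CI-excluded cohorts when the CI flag is "1"; keep all otherwise."""
    env = os.environ if env is None else env
    if env.get(CI_FLAG) == "1":
        return [c for c in cohorts if c not in EXCLUDED_IN_CI]
    return list(cohorts)


print(cohorts_to_run(["TARGET NBL", "cohort_B"], {CI_FLAG: "1"}))  # ['cohort_B']
print(cohorts_to_run(["TARGET NBL", "cohort_B"], {}))  # ['TARGET NBL', 'cohort_B']
```

Because the exclusion only applies when the flag is set, full runs outside CI keep NBL, and dropping the flag after v12 re-enables NBL in CI without touching the analysis code.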

aadamk commented Jan 25, 2023

Hi @ewafula, yes: subtyping for NBL was moved over to pathology free text, with plans to move it back in v12. As such, I commented out that code so that it could pass CI, though your plan with the env var sounds good to me. Thank you.

ewafula requested review from adilahiri and removed the request for sangeetashukla on January 26, 2023 15:59
ewafula merged commit 1585c85 into dev on Jan 26, 2023
jharenza deleted the exclude-batch-correct-large-files branch on February 11, 2023 02:37
This pull request was closed.
4 participants