
Github Actions & Quay Container building #85

Closed
jgallowa07 opened this issue Jan 24, 2022 · 2 comments

jgallowa07 commented Jan 24, 2022

I believe this error may be out of date. I will use this issue to document how my build experience is going, along with potential solutions I have found so far.

Background

To summarize the pipeline, there is a GitHub Action here which generally does the following:

  1. checks out all the submodules
  2. builds the Docker image
  3. pushes the built image to quay.io/matsengrp/linearham
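
For concreteness, a minimal sketch of what such a workflow might look like is below. The workflow name, action versions, image tag, and secret names are assumptions for illustration; the real workflow file in the repo may differ.

```yaml
# Hypothetical sketch of the build-and-push workflow described above;
# the real workflow, image tag, and secret names may differ.
name: build and push

on: push

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      # 1. check out the repo and all submodules
      - uses: actions/checkout@v2
        with:
          submodules: recursive

      # 2. build the Docker image
      - run: docker build -t quay.io/matsengrp/linearham:latest .

      # 3. push the built image to quay.io
      - run: |
          echo "${{ secrets.QUAY_PASSWORD }}" | docker login -u "${{ secrets.QUAY_USERNAME }}" --password-stdin quay.io
          docker push quay.io/matsengrp/linearham:latest
```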

Additionally, there is a build trigger in the Quay repo which fires on push events. I believe this is probably unnecessary, as it duplicates the image build effort; for now it is toggled off anyway. So if the action builds, we should be good to go on the Quay container.

GitHub Actions

Problem 1: submodules

GitHub submodules are failing to check out, as seen here.

In that log, the failing packages are attempting to clone via SSH, whereas the ones that work are cloning via HTTPS. That is almost certainly the problem, but I can't figure out what's causing it.

Potential Solution 1
We need to change the .gitmodules in partis to point towards the https-accessible code. Currently, the jared-test branch just points towards a partis fork that I personally cloned and modified. @psathyrella, can you handle making this change in the actual partis? This should be done in the branch which linearham is using as a submodule -> currently this.
To do this, update the .gitmodules file to look like this ->
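
The original snippet isn't shown above; purely as an illustration, a hypothetical .gitmodules entry switched from SSH to HTTPS might look like the following (the submodule name and repository are made up, and the real entries in partis will differ):

```ini
# Hypothetical .gitmodules entry; the actual submodule names/URLs in partis differ.
# Before (ssh form, which fails in CI without credentials):
#   url = git@github.com:example-org/example-package.git
[submodule "packages/example-package"]
	path = packages/example-package
	url = https://github.com/example-org/example-package.git
```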

Potential Solution 2
We could instead set up SSH access for the checkout -> actions/checkout#116 (comment).
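
As a rough sketch, that approach would mean giving the checkout step an SSH key (the secret name below is hypothetical):

```yaml
# Hypothetical sketch: provide an SSH deploy key to actions/checkout so that
# ssh-style submodule URLs can be cloned; the secret name is made up.
- uses: actions/checkout@v2
  with:
    submodules: recursive
    ssh-key: ${{ secrets.SUBMODULE_DEPLOY_KEY }}
```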

Problem 2: scripts/run_bootstrap_asr_ess.R

When running test.sh in the built container, we get the following error:

```
Rscript --slave --vanilla scripts/run_bootstrap_asr_ess.R output/cluster-0/lineage_KC576081.1/mcmciter25_mcmcthin1_tuneiter0_tunethin100_numrates4_rngseed0/lh_revbayes_run.trees output/cluster-0/lineage_KC576081.1/cluster_seqs.fasta 0.1 0.05 1 0 output/cluster-0/lineage_KC576081.1/mcmciter25_mcmcthin1_tuneiter0_tunethin100_numrates4_rngseed0/burninfrac0.1_subsampfrac0.05/linearham_run.trees output/cluster-0/lineage_KC576081.1/mcmciter25_mcmcthin1_tuneiter0_tunethin100_numrates4_rngseed0/burninfrac0.1_subsampfrac0.05/linearham_run.log output/cluster-0/lineage_KC576081.1/mcmciter25_mcmcthin1_tuneiter0_tunethin100_numrates4_rngseed0/burninfrac0.1_subsampfrac0.05/linearham_run.ess
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'y'
Calls: <Anonymous> -> spectrum0.ar -> lm -> lm.fit
```

Up to this point, I have not figured out exactly what is causing this. I'm not even sure where the lm() function is being called from.

Hacky solution

Simply comment out all SCons targets that involve this dependency. I'm no expert with SCons, but this dependency looks to be quite important for testing -> here's all that got commented out.

We'll certainly want to come back to this and figure out what the problem is at some point. For now, I've tested the GitHub Action build and push with the jared-test branch and we seem to be okay. Once the partis submodules are updated I can update my branch's partis and submit a PR for a clean build on master (should be changed to main -> ).

@psathyrella

Whoops, yes, #82 is resolved; I added a comment.

OK, I've switched the partis .gitmodules to https.


psathyrella commented Nov 6, 2022

I don't seem to have updated this issue at the time, but I uncommented those steps here a while ago since unfortunately they weren't optional.

The underlying issue is that lm.fit() was called by coda::effectiveSize() in run_bootstrap_asr_ess.R, which calculates ESS values for every column in ess.data. The crash was caused by a single NaN/Inf value at the top of the LHLogLikelihood column, which I'm guessing was there because this test data has a very small number of steps or something? Note that this crash only happened on this test data, at least as far as I'm aware. Anyway, I did not investigate further; I just removed rows (with a printed warning) containing any NaN/Inf values, which seems to fix it.
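
For reference, a minimal sketch of that kind of filtering, assuming ess.data is a numeric data frame of MCMC samples (the actual code and warning message in run_bootstrap_asr_ess.R may differ):

```r
library(coda)

# Hypothetical sketch: drop rows containing any NaN/Inf before computing ESS;
# the real run_bootstrap_asr_ess.R may structure this differently.
finite.rows <- apply(ess.data, 1, function(row) all(is.finite(row)))
if (any(!finite.rows)) {
  warning(sprintf("removing %d row(s) with NaN/Inf values before ESS calculation",
                  sum(!finite.rows)))
  ess.data <- ess.data[finite.rows, , drop = FALSE]
}
ess.vals <- effectiveSize(as.mcmc(ess.data))  # one ESS value per column
```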
