Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Management #166

Open
wants to merge 132 commits into
base: develop
Choose a base branch
from
Open

Conversation

sacdallago
Copy link
Member

@sacdallago sacdallago commented Apr 19, 2018

This PR includes a couple of management things.

  • There is an extended version of the "compute_job" tracker, that @thomashopf started. Now it's an abstract class, and there are SQL and mongo extensions of it
  • There is a new object called "dumper", which basically takes in the outcfg after every stage and compares it with a list of "tracked_files" (similar to "archive"). If files are tracked, the dumper will copy the content of these file to the new location. The new location can be a local FS path or a mongo db. This also works by extending an abstract class.

Other improvements include the recursive unrolling of the config file, which can be delimited to a "max_depth", which defaults to 1 (aka. same behavior as before).

The changes have been tested against an old config file, and the pipeline runs just the same way, and outputs are generated just the same way, with the exception of the following:

Now, at the end of every stage, there will be two "outconfig" files generated: one is the original outcfg and the other one is the dumper-version. If no dumper is specified, these two files will be identical.

@b-schubert the reason why I have deviated from the original idea of implementing this as an additional stage is that I talked to @thomashopf and he gave me very valid points as to why that would be a bad idea. So, now this is a functionally equivalent version of that draft idea, but without the conceptual problem of adding a stage, which is not really a "scientific" relevant stage.

In terms of config, the new all optional fields that can be defined are (please, pay attention to the UPPER CASE text, it's important):

management:
  compute_job_type: local|sql|mongo
  [if compute_job_type=sql|mongo, MUST define:] compute_job_uri: <connector://user:pass@loc/DEFAULT_DB>

  dumper: local|mongo
  tracked_files: ["target_sequence_file", "ec_compared_all_file", "remapped_pdb_files", ...]
  [if dumper=local, CAN define, defualts to prefix:] dumper_storage_location: /path/to/filtered/results
  [if dumper=mongo, MUST define:] dumper_uri: <connector://user:pass@loc/any_db>

  [if compute_job_type=sql|mongo OR dumper=mongo, MUST define:] job_name: <unique_name>
  [if compute_job_type=sql|mongo, MUST define:] job_group: <any_name>

Additionally, I deleted the old configs which somehow reappeared after @aggreen 's PR a while ago, and updated them to use the short queue by default, as from a meeting with RC yesterday.


@thomashopf : https://mensfeld.pl/2014/01/using-multiple-mongodb-databases-instead-of-one-performance-check/

…ually have a protocol to write to db result files
@sacdallago
Copy link
Member Author

@thomashopf this is basically done, except there's a bunch of tests that are failing that I wasn't aware of existed. I'll look into those probably in the coming week and then, from my side, this PR is good to go. Please, review it when you get a chance... If there is no major problem, I'd be happy to get this one over with and consider any minor brush-ups for some future stage

@codecov
Copy link

codecov bot commented Jul 10, 2018

Codecov Report

❗ No coverage uploaded for pull request base (develop@51bfd03). Click here to learn what that means.
The diff coverage is 42.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop     #166   +/-   ##
==========================================
  Coverage           ?   37.43%           
==========================================
  Files              ?       77           
  Lines              ?     7626           
  Branches           ?        0           
==========================================
  Hits               ?     2855           
  Misses             ?     4771           
  Partials           ?        0
Impacted Files Coverage Δ
evcouplings/utils/__init__.py 88.23% <ø> (ø)
...ings/utils/management/metadata_tracker/__init__.py 100% <100%> (ø)
evcouplings/utils/management/__init__.py 100% <100%> (ø)
evcouplings/utils/management/file_handling.py 13.51% <13.51%> (ø)
.../management/results_tracker/ResultsTrackerMongo.py 15.74% <15.74%> (ø)
...anagement/metadata_tracker/MetadataTrackerMongo.py 17.94% <17.94%> (ø)
.../management/results_tracker/ResultsTrackerLocal.py 20.4% <20.4%> (ø)
...anagement/metadata_tracker/MetadataTrackerLocal.py 37.83% <37.83%> (ø)
...s/management/results_tracker/ResultsTrackerNull.py 60% <60%> (ø)
...agement/results_tracker/ResultsTrackerInterface.py 60% <60%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51bfd03...b44b78c. Read the comment docs.

@sacdallago
Copy link
Member Author

Fixed the issues, I'm actually quite glad @b-schubert wrote those tests :D Def. good having them.

@thomashopf
Copy link
Contributor

Great, I'll have a look as soon as I can spare some time.

@sacdallago
Copy link
Member Author

Just pinging @thomashopf 🤓

I know it's 🏄🏻‍♂️ season :) :)

@thomashopf
Copy link
Contributor

thomashopf commented Jul 30, 2018 via email

@sacdallago
Copy link
Member Author

I hate to push but there are a lot of inconvenient name changes in this PR which if not integrated soon slow me down a lot on updating the current running version on evcouplings.org + database table names etc. 🙇🏻‍♂️

…aths with extensions (e.g. something.pdb). Mongo doesn't like ., so change that to underscore
…130388638c0fa_b0.2_1oxo_B_33.pdb are nested in remapped_pdb_files, so must apply key correction recursively
@sacdallago
Copy link
Member Author

sacdallago commented Dec 16, 2018

Santa's coming to town @thomashopf :) with a lot of code to review yay! 🙈

P.S.: these changes have been tested extensively in production via the web app for over 4 months now. Since one month I'm keeping track of the jobs submitted via web to see success rate etc. (also, the way I discovered bugs like #194 ), you can see all of the jobs since Nov 20 in /n/groups/marks/web/backend/runs/ if you have access to o2, there are over 55 jobs there now (including complex jobs and monomer jobs using many different combinations of parameters).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants