Skip to content
This repository has been archived by the owner on Jan 24, 2018. It is now read-only.

speed ups to rna quant ingest #1564

Merged
merged 4 commits into from
Feb 14, 2017
Merged

speed ups to rna quant ingest #1564

merged 4 commits into from
Feb 14, 2017

Conversation

ejacox
Copy link
Collaborator

@ejacox ejacox commented Feb 10, 2017

I made the expression id an auto increment field and sped up reading the expression data from the file. Closes #1549

@ejacox
Copy link
Collaborator Author

ejacox commented Feb 13, 2017

Added fix to oidc problem also to close #1566.

@codecov-io
Copy link

codecov-io commented Feb 13, 2017

Codecov Report

Merging #1564 into master will increase coverage by 0.99%.
The diff coverage is 67.64%.

@@            Coverage Diff             @@
##           master    #1564      +/-   ##
==========================================
+ Coverage   84.62%   85.61%   +0.99%     
==========================================
  Files          33       33              
  Lines        7166     7188      +22     
  Branches      897      898       +1     
==========================================
+ Hits         6064     6154      +90     
+ Misses        937      851      -86     
- Partials      165      183      +18
Impacted Files Coverage Δ
ga4gh/server/repo/rnaseq2ga.py 62.43% <67.64%> (+45.66%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5306d5...9d74bb1. Read the comment docs.

@david4096 david4096 self-requested a review February 13, 2017 21:59
Copy link
Member

@david4096 david4096 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks like the speedups come from removing the dict deserialization step and autoincrementing the ID field.

@ejacox
Copy link
Collaborator Author

ejacox commented Feb 13, 2017

@david4096 I removed uuid id generation, which was very slow. The auto increment was just for convenience. The ids can be generated by incrementing a counter in the code, too. On reflection, this will need to be changed to assign ids incrementally by RnaQunatifiction in order to have reproducible ids. The primary key will need to be changed to be the id column and the rna_quantification_id.

@ejacox
Copy link
Collaborator Author

ejacox commented Feb 14, 2017

I changed the id as described above and added an ingest test to increase the code coverage.

@@ -157,6 +190,7 @@ def writeExpression(self, rnaQuantificationId, quantfilename,
rawCount, score, units, confidenceLow,
confidenceHi)
self._db.addExpression(datafields)
expressionId += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Do you think this same idea will work for adding expressions in parallel (ignoring sqlite)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since each file will be processed within it's own process/thread.


testTsvFile = os.path.join(
paths.testDataDir,
"datasets/dataset1/rnaQuant/rsem_test_data.tsv")
Copy link
Member

@david4096 david4096 Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is really just guaranteeing the whole this doesn't go belly up, which is a great improvement! Also makes me think we could add it to the https://github.com/ga4gh/server/blob/master/scripts/build_test_data.py

@david4096 david4096 merged commit 6e83125 into ga4gh:master Feb 14, 2017
@ejacox ejacox mentioned this pull request Feb 14, 2017
andrewjesaitis pushed a commit to andrewjesaitis/server that referenced this pull request Feb 14, 2017
* speed ups to rna quant ingest

* fixed oidc problem

* added tests and changed expression ids to be unique within rnaquant

* flake fixes
@ejacox ejacox mentioned this pull request Feb 15, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants