speed ups to rna quant ingest #1564

ejacox · 2017-02-10T16:39:22Z

I made the expression id an auto increment field and sped up reading the expression data from the file. Closes #1549

ejacox · 2017-02-13T19:24:29Z

Also added a fix for the OIDC problem, which closes #1566.

codecov-io · 2017-02-13T19:41:39Z

Codecov Report

Merging #1564 into master will increase coverage by 0.99%.
The diff coverage is 67.64%.

@@            Coverage Diff             @@
##           master    #1564      +/-   ##
==========================================
+ Coverage   84.62%   85.61%   +0.99%     
==========================================
  Files          33       33              
  Lines        7166     7188      +22     
  Branches      897      898       +1     
==========================================
+ Hits         6064     6154      +90     
+ Misses        937      851      -86     
- Partials      165      183      +18

Impacted Files	Coverage Δ
ga4gh/server/repo/rnaseq2ga.py	`62.43% <67.64%> (+45.66%)`	✅

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c5306d5...9d74bb1.

david4096

Thanks, looks like the speedups come from removing the dict deserialization step and autoincrementing the ID field.

ejacox · 2017-02-13T23:08:29Z

@david4096 I removed the UUID id generation, which was very slow. The auto increment was just for convenience; the ids can also be generated by incrementing a counter in the code. On reflection, this will need to be changed to assign ids incrementally within each RnaQuantification in order to have reproducible ids. The primary key will then need to be a composite of the id column and rna_quantification_id.
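A minimal sketch of the id scheme described above, assuming a simplified, hypothetical `Expression` table (the column list, the `write_expressions` helper, and the choice to start the counter at 1 are illustrative, not the server's actual schema or code). Ids restart for each RnaQuantification, so they are reproducible across ingests, and the composite primary key keeps rows unique across quantifications:

```python
import sqlite3

# Hypothetical, simplified schema: the real table has more columns.
SCHEMA = """
CREATE TABLE Expression (
    id INTEGER NOT NULL,
    rna_quantification_id TEXT NOT NULL,
    name TEXT,
    expression REAL,
    PRIMARY KEY (rna_quantification_id, id)
)
"""


def write_expressions(db, rna_quantification_id, rows):
    """Insert rows, numbering them 1..n within this quantification."""
    expression_id = 0
    for name, level in rows:
        expression_id += 1  # plain counter instead of a UUID
        db.execute(
            "INSERT INTO Expression VALUES (?, ?, ?, ?)",
            (expression_id, rna_quantification_id, name, level))
    db.commit()
    return expression_id


def demo():
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    write_expressions(db, "quantA", [("gene1", 1.5), ("gene2", 0.0)])
    # The same id value 1 is reused here without a key conflict,
    # because the key also includes rna_quantification_id.
    write_expressions(db, "quantB", [("gene1", 2.5)])
    return db.execute(
        "SELECT rna_quantification_id, id, name FROM Expression "
        "ORDER BY rna_quantification_id, id").fetchall()
```

Re-running the same ingest on the same file yields the same ids, which a UUID or a table-wide auto increment would not guarantee.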

ejacox · 2017-02-14T01:11:38Z

I changed the id as described above and added an ingest test to increase the code coverage.

david4096 · 2017-02-14T01:44:01Z

ga4gh/server/repo/rnaseq2ga.py

@@ -157,6 +190,7 @@ def writeExpression(self, rnaQuantificationId, quantfilename,
                              rawCount, score, units, confidenceLow,
                              confidenceHi)
                self._db.addExpression(datafields)
+                expressionId += 1


Cool! Do you think this same idea will work for adding expressions in parallel (ignoring sqlite)?

ejacox · Feb 14, 2017

Yes, since each file will be processed within its own process/thread.
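A rough sketch of why per-quantification ids lend themselves to parallel ingest, assuming each worker handles exactly one quantification file. The `number_expressions` and `ingest_in_parallel` helpers and the two-column tab-separated input are hypothetical, and threads stand in here for whatever process or thread model would actually be used. No shared counter or lock is needed, because each worker numbers only its own rows:

```python
from concurrent.futures import ThreadPoolExecutor


def number_expressions(rna_quantification_id, lines):
    """Parse one file's lines and number them locally, starting at 1."""
    records = []
    for expression_id, line in enumerate(lines, start=1):
        name, level = line.split("\t")
        records.append(
            (rna_quantification_id, expression_id, name, float(level)))
    return records


def ingest_in_parallel(files):
    """files maps a quantification id to that file's data lines."""
    with ThreadPoolExecutor() as pool:
        # One task per file; sorted so the merged output order is stable.
        futures = [
            pool.submit(number_expressions, quant_id, lines)
            for quant_id, lines in sorted(files.items())
        ]
        return [record for f in futures for record in f.result()]
```

The writes themselves would still have to be serialized or sent to a database that supports concurrent writers, which is the "ignoring sqlite" caveat in the question.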

david4096 · 2017-02-14T01:44:56Z

tests/datadriven/test_rna_quantification.py

+
+        testTsvFile = os.path.join(
+                            paths.testDataDir,
+                            "datasets/dataset1/rnaQuant/rsem_test_data.tsv")


This test is really just guaranteeing that the whole thing doesn't go belly up, which is a great improvement! It also makes me think we could add it to https://github.com/ga4gh/server/blob/master/scripts/build_test_data.py

* speed ups to rna quant ingest
* fixed oidc problem
* added tests and changed expression ids to be unique within rnaquant
* flake fixes

ejacox added 2 commits February 9, 2017 15:53

speed ups to rna quant ingest

b4f795c

fixed oidc problem

85566b2

david4096 self-requested a review February 13, 2017 21:59

david4096 approved these changes Feb 13, 2017

ejacox added 2 commits February 13, 2017 16:40

added tests and changed expression ids to be unique within rnaquant

1f2cc83

flake fixes

9d74bb1

david4096 reviewed Feb 14, 2017

david4096 merged commit 6e83125 into ga4gh:master Feb 14, 2017

ejacox mentioned this pull request Feb 14, 2017

Master build failling #1566

Closed

andrewjesaitis pushed a commit to andrewjesaitis/server that referenced this pull request Feb 14, 2017

speed ups to rna quant ingest (ga4gh#1564)

999969a

ejacox mentioned this pull request Feb 15, 2017

fixed problem #1572

Merged
