Ongoing transition to DB backend (SQLite) #184

hendrikweisser · 2021-03-11T18:28:26Z

Changes to Whitelist.py coming up next.

hendrikweisser · 2021-03-12T19:44:17Z

Whitelist is updated and seems to work, but the read references are now database keys, so SampComp needs updating as well.

tleonardi

Hi @hendrikweisser,
thanks a lot for all these updates, great job!
I only have one change to suggest: would it be possible to move the DB logic from Whitelist to DataStore and then from Whitelist only call DataStore methods?
This way everything DB-related is in DataStore, and Whitelist becomes agnostic of the storage backend. In the future, if we were to change the storage backend or add alternative ones, we would only have to modify/rewrite DataStore without having to touch Whitelist, SampComp etc.
What do you think?
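
A minimal sketch of the separation proposed here, for illustration only. The table layout, the `get_read_ids` method, and the signature of `store_read` are assumptions and do not reflect the actual nanocompore code; only the idea that Whitelist talks to DataStore and never to SQLite directly is taken from the comment above.

```python
import sqlite3


class DataStore:
    """Owns all SQLite access; callers never see SQL or connections."""

    def __init__(self, db_path):
        self._conn = sqlite3.connect(db_path)
        # Hypothetical schema, reduced to what the example needs.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS reads ("
            "id INTEGER PRIMARY KEY, transcript TEXT NOT NULL, name TEXT NOT NULL)"
        )

    def store_read(self, transcript, name):
        """Insert one read and return its database key."""
        with self._conn:  # commits on success, rolls back on error
            cur = self._conn.execute(
                "INSERT INTO reads (transcript, name) VALUES (?, ?)",
                (transcript, name),
            )
        return cur.lastrowid

    def get_read_ids(self, transcript):
        """Return the database keys of all reads for one transcript."""
        rows = self._conn.execute(
            "SELECT id FROM reads WHERE transcript = ? ORDER BY id", (transcript,)
        )
        return [row[0] for row in rows]

    def close(self):
        self._conn.close()


class Whitelist:
    """Backend-agnostic: relies only on the DataStore interface."""

    def __init__(self, store, transcripts):
        self.ref_reads = {tx: store.get_read_ids(tx) for tx in transcripts}
```

With this split, swapping SQLite for another backend would mean providing a different class with the same two methods, while `Whitelist` stays unchanged.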

hendrikweisser · 2021-03-17T12:20:54Z

Hi @tleonardi!
Totally agree about moving the DB logic. In fact I've already done that, after I realised that much of the same functionality would be needed in SampComp. I just wasn't sure whether it should go into DataStore; for the moment I've moved it to a new class, DatabaseWrapper. Is DataStore intended to specifically handle storing of data? It currently creates the database if it doesn't exist and adds tables, which isn't ideal for just read access.

Do we want one class that handles general DB functionality? Or one lower-level/read access class plus DataStore for writing? (My intention was to adapt DataStore to use DatabaseWrapper, but perhaps combining them into one class would indeed be cleaner.)

I'll push my updates from last night so you can have a look.
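
One way to keep read access from creating a database as a side effect is to open SQLite through a URI with an explicit mode. The sketch below uses only the standard `sqlite3` module; the `OpenMode` enum and the `open_db` helper are hypothetical names for illustration and are not the project's `DBCreateMode`.

```python
import sqlite3
from enum import Enum
from pathlib import Path


class OpenMode(Enum):
    READ_ONLY = "ro"    # fail if the file is missing, never write
    READ_WRITE = "rw"   # fail if the file is missing, allow writes
    CREATE = "rwc"      # create the file if it is missing


def open_db(path, mode=OpenMode.READ_ONLY):
    """Open an SQLite database with an explicit access mode."""
    # as_uri() percent-encodes the path, so spaces etc. are handled.
    uri = f"{Path(path).resolve().as_uri()}?mode={mode.value}"
    return sqlite3.connect(uri, uri=True)
```

A reader such as Whitelist or SampComp could then call `open_db(path)` and get an `sqlite3.OperationalError` for a missing file instead of a silently created empty database.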

hendrikweisser · 2021-03-17T12:39:10Z

One more point: I've implemented the read-level filtering in Whitelist via constraints in the database query. So the database and filtering are very closely linked. I think it makes sense to keep this code in Whitelist, but it's not agnostic to the storage backend. We could move the code to the DB class, but then that class will contain quite a bit of the filtering logic. (I assume that it's more efficient to filter during the query than after it, but haven't tested that.)
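
To make the trade-off concrete, here is a small sketch of read-level filtering expressed as constraints in the query. The `reads` table, its columns, and the `select_reads` helper are invented for this example and are not the schema used in the PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical read-level table with per-read k-mer counts.
conn.execute(
    "CREATE TABLE reads (id INTEGER PRIMARY KEY, sample TEXT, "
    "transcript TEXT, kmers INTEGER, valid_kmers INTEGER)"
)
conn.executemany(
    "INSERT INTO reads (sample, transcript, kmers, valid_kmers) VALUES (?, ?, ?, ?)",
    [
        ("WT_1", "tx1", 100, 95),
        ("WT_1", "tx1", 100, 40),  # too many invalid k-mers
        ("KO_1", "tx1", 80, 78),
        ("KO_1", "tx2", 50, 50),
    ],
)


def select_reads(conn, samples, min_valid_fraction):
    """Return (transcript, read id) pairs that pass the filters, filtered by SQLite."""
    placeholders = ", ".join("?" for _ in samples)
    query = (
        "SELECT transcript, id FROM reads "
        f"WHERE sample IN ({placeholders}) "
        "AND CAST(valid_kmers AS REAL) / kmers >= ? "
        "ORDER BY transcript, id"
    )
    return conn.execute(query, (*samples, min_valid_fraction)).fetchall()
```

Only the rows that pass are transferred to Python, but the filtering rule now lives in SQL, which is the coupling between filtering and storage backend described above.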

hendrikweisser · 2021-03-17T20:23:51Z

I went ahead and put all the database code in DataStore. I still need to test it and probably fix some bugs, but I wanted to avoid you looking at an outdated version.

hendrikweisser · 2021-03-23T20:03:57Z

I've made some more changes. SampComp now works with data from the SQLite DB, including the statistical tests in txCompare. It still fails when SampCompDB gets involved to handle the results, but the plan was to replace SampCompDB and shelve with more SQLite anyway, which I still need to do.

hendrikweisser · 2021-07-23T19:30:08Z

I just pushed more updates. In principle, the whole pipeline (Eventalign_collapse -> Whitelist -> SampComp -> save_report) should work now. Combining adjacent p-values in a sequence context and multiple testing correction are supported and (lightly) tested, but a lot of the other options aren't tested and may still need updating. In the export class (now called PostProcess, which should probably be renamed to DataExport), save_to_bed isn't updated yet. __main__.py isn't updated yet, either.
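
For readers unfamiliar with the multiple testing correction step, a minimal pure-Python sketch follows. The comment does not say which method the pipeline applies; the Benjamini-Hochberg procedure is assumed here purely for illustration, and `benjamini_hochberg` is a hypothetical helper, not a function from the PR.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is capped
    # by the adjusted value of the next-larger p-value (keeps them monotone).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

In a DB-backed pipeline, the adjusted values would typically be written back next to the raw p-values, for example into separate columns for adjusted p-values.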

hendrikweisser · 2021-07-23T19:31:04Z

Oh, SampCompDB.py should be obsolete now and can probably be removed.

hendrikweisser · 2021-08-17T10:51:12Z

@tleonardi: As discussed I've updated the CLI options. Input files (e.g. "-i") must be specified as full paths, but for output files there's the option to specify a directory ("-d") and use the default filename. I've commented out the YAML input option for SampComp, but if you think it's useful just put it back in. There are some other changes (e.g. due to the simplification of tests performed by SampComp) but they should be straightforward.
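
The input/output convention described here could be sketched with `argparse` roughly as follows. Only the short options `-i` and `-d` come from the comment; the `-o` option, the long option names, the default filename, and the helper functions are assumptions made for the example.

```python
import argparse
from pathlib import Path

DEFAULT_DB_NAME = "sampcomp.db"  # hypothetical default filename


def build_parser():
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("-i", "--input", required=True, type=Path,
                        help="full path to the input database")
    # Either name the output file explicitly or give a directory, not both.
    out = parser.add_mutually_exclusive_group()
    out.add_argument("-o", "--output", type=Path,
                     help="full path to the output file")
    out.add_argument("-d", "--outdir", type=Path, default=Path("."),
                     help="output directory; the default filename is used")
    return parser


def resolve_output(args):
    """Prefer an explicit output path, else fall back to <outdir>/<default name>."""
    if args.output is not None:
        return args.output
    return args.outdir / DEFAULT_DB_NAME
```

For example, `-i /data/in.db -d /results` would resolve to `/results/sampcomp.db` under these assumptions.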

I've done some light testing and it works on my example data. (Unfortunately the problem that TxComp spawns way too many threads persists for me.)

hendrikweisser added 7 commits March 9, 2021 13:57

DataStore: small fix in SQL query, add to-do comment

78e923b

Eventalign_collapse: add flag for writing to DB (or TSV file), fix error in DataStore call ('add_read' renamed to 'store_read')

e5d7621

Merge branch 'new_data_store' of https://github.com/tleonardi/nanocompore into new_data_store

51a87e4

better log message in 'DataStore.__init__'

d7edde9

Merge branch 'new_data_store' of https://github.com/tleonardi/nanocompore into new_data_store

ba2da54

coding style (added spaces for readability)

afe005c

store read-level kmer stats in database (needed for whitelisting)

7f60da1

hendrikweisser requested review from a-slide and tleonardi as code owners March 11, 2021 18:28

Whitelist: read data from SQLite, filter reads during query

da7eb4b

hendrikweisser added 2 commits March 16, 2021 19:54

add function to check validity of sample dictionary to 'common.py'

2fefe7c

new class 'DatabaseWrapper' for reusable DB interaction code; adapt 'Whitelist' accordingly

8f52c1c

tleonardi reviewed Mar 17, 2021

hendrikweisser added 2 commits March 17, 2021 13:24

Whitelist: fix filtering condition for sample subsets

6121376

consolidate database code in 'DataStore', remove 'DatabaseWrapper' (not yet tested)

989af13

hendrikweisser added 5 commits March 23, 2021 19:47

DataStore: small fixes, move 'DBCreateMode' (enum) to top level

6dbbdb2

Eventalign_collapse: small fix in 'DataStore' call

031a995

Whitelist: updated 'DataStore' call

9cc0806

SampComp: get data from SQLite DB; some refactoring

a1cb349

TxComp: update to changes in 'SampComp'; some refactoring

fd0d446

hendrikweisser changed the title ~~Add read-level kmer stats to database (for whitelisting)~~ Ongoing transition to DB backend (SQLite) Mar 23, 2021

tleonardi reviewed Mar 24, 2021

nanocompore/SampComp.py (outdated; resolved)

hendrikweisser added 3 commits March 29, 2021 20:11

refactor DataStore, create child classes 'DS_EventAlign' and 'DS_SampComp'

69563e0

use 'DataStore_EventAlign' in 'Eventalign_collapse' and 'Whitelist'

b1ad899

TxComp: simplify 'txCompare' results data structure (remove 'lowCov' elements)

b8de4e9

hendrikweisser added 15 commits April 6, 2021 21:19

SampComp: remove unused parameters, write output to SQLite

24881a4

TxComp: coding style - added whitespace

fa0ec95

DataStore: split 'gmm_results' SQL table into two; improve error logging during table creation

112e50d

add new class 'PostProcess' for data export etc. (work in progress)

6d00f2e

SampComp/TxComp/DataStore: limit to one univariate and one GMM-based statistical test per k-mer

d8324c7

DataStore: improve definition of SQL tables

a7e9269

TxComp: cosmetic changes

5b35d30

DataStore/SampComp: add DB columns for adj. p-values, adapt DB schema based on processing options

833cc2e

TxComp: combine collection of functions into class 'TxComp'

2a35003

DataStore: small bug fixes (add 'self' for method calls)

3ff8038

SampComp: use new 'TxComp' class, simplify parameter handling

6270cdc

TxComp/DataStore: bug fixes (use of 'sequence_context')

bd3499b

SampComp: add multiple testing correction, remove 'shelve' export

39c5009

PostProcess: implement 'save_report' for SQLite data

c6faa5e

PostProcess: remove 'save_shift_stats' (now included in 'save_report')

bc3e8ba

hendrikweisser added 7 commits August 12, 2021 16:30

Eventalign_collapse: remove TSV output option, simplify parameters

479b443

SampComp: remove irrelevant data from output queue tuple (thanks Tommaso!)

543653d

common: update function to build dict. with sample information

1721746

DataStore: add to-do comment

fc3f0bb

main: update command line options

3219601

main: fix PostProcess (TSV export) usage

e7ef11e

main: update CLI documentation (minimal examples), make report generation optional

f6d82ee

hendrikweisser added 4 commits August 25, 2021 14:20

Eventalign_collapse: small optimization (input file reading)

75193d7

SuperParser: add spaces (coding style)

d967101

Whitelist: add to-do comment

5980637

DataStore: reduce file size of 'eventalign_collapse' output DB (by about 20%)

314bf1f

hendrikweisser mentioned this pull request Oct 13, 2021

Transition to SQLite backend - use transcript-specific databases #192

Open
