Create exploded association exports that traverse closures #588
The first step in this process was converting the existing TSV export code, which we inherited from the old Solr-based stack, to a much simpler export from DuckDB.
The second step is configuring an additional export: the "blow up" produced by joining through the closure table on both sides of an association.
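To make the "blow up" concrete, here is a minimal sketch of the join. It uses `sqlite3` from the Python standard library as a stand-in for duckdb so it runs without dependencies. The table and column names (`association`, `closure`, `subject`, `predicate`, `object`) and the IDs are made up for illustration and are not necessarily what this PR uses.

```python
import sqlite3

# sqlite3 is a stand-in for duckdb here; schema and IDs are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE association (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE closure (subject TEXT, predicate TEXT, object TEXT);

-- One direct association between a disease and a phenotype.
INSERT INTO association VALUES ('D:2', 'has_phenotype', 'P:2');

-- Reflexive closure: every node reaches itself plus its ancestors.
INSERT INTO closure VALUES
  ('D:2', 'rdfs:subClassOf', 'D:2'), ('D:2', 'rdfs:subClassOf', 'D:1'),
  ('P:2', 'rdfs:subClassOf', 'P:2'), ('P:2', 'rdfs:subClassOf', 'P:1');
""")

# Join through the closure on both the subject and the object side.
exploded = con.execute("""
SELECT DISTINCT sc.object AS subject, a.predicate, oc.object AS object
FROM association a
JOIN closure sc ON sc.subject = a.subject
JOIN closure oc ON oc.subject = a.object
ORDER BY subject, object
""").fetchall()

for row in exploded:
    print(row)
```

One direct association with two closure rows on each side becomes four exploded rows, which is where the multiplication in edge counts comes from.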
For genes, I'm running into genes that aren't present in the closure table. I think it would be a smart step in closurizer to add self-associations (reflexivity) to the closure table for any node that we don't already have in the closure. In practice that mostly means that genes need to be a subclass of themselves, I think. Maybe another predicate would make sense.
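A rough sketch of what that reflexive step could look like, again with `sqlite3` standing in for duckdb. This is not closurizer's actual code: the `nodes` and `closure` tables, their columns, and the choice of `rdfs:subClassOf` as the self-association predicate are all assumptions for illustration.

```python
import sqlite3

# sqlite3 is a stand-in for duckdb here; schema and IDs are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY);
CREATE TABLE closure (subject TEXT, predicate TEXT, object TEXT);

INSERT INTO nodes VALUES ('GENE:1'), ('P:1'), ('P:2');

-- Only the phenotype terms have closure rows; the gene has none.
INSERT INTO closure VALUES
  ('P:2', 'rdfs:subClassOf', 'P:2'),
  ('P:2', 'rdfs:subClassOf', 'P:1'),
  ('P:1', 'rdfs:subClassOf', 'P:1');
""")

# Add a self-association for every node that never appears as a closure subject.
# The predicate used here is an assumption; another one may fit better.
con.execute("""
INSERT INTO closure
SELECT n.id, 'rdfs:subClassOf', n.id
FROM nodes n
WHERE NOT EXISTS (SELECT 1 FROM closure c WHERE c.subject = n.id)
""")

added = con.execute(
    "SELECT subject, predicate, object FROM closure WHERE subject = 'GENE:1'"
).fetchall()
total_rows = con.execute("SELECT COUNT(*) FROM closure").fetchone()[0]
print(added, total_rows)
```

With the self-row in place, an inner join through the closure no longer drops associations whose subject is a gene.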
For disease, it's mostly a matter of looking at the size of the combinatorial explosion, since it happens on both sides. Right now it looks like it's producing on the order of 30M edges for gene to phenotype and 15M for disease to phenotype (fewer direct associations, but a combinatorial blow-up of both diseases and phenotypes).
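One way to size the explosion before materializing it is to multiply the per-node closure counts on each side and sum over the associations. This gives an upper bound, since overlapping ancestors collapse under `DISTINCT`. The sketch below uses `sqlite3` as a stand-in for duckdb, with a made-up schema and toy data, so the numbers say nothing about the real export.

```python
import sqlite3

# sqlite3 is a stand-in for duckdb here; schema and IDs are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE association (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE closure (subject TEXT, predicate TEXT, object TEXT);

INSERT INTO association VALUES
  ('D:2', 'has_phenotype', 'P:3'),
  ('D:2', 'has_phenotype', 'P:2');

INSERT INTO closure VALUES
  ('D:2', 'rdfs:subClassOf', 'D:2'), ('D:2', 'rdfs:subClassOf', 'D:1'),
  ('P:3', 'rdfs:subClassOf', 'P:3'), ('P:3', 'rdfs:subClassOf', 'P:2'),
  ('P:3', 'rdfs:subClassOf', 'P:1'),
  ('P:2', 'rdfs:subClassOf', 'P:2'), ('P:2', 'rdfs:subClassOf', 'P:1');
""")

# Upper bound: per association, (closure rows of subject) * (closure rows of object).
raw_upper_bound = con.execute("""
WITH ancestors AS (
  SELECT subject AS node, COUNT(DISTINCT object) AS n
  FROM closure
  GROUP BY subject
)
SELECT SUM(s.n * o.n)
FROM association a
JOIN ancestors s ON s.node = a.subject
JOIN ancestors o ON o.node = a.object
""").fetchone()[0]

# Actual size after de-duplicating the exploded edges.
distinct_edges = con.execute("""
SELECT COUNT(*) FROM (
  SELECT DISTINCT sc.object AS subject, a.predicate, oc.object AS object
  FROM association a
  JOIN closure sc ON sc.subject = a.subject
  JOIN closure oc ON oc.subject = a.object
)
""").fetchone()[0]

print(raw_upper_bound, distinct_edges)  # prints: 10 6
```

The gap between the two numbers shows how much of the blow-up is redundant edges produced by sibling associations that share ancestors.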