Create exploded association exports that traverse closures #588

Merged
5 commits merged on Jun 7, 2024


kevinschaper
Member

The first step in this process ended up being converting the existing TSV export code, which we inherited from the old Solr-based stack, into a much simpler export from DuckDB.

The second step is configuring an additional export that "blows up" each association by joining through the closure table on both sides, subject and object.
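A minimal sketch of that two-sided join, assuming a hypothetical `association(subject, predicate, object)` table and a hypothetical `closure(node, ancestor)` table in which each node lists its ancestors and itself. The real export's table and column names are not shown in this PR. The sketch runs on Python's built-in `sqlite3` only so that it is self-contained; the SQL is plain enough that the same query should carry over to DuckDB. Because both joins are inner joins, an association whose subject or object has no closure rows is silently dropped, which matches the gene problem noted in this PR.

```python
import sqlite3

# Hypothetical schema: association(subject, predicate, object) and
# closure(node, ancestor), where every node with closure rows also lists itself.
EXPLODE_SQL = """
    SELECT sc.ancestor AS subject_closure, a.predicate, oc.ancestor AS object_closure
    FROM association AS a
    JOIN closure AS sc ON sc.node = a.subject  -- subject side of the blow-up
    JOIN closure AS oc ON oc.node = a.object   -- object side of the blow-up
    ORDER BY subject_closure, object_closure
"""


def explode_associations(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return one row per (subject ancestor, object ancestor) pair of each association."""
    return conn.execute(EXPLODE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE association (subject TEXT, predicate TEXT, object TEXT);
    CREATE TABLE closure (node TEXT, ancestor TEXT);
    """
)
conn.executemany(
    "INSERT INTO association VALUES (?, ?, ?)",
    [
        ("MONDO:child", "biolink:has_phenotype", "HP:leaf"),
        ("HGNC:1", "biolink:has_phenotype", "HP:leaf"),  # gene with no closure rows
    ],
)
conn.executemany(
    "INSERT INTO closure VALUES (?, ?)",
    [
        ("MONDO:child", "MONDO:child"), ("MONDO:child", "MONDO:parent"),
        ("HP:leaf", "HP:leaf"), ("HP:leaf", "HP:root"),
    ],
)
rows = explode_associations(conn)
```

The single disease association expands into four rows (2 subject-side ancestors × 2 object-side ancestors), while the gene association produces none.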

For genes, I'm running into the problem that genes aren't present in the closure table. I think it would be a smart step for closurizer to add self-associations to the closure table (reflexivity) for any node that isn't already in the closure. In practice, I think that mostly means genes need to be subclasses of themselves, though maybe another predicate would make more sense.
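A minimal sketch of the proposed reflexivity step, assuming closure rows are hypothetical `(subject, predicate, object)` tuples and that the full node set is available as a Python set. This is not closurizer's actual API. The self-row predicate defaults to `rdfs:subClassOf` to mirror "genes need to be a subclass of themselves", but it is a parameter because the comment suggests another predicate might fit better.

```python
# Hypothetical closure rows: (subject, predicate, object), e.g.
# ("HP:leaf", "rdfs:subClassOf", "HP:root").
ClosureRow = tuple[str, str, str]


def add_reflexive_rows(
    closure: list[ClosureRow],
    nodes: set[str],
    self_predicate: str = "rdfs:subClassOf",  # assumption; another predicate may fit better
) -> list[ClosureRow]:
    """Append a (node, self_predicate, node) row for every node with no closure rows."""
    covered = {subject for subject, _predicate, _object in closure}
    missing = sorted(nodes - covered)  # sorted only to keep the output deterministic
    return closure + [(node, self_predicate, node) for node in missing]


closure = [
    ("HP:leaf", "rdfs:subClassOf", "HP:leaf"),
    ("HP:leaf", "rdfs:subClassOf", "HP:root"),
]
nodes = {"HP:leaf", "HP:root", "HGNC:1"}
result = add_reflexive_rows(closure, nodes)
```

Here both the gene `HGNC:1` and `HP:root`, which had no rows of their own, receive self-rows. Running the function again adds nothing, so the step is idempotent.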

For diseases, it's just a matter of the size of the combinatorial explosion, since it happens on both sides. Right now it looks like it's producing on the order of 30M edges for gene to phenotype and 15M for disease to phenotype (fewer direct associations, but a combinatorial blow-up of both diseases and phenotypes).
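The size of the blow-up can be estimated before materializing the export: each association contributes (closure rows of its subject) × (closure rows of its object) rows. A minimal sketch over hypothetical in-memory `(subject, object)` association pairs and `(node, ancestor)` closure pairs. The 30M and 15M figures above come from the real data and are not reproduced here.

```python
from collections import Counter


def exploded_row_count(
    associations: list[tuple[str, str]], closure: list[tuple[str, str]]
) -> int:
    """Count the rows an inner join through the closure on both sides would yield.

    associations: (subject, object) pairs; closure: (node, ancestor) pairs.
    A node with no closure rows counts as 0, mirroring the inner join.
    """
    ancestor_counts = Counter(node for node, _ancestor in closure)
    return sum(ancestor_counts[s] * ancestor_counts[o] for s, o in associations)


closure = [
    ("MONDO:child", "MONDO:child"), ("MONDO:child", "MONDO:parent"), ("MONDO:child", "MONDO:root"),
    ("HP:leaf", "HP:leaf"), ("HP:leaf", "HP:mid"), ("HP:leaf", "HP:root"),
    ("HGNC:1", "HGNC:1"),  # reflexive self-row only
]
disease_rows = exploded_row_count([("MONDO:child", "HP:leaf")], closure)  # 3 * 3
gene_rows = exploded_row_count([("HGNC:1", "HP:leaf")], closure)  # 1 * 3
```

A disease-to-phenotype association multiplies on both sides, while a gene that has only a self-row multiplies on the phenotype side alone. Without that self-row, the gene association would contribute nothing.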

kevinschaper marked this pull request as ready for review June 5, 2024 16:56
@kevinschaper
Member Author

This ended up being more about converting the exports to DuckDB in general, and secondarily about adding the exploded exports.

kevinschaper merged commit 2cbd1b1 into main Jun 7, 2024
2 checks passed
kevinschaper deleted the duckdb-export branch June 7, 2024 03:07