Refactor API calls within LineageComparisonComponent #586

Open
flaneuse opened this issue Nov 19, 2022 · 0 comments

Right now, the API calls on LineageComparisonComponent (outbreak.info/compare-lineages) transfer very large amounts of data, which is slow. As a result, our API backend often crashes when there are too many requests to this endpoint.

To create the heatmap on the page, the function getLineagesComparison calls getCharacteristicMutations(apiurl, lineage, 0, true, includeSublineages), which returns all mutations within a lineage (the frequency=0 query), and then filters the result to any mutation which appears in at least one of the compared lineages at a prevalence greater than the frequency threshold (default = 0.75). Querying at frequency = 0 is necessary: if you queried with frequency = 0.75 instead, you would be missing data for mutations which exist in the lineage below the threshold:

Incorrect: cells are missing for B.1.427 x A67V, B.1.427 x DEL69/70, B.1.427 x T95I, etc. This implies those mutations have not been found in the lineage, as opposed to "have been found, but at low prevalence":
[Screenshot: 2022-11-18 at 12:25:36 PM]

Correct but super slow, since the frequency=0 query is HUGE.
[Screenshot: 2022-11-18 at 12:26:07 PM]
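To make the current filtering logic concrete, here is a minimal sketch in Python (the front-end itself is not Python; this is only an illustration). The record shape (`mutation` and `prevalence` keys), the function name `heatmap_cells`, the `>=` comparison, and all prevalence numbers are assumptions made up for the example, not values from the API.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # assumed shape: {"mutation": str, "prevalence": float}


def heatmap_cells(
    all_records: Dict[str, List[Record]], threshold: float = 0.75
) -> Dict[str, List[Record]]:
    """Keep every record whose mutation passes the threshold in at least one lineage.

    `all_records` stands in for the frequency=0 results, keyed by lineage.
    """
    # Mutations that are characteristic of at least one compared lineage.
    selected = {
        rec["mutation"]
        for records in all_records.values()
        for rec in records
        if rec["prevalence"] >= threshold
    }
    # For those mutations, keep the cell in every lineage, even at low prevalence.
    return {
        lineage: [rec for rec in records if rec["mutation"] in selected]
        for lineage, records in all_records.items()
    }


# Illustrative numbers only; "s:rare" is a placeholder, not a real mutation.
example = {
    "BA.3": [
        {"mutation": "s:a67v", "prevalence": 0.9},
        {"mutation": "s:l452r", "prevalence": 0.01},
        {"mutation": "s:rare", "prevalence": 0.02},
    ],
    "B.1.427": [
        {"mutation": "s:a67v", "prevalence": 0.001},
        {"mutation": "s:l452r", "prevalence": 0.95},
        {"mutation": "s:rare", "prevalence": 0.03},
    ],
}
cells = heatmap_cells(example)
```

In this sketch the B.1.427 x A67V cell survives with its low prevalence, which is the "correct" behaviour described above, but only because the full frequency=0 data was fetched first.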

To improve this, we could first get only the mutations which exist in the lineages above that threshold, then calculate the prevalence of just those mutations in each lineage.

For instance, BA.3 and B.1.427 Comparison page:

  1. The initial API call should identify the mutations which occur in either of those lineages (BA.3 or B.1.427) at a prevalence of 75% or greater. Just looking at gene == "S", this should identify the following set of mutations for each:
BA.3: ['s:g142d', 's:n211i', 's:d614g', 's:h655y', 's:n679k', 's:a67v', 's:del69/70', 's:n969k', 's:q954h', 's:d796y', 's:p681h', 's:del143/145', 's:del212/212', 's:t95i', 's:n764k'],
B.1.427: ['s:d614g', 's:l452r', 's:s13i', 's:w152c']
  2. Then, you can call https://api.outbreak.info/genomics/mutations-by-lineage with mutations set to each of the mutations and pangolin_lineage set to each of the lineages (e.g. https://api.outbreak.info/genomics/mutations-by-lineage?mutations=S:A67V&pangolin_lineage=BA.3). You can combine mutations with AND to query several of them in a single call. However, mutations that don't exist within the lineage (like S:S13I in BA.3) will cause the entire API call to fail with a status code of 500.
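The two steps above can be sketched as follows, in Python for illustration only. The sketch takes the above-threshold lists from this issue, builds the union of mutations, and generates one mutations-by-lineage URL per (lineage, mutation) pair rather than combining mutations with AND, so one missing mutation cannot fail a whole combined call. The function names are hypothetical, upper-casing the mutation names just mirrors the example URL (whether the API is case-sensitive is not confirmed here), and how a single-pair request behaves for a mutation absent from the lineage still needs to be checked.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

BASE_URL = "https://api.outbreak.info/genomics/mutations-by-lineage"

# Step 1 result, copied from the lists in this issue (gene == "S" only).
ABOVE_THRESHOLD: Dict[str, List[str]] = {
    "BA.3": [
        "s:g142d", "s:n211i", "s:d614g", "s:h655y", "s:n679k", "s:a67v",
        "s:del69/70", "s:n969k", "s:q954h", "s:d796y", "s:p681h",
        "s:del143/145", "s:del212/212", "s:t95i", "s:n764k",
    ],
    "B.1.427": ["s:d614g", "s:l452r", "s:s13i", "s:w152c"],
}


def union_of_mutations(by_lineage: Dict[str, List[str]]) -> List[str]:
    """Return each mutation once, in first-seen order."""
    seen: Dict[str, None] = {}
    for mutations in by_lineage.values():
        for mutation in mutations:
            seen.setdefault(mutation, None)
    return list(seen)


def build_queries(by_lineage: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Build one (lineage, mutation, url) triple per heatmap cell."""
    queries = []
    for lineage in by_lineage:
        for mutation in union_of_mutations(by_lineage):
            params = {"mutations": mutation.upper(), "pangolin_lineage": lineage}
            # Keep ":" and "/" unescaped so the URL matches the example above.
            queries.append((lineage, mutation, f"{BASE_URL}?{urlencode(params, safe=':/')}"))
    return queries


queries = build_queries(ABOVE_THRESHOLD)
```

For these two lineages the union has 18 mutations (s:d614g is shared), so the sketch produces 36 small requests. Whether that is faster than two frequency=0 queries is exactly what the profiling step below would need to establish.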

First steps:

  • Profile whether this approach would actually improve speed for a realistic set of lineages (for instance, the default set of lineages on outbreak.info/compare-lineages)
  • If so, implement it in the front-end.
  • Alternative approaches are welcome too.
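
For the profiling step, a minimal timing harness could look like the sketch below. It is written in Python as a stand-in for whatever tooling is actually used, and the two strategies are stubs: `current_approach` and `proposed_approach` are hypothetical placeholders that would need to be replaced with real calls against the API for a realistic set of lineages.

```python
import time
from statistics import median
from typing import Callable, Dict


def time_strategy(fetch: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time (seconds) of `fetch` over `repeats` runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return median(durations)


def compare(strategies: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Time each named strategy and return {name: median seconds}."""
    return {name: time_strategy(fn, repeats) for name, fn in strategies.items()}


# Hypothetical stubs; replace the bodies with the real request logic.
def current_approach() -> None:
    time.sleep(0.05)  # stands in for the large frequency=0 queries


def proposed_approach() -> None:
    pass  # stands in for the many small per-mutation queries


timings = compare({"current": current_approach, "proposed": proposed_approach}, repeats=3)
```

Using the median rather than the mean keeps a single slow response from dominating the comparison. The stub timings say nothing about the real endpoints.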