
Loading large JSON data causes error because of Firefox threshold #13

Closed
eharkins opened this issue Aug 20, 2018 · 5 comments

@eharkins
Contributor

jupyterlab/jupyterlab#4015

@metasoarous
Member

metasoarous commented Aug 20, 2018

I see a few things that might help resolve this:

  • Right now the JSON we spit out is pretty-printed into the file, which makes the string that gets loaded up in JS much bigger than it needs to be. Changing this in the build_olmsted_data.py script of cft will likely improve things considerably (see the first sketch after this list).
  • There may also be a way to zip/compress the JSON data, which could resolve the issue (also covered in that sketch).
  • Right now, for seed-lineage-pruning reconstructions, we end up sending along sequence metadata for all of the sequences, not just the sequences chosen as representatives for the seed lineage. For minadcl, we do an aggregation step that leaves us with metadata only for the representative sequences; it's merely an oversight that, because this step isn't necessary for the seed-lineage trees, we end up not reducing the dimensionality of the metadata. This should be a relatively easy fix in the cft pipeline (see the second sketch below).
  • Ultimately, breaking up the data payloads will be the real silver bullet here. The fundamental problem is that we're sending over not just all of the clonal families, but all of their trees, sequences, and sequence metadata. If instead we loaded this additional data lazily as folks click on specific trees, the problem would be largely mitigated. The cost of doing this is a) much more architectural complexity in loading data on demand, and b) tree/alignment details won't render instantly, since they'll have to wait for the data to come in. My suggestion is that we push this off as long as we can and see where we end up.
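For the first two bullets, here is a minimal sketch of what compact (non-pretty-printed) and optionally gzipped output could look like in Python. The function name, arguments, and paths are hypothetical stand-ins, not the actual build_olmsted_data.py code:

```python
import gzip
import json

def write_olmsted_json(data, path, compress=False):
    """Hypothetical helper: serialize data compactly, optionally gzipped."""
    if compress:
        # Gzipped output; the app (or the server, via Content-Encoding: gzip)
        # would need to handle decompression on load.
        with gzip.open(path + ".gz", "wt", encoding="utf-8") as out:
            json.dump(data, out, separators=(",", ":"))
    else:
        # Compact separators drop the whitespace that pretty-printing adds,
        # which can shrink the serialized string substantially.
        with open(path, "w", encoding="utf-8") as out:
            json.dump(data, out, separators=(",", ":"))
```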

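For the third bullet, a rough sketch of the kind of filtering step involved; the function name, argument shapes, and key names are assumptions, not the actual cft pipeline code:

```python
def filter_seed_lineage_metadata(seq_metadata, representative_ids):
    """Hypothetical sketch: keep metadata only for representative sequences.

    seq_metadata: mapping of sequence id -> metadata dict for every sequence
    representative_ids: ids of the sequences kept by seed-lineage pruning
    """
    kept = set(representative_ids)
    # Mirrors the aggregation step that minadcl reconstructions already get.
    return {seq_id: meta for seq_id, meta in seq_metadata.items() if seq_id in kept}
```
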
@metasoarous
Member

@eharkins If you are keen to get your hands dirty with some data-processing work, you could take a stab at the third of the steps above (filtering seed-lineage downsampled metadata).

@eharkins eharkins self-assigned this Aug 22, 2018
@metasoarous metasoarous added the scale (Having to do with scaling out to more data) and data-in (Change in shape of data going into app) labels Sep 13, 2018
metasoarous pushed a commit to matsengrp/cft that referenced this issue Sep 25, 2018
…249)

* added height calculation for evenly spaces leaves

* floating point division

* adding nt sequences for Olmsted#17

* github.com/matsengrp/olmsted/issues/13

* including multiplicity for Olmsted(11)

* tabs -> spaces

* nt seqs dict using tripl lookup instead of fasta parser

* comment white space
@eharkins
Contributor Author

Should we close this for now, now that matsengrp/cft#249 has been merged? I don't see an error on Firefox, but I also never checked whether I could reproduce the error in the first place. @metasoarous?

@metasoarous
Member

I guess for now let's close this, since the issue isn't pressing anymore now that matsengrp/cft#249 has been merged. One of the items above has already been broken off into #42. The other two could be worth revisiting eventually, but aren't high priority.

metasoarous pushed a commit to matsengrp/cft that referenced this issue Oct 8, 2018
* added height calculation for evenly spaces leaves

* floating point division

* adding nt sequences for Olmsted#17

* github.com/matsengrp/olmsted/issues/13

* including multiplicity for Olmsted(11)

* tabs -> spaces

* nt seqs dict using tripl lookup instead of fasta parser

* comment white space

* first try on #250; using ecgtheow script to color pruned nodes

* set prune_count back to default
@metasoarous
Member

Since we increased the number of clonal families sampled per unseeded sample, we're hitting this issue again. The best solution is probably #42 (split clonal family details into separate files, which only get loaded once the given clonal family is selected). This will add a bit of delay the first time those detail views load, but it will also reduce the initial dataset load time and solve the FF loading problem for large datasets. A rough sketch of what that split could look like is below.
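As a rough illustration only (not the actual implementation planned in #42), the pipeline could emit a small summary file for the initial load plus one detail file per clonal family, which the app fetches lazily when that family is selected. The field names and output paths here are made up:

```python
import json
import os

def split_dataset(clonal_families, out_dir):
    """Hypothetical sketch: write a lightweight summary plus per-family detail files."""
    os.makedirs(out_dir, exist_ok=True)
    summaries = []
    for family in clonal_families:
        # Keep only lightweight fields in the summary used for the initial load;
        # the heavy fields listed here are assumed names, not the real schema.
        summaries.append({k: v for k, v in family.items()
                          if k not in ("trees", "sequences", "seq_metadata")})
        # Full details go in a per-family file the app requests on selection.
        detail_path = os.path.join(out_dir, "clonal_family.{}.json".format(family["id"]))
        with open(detail_path, "w") as out:
            json.dump(family, out, separators=(",", ":"))
    with open(os.path.join(out_dir, "clonal_families.json"), "w") as out:
        json.dump(summaries, out, separators=(",", ":"))
```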
