Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store sample sheet render tables with sodarcache #1509

Closed
8 tasks done
mikkonie opened this issue Oct 31, 2022 · 2 comments
Closed
8 tasks done

Store sample sheet render tables with sodarcache #1509

mikkonie opened this issue Oct 31, 2022 · 2 comments
Assignees
Labels
app: samplesheets Issue in the samplesheets app feature Requested feature or enhancement
Milestone

Comments

@mikkonie
Copy link
Contributor

mikkonie commented Oct 31, 2022

Recently I've observed several scaling issues with very large studies (5000+ samples). The major bottleneck is the building of study tables, which can take a very long time for such complex graphs.

This can be somewhat improved via smaller optimizations such as #708. However, a relatively simple but a more substantial improvement would be to store the render tables in the database using sodarcache. Thus we could skip the building part whenever the sheets or related apps have not been altered.

As added bonus, these cached tables could easily be used in loading partial table data into the vue app, which would improve performance even further. Also, they can be used for e.g. cache updates and other similar functionality, meaning we would only have to do the potentially slow rendering only once between sheet edits.

I've written out a spec for a simple but working implementation below. The downside of this is that if someone is doing a lot of edits to the same study many other users are browsing, this creates a bunch of redundant database operations for deleting and rebuilding the cached data. The alternative is to store some kind of a state for "currently/recently edited", but that can be complex as in the current implementation, we can't really force a user to click on "finish editing".

Spec

  • Refactor build_study_tables() parameters regarding different versions (see below)
  • Create a new method for getting the render tables
    • Study specific, called in place of build_study_tables()
    • IF found in cache, return that
    • ELSE build tables from cache, store in cache, return
    • TBD: How to handle variables: edit, use_config, ui
      • See 1st comment
  • Call new method whenever tables are currently retrieved
    • TBD: Any exceptions?
      • See 1st comment
  • When performing a successful edit, delete cached items (both for browsing or editing)
    • The next user to retrieve the tables has to rebuild them in cache
  • Add a management command for caching sheets for all projects or a specific project

Tasks

  • Refactor build_study_tables() parameters
  • Set up get/render method
  • Set up method use
  • Add cache clearing for edits
  • Add management command
  • Test, test, test
  • Update docs where needed
  • Unify SampleSheetTableBuilder init in modules
@mikkonie mikkonie added feature Requested feature or enhancement tbd Comments wanted, spec/schedule/prioritization to be decided, etc. app: samplesheets Issue in the samplesheets app labels Oct 31, 2022
@mikkonie mikkonie self-assigned this Oct 31, 2022
@mikkonie mikkonie removed the tbd Comments wanted, spec/schedule/prioritization to be decided, etc. label Dec 1, 2022
@mikkonie mikkonie added this to the v0.13.0 milestone Dec 1, 2022
@mikkonie
Copy link
Contributor Author

mikkonie commented Dec 1, 2022

The first thing is to look at build_study_tables() and try to simplify the "versioning" of render tables.

At the moment we have 3 args: edit, ui and use_config. Some facts:

  • use_config is only used within samplesheets.sheet_config
  • edit and ui appear in all combinations

Looking into the Modifiers

  • ui=True
    • Add UI specific header fields (e.g. max lengths)
    • Set column type to NUMERIC if existing content
      • Not sure if this is even needed anymore with the sheet config feature?
    • Add col_last_vis to table
    • This was added to speed up rendering for internal table builds
      • If we use the cache, that won't be an issue
    • I'm not seeing any conflicts caused by these modifications anywhere where we currently set ui=False
    • Conclusion: We should be able to remove this arg and always render the UI specific things when building
  • edit=True
    • Add header type, node UUIDs and object headers
    • If we would change to only providing the UUID for the first element of a node, there wouldn't really be a meaningful difference (see Optimize JSON data provided to samplesheets Vue app #708)
      • This has now been done and the percentual difference between sheet sizes is not large
    • Ontology term accession is NOT replaced with user friendly URL
      • TODO: Convert this in client
        • Add SHEETS_ONTOLOGY_URL_TEMPLATE and SHEETS_ONTOLOGY_URL_SKIP into context
        • Implement conversion in Vue app when populating grid
        • Update Vue app unit test data
  • use_config=False
    • Only used in samplesheets.sheet_config
    • If false, field configs won't be set from sheet config
    • In build_display_config(), this setting is meaningless (if render time is not an issue)
      • We can call the future get_study_tables() method instead
    • Leave the remaining build_study_tables() calls with use_config=False as is
      • First import and restoring of a large sheet may be a bit slow because of this, but we can live with that
      • We need to make sure to delete a possible cached version after the sheet config is built or updated

@mikkonie
Copy link
Contributor Author

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
app: samplesheets Issue in the samplesheets app feature Requested feature or enhancement
Projects
None yet
Development

No branches or pull requests

1 participant