poc: add databases in datasource #3735

Draft: wants to merge 17 commits into main

Light2Dark
Collaborator

Light2Dark commented Feb 9, 2025

📝 Summary

image

TODOs:

  • the frontend (datasources-panel.tsx) needs work; it's quite tricky :/
  • filter variables and remove stale databases and tables
  • add tests, fix typing
  • stress test with a large database
  • test with different databases. The worst outcome is a hang when there's an error in the background; it should fail and return no data

🔍 Description of Changes

After experimenting a little, I found that a good request strategy looks like the following. Make an initial request for the database, schema, and table lists; then, when a table is expanded, make another request for its column info, primary keys, and indexes. This keeps the initial request short while still fetching enough information. With the Snowflake sample database (7 schemas, 10+ tables each), the initial request takes about 3s. Fetching everything up front would take ~25s.
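The two-stage strategy described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `CatalogBackend` interface, the `LazyCatalog` class, the type shapes, and the `SAMPLE_DB` names are all hypothetical, and the backend calls are synchronous purely to keep the sketch short (real requests would be asynchronous).

```typescript
// Hypothetical shapes for illustration; not marimo's actual models.
interface TableSummary { name: string }
interface SchemaSummary { name: string; tables: TableSummary[] }
interface DatabaseSummary { name: string; schemas: SchemaSummary[] }
interface TableDetails { columns: string[]; primaryKeys: string[]; indexes: string[] }

// Hypothetical backend. Synchronous only for brevity.
interface CatalogBackend {
  listCatalog(): DatabaseSummary[];
  describeTable(qualifiedName: string): TableDetails;
}

class LazyCatalog {
  private readonly backend: CatalogBackend;
  private catalog: DatabaseSummary[] | undefined;
  private readonly details = new Map<string, TableDetails>();
  requestCount = 0;

  constructor(backend: CatalogBackend) {
    this.backend = backend;
  }

  // Stage 1: a single request for database, schema, and table names.
  load(): DatabaseSummary[] {
    if (this.catalog === undefined) {
      this.requestCount += 1;
      this.catalog = this.backend.listCatalog();
    }
    return this.catalog;
  }

  // Stage 2: fetch column info, PKs, and indexes only when a table is expanded.
  expandTable(qualifiedName: string): TableDetails {
    const cached = this.details.get(qualifiedName);
    if (cached !== undefined) {
      return cached;
    }
    this.requestCount += 1;
    const fetched = this.backend.describeTable(qualifiedName);
    this.details.set(qualifiedName, fetched);
    return fetched;
  }
}

// A fake backend standing in for a real database connection.
const fakeBackend: CatalogBackend = {
  listCatalog: () => [
    {
      name: "SAMPLE_DB",
      schemas: [{ name: "PUBLIC", tables: [{ name: "ORDERS" }, { name: "CUSTOMERS" }] }],
    },
  ],
  describeTable: (qualifiedName) => ({
    columns: [`${qualifiedName}.ID`],
    primaryKeys: ["ID"],
    indexes: [],
  }),
};

const catalog = new LazyCatalog(fakeBackend);
catalog.load();
catalog.expandTable("SAMPLE_DB.PUBLIC.ORDERS");
catalog.expandTable("SAMPLE_DB.PUBLIC.ORDERS"); // second expand is served from the local map
console.log(catalog.requestCount); // 2
```

The point of the split is that the cost of column metadata is paid per expanded table rather than for every table in the initial response.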

image

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • I have added tests for the changes made.
  • I have run the code and verified that it works as expected.

📜 Reviewers

@akshayka OR @mscolnick

vercel bot commented Feb 9, 2025

The latest updates on your projects.

| Name | Status | Updated (UTC) |
| marimo-docs | ✅ Ready | Feb 15, 2025 4:43pm |
| marimo-storybook | ✅ Ready | Feb 15, 2025 4:43pm |

@mscolnick
Contributor

@Light2Dark, a user on Discord had 100+ tables. We don't need to lazy-load tables just yet, but it could be worth keeping an eye on and designing in a way that supports it later. I do think that by not loading the columns up front, it's only 2 round trips (instead of a fan-out).

@@ -15,3 +21,13 @@ export const FUNCTIONS_REGISTRY = new DeferredRequestRegistry<
...req,
});
});

export const PreviewSQLTable = new DeferredRequestRegistry<
Contributor

let me know if it's worth wrapping DeferredRequestRegistry in an LRU cache. happy to write a class for that if you'd like
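For reference, a minimal LRU cache of the kind suggested here could look like the sketch below. It relies on the insertion-order iteration of the built-in `Map`. The `LRUCache` class is hypothetical and deliberately does not touch `DeferredRequestRegistry`, whose API is not shown in this thread; how the two would be wired together is left open.

```typescript
// Hypothetical helper class; not part of marimo.
class LRUCache<K, V> {
  // Map iterates in insertion order, so the first key is the least recently used.
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    if (!Number.isInteger(maxSize) || maxSize < 1) {
      throw new Error("maxSize must be a positive integer");
    }
    this.maxSize = maxSize;
  }

  get size(): number {
    return this.entries.size;
  }

  // Does not affect recency.
  has(key: K): boolean {
    return this.entries.has(key);
  }

  // Returns the value and marks the key as most recently used.
  get(key: K): V | undefined {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key) as V;
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  // Inserts or refreshes a key, evicting the least recently used entry when full.
  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }
}
```

A cache like this could, for example, hold table previews keyed by a fully qualified table name, so that re-expanding a recently viewed table does not trigger another request.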

marimo/_data/models.py (resolved)
marimo/_runtime/runner/hooks_post_execution.py (outdated, resolved)
marimo/_server/api/endpoints/datasources.py (resolved)
@@ -472,6 +488,38 @@ components:
- time
- unknown
type: string
Database:
Contributor

adding the new objects to marimo/_cli/development/commands.py should result in a cleaner YAML file

Collaborator Author

should be kinda clean already, the objects show up nicely 🤔

Contributor

oh shoot, sorry i misread this

@Light2Dark
Collaborator Author

Light2Dark commented Feb 13, 2025

hey @mscolnick, the frontend seems a little tricky. Is there a reason you're putting the following state in jotai instead of keeping it local to a component?

function initialState(): DatasetsState {
  return {
    tables: [],
    expandedTables: new Set(),
    expandedColumns: new Set(),
    columnsPreviews: new Map(),
  };
}

referring to the expanded state

@mscolnick
Contributor

@Light2Dark, jotai was originally used because requests and responses came from different places. If you want to use useAsyncData and keep the data local, that is fine, but we would need to handle caching too.
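Whichever store ends up holding it, the expanded state discussed above can be updated with small pure functions, which keeps the logic the same for a jotai atom or for component-local state. The sketch below is only an illustration: `ExpandedState`, `toggleKey`, and `toggleTable` are hypothetical names, and only the `expandedTables` and `expandedColumns` fields are taken from the snippet quoted earlier.

```typescript
// Hypothetical subset of the state shape quoted in the thread.
interface ExpandedState {
  expandedTables: ReadonlySet<string>;
  expandedColumns: ReadonlySet<string>;
}

// Returns a new set with the key added if absent, or removed if present.
function toggleKey(current: ReadonlySet<string>, key: string): Set<string> {
  const next = new Set(current);
  if (!next.delete(key)) {
    next.add(key);
  }
  return next;
}

// Returns a new state object; the input state is never mutated.
function toggleTable(state: ExpandedState, table: string): ExpandedState {
  return { ...state, expandedTables: toggleKey(state.expandedTables, table) };
}
```

Because the functions never mutate their input, the same update works as a reducer or a setter callback; caching of fetched column data would still need to be handled separately.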
