Initial "headless" / DuckDB-based in-memory Data Explorer backend implementation as vscode extension #4963

Closed
wesm opened this issue Oct 9, 2024 · 1 comment
Assignees
Labels
area: data explorer Issues related to Data Explorer category. enhancement New feature or request

Comments

@wesm
Contributor

wesm commented Oct 9, 2024

Initial spike for epic #2187 to be able to open Parquet and CSV/TSV files by clicking on the file explorer in the user's workspace.

@wesm wesm self-assigned this Oct 9, 2024
@wesm wesm added enhancement New feature or request area: data explorer Issues related to Data Explorer category. labels Oct 9, 2024
@petetronic petetronic added this to the Release Candidate milestone Oct 10, 2024
wesm added a commit that referenced this issue Oct 17, 2024
…kdb-wasm to provide "headless" data explorer backend (#4964)

For epic #2187, addresses #4963. 

This provides a new built-in positron-duckdb extension that loads
duckdb-wasm in a web worker and provides an RPC endpoint using VSCode's
command service for fulfilling Data Explorer requests. Only fetching
schemas, data values, and null-count summary statistics is supported
right now, so follow-on work includes:

- Numeric formatting and string truncation (respecting the passed
FormatOptions)
- Row filtering
- Sorting
- Detailed summary statistics
- Histograms and frequency tables for sparklines
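
As a rough illustration of the request handling described above, here is a minimal sketch of a dispatcher for the currently supported request kinds. All names (`RpcRequest`, `handleRpc`, `RunQuery`, the `null_count_N` aliases) are hypothetical and are not taken from the positron-duckdb source; `RunQuery` stands in for whatever function forwards SQL to duckdb-wasm in the web worker, and the sketch assumes the file has already been exposed to DuckDB as a table or view.

```typescript
// Hypothetical request shapes; the real Data Explorer protocol differs in detail.
type RpcRequest =
	| { method: 'get_schema'; table: string }
	| { method: 'get_null_counts'; table: string; columns: string[] };

type RpcResponse = { result: unknown } | { error: string };

// Assumed stand-in for a query function backed by duckdb-wasm in a web worker.
type RunQuery = (sql: string) => Promise<Record<string, unknown>[]>;

// Quote an SQL identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
	return '"' + name.replace(/"/g, '""') + '"';
}

// COUNT(*) counts all rows and COUNT(col) counts non-null values,
// so their difference is the number of nulls in that column.
function nullCountSql(table: string, columns: string[]): string {
	const exprs = columns.map(
		(c, i) => `COUNT(*) - COUNT(${quoteIdent(c)}) AS null_count_${i}`
	);
	return `SELECT ${exprs.join(', ')} FROM ${quoteIdent(table)}`;
}

// Route one request to the matching query and wrap failures as error responses.
async function handleRpc(req: RpcRequest, runQuery: RunQuery): Promise<RpcResponse> {
	try {
		switch (req.method) {
			case 'get_schema':
				return { result: await runQuery(`DESCRIBE ${quoteIdent(req.table)}`) };
			case 'get_null_counts':
				return { result: await runQuery(nullCountSql(req.table, req.columns)) };
			default:
				return { error: 'unsupported method' };
		}
	} catch (err) {
		return { error: err instanceof Error ? err.message : String(err) };
	}
}
```

In an extension, a function like `handleRpc` would presumably be registered as the handler of a single command, so that the Data Explorer frontend can send every request through one entry point.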

There are some rough edges. For example, if you click on a file before
the extension has fully loaded at application startup, the request will
fail; I will need to consult others on how to fix that.

Lastly, I have checked in some small (~10K total) data files to use in
the extension tests (`yarn test-extension -l positron-duckdb`) and added
exclusions to hygiene.js so that pre-commit checks do not complain about
them. I'm not sure if there is a better way to handle this.

Other notes:

- Added code to comms/generate-comms.ts to generate interfaces
containing all the parameters for each RPC, as already exists for
Rust and Python. This was needed to provide a fully formed command
protocol for communicating with the extension. We can potentially look at
further improving the TypeScript code generation.
- I copied the interface stubs needed into an interfaces.ts file in the
extension. Maybe it's possible to cross-import from the main codebase
into the extension but I do not know the right incantation of
tsconfig.json/package.json configurations to do this.
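
To make the first note above more concrete, here is a minimal sketch of how a generator might emit one parameters interface per RPC. The `MethodSpec` and `ParamSpec` shapes and both function names are assumptions made for illustration; they are not the actual structures or output of comms/generate-comms.ts.

```typescript
// Hypothetical description of one RPC parameter.
interface ParamSpec {
	name: string;
	tsType: string;
	required: boolean;
}

// Hypothetical description of one RPC method.
interface MethodSpec {
	name: string;
	params: ParamSpec[];
}

// Convert a snake_case method name such as "get_schema" to "GetSchema".
function snakeToPascal(name: string): string {
	return name
		.split('_')
		.filter((part) => part.length > 0)
		.map((part) => part.charAt(0).toUpperCase() + part.slice(1))
		.join('');
}

// Emit TypeScript source for an interface holding all parameters of one RPC.
// Optional parameters get a trailing "?" on the property name.
function emitParamsInterface(method: MethodSpec): string {
	const lines = [`export interface ${snakeToPascal(method.name)}Params {`];
	for (const p of method.params) {
		lines.push(`\t${p.name}${p.required ? '' : '?'}: ${p.tsType};`);
	}
	lines.push('}');
	return lines.join('\n');
}
```

Bundling the parameters into one interface per method gives the command endpoint a single typed payload per request, which is roughly what the note describes as a fully formed command protocol.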

In action

https://github.com/user-attachments/assets/70dabb96-6330-49e4-8db1-10293c331051

### QA Notes

You can click on .parquet, .csv, or .tsv files in the file explorer
after Positron has loaded to open the data explorer.

---------

Co-authored-by: Jonathan McPherson <jonathan@rstudio.com>
@testlabauto
Contributor

Verified Fixed

Positron Version(s) : 2024.11.0-65
OS Version          : OSX

Test scenario(s)

Verified with flights.parquet and with flights.parquet saved as CSV. Also checked TSV files.

Link(s) to TestRail test cases run or created:
