Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DuckDB support for read_json #1007

Closed
paulsm opened this issue Mar 7, 2024 · 11 comments
Closed

DuckDB support for read_json #1007

paulsm opened this issue Mar 7, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@paulsm
Copy link

paulsm commented Mar 7, 2024

The Observable Framework documentation explains how to use the read_parquet or read_csv DuckDB functions to consume from endpoints that expose those data formats. It would be useful to extend this to support the DuckDB json extension's read_json function also, use of which currently fails with this error message:

Error: Catalog Error: Table Function with name "read_json" is not in the catalog, but it exists in the json extension.

Please try installing and loading the json extension by running:
INSTALL json;
LOAD json;

Alternatively, consider enabling auto-install and auto-load by running:
SET autoinstall_known_extensions=1;
SET autoload_known_extensions=1;
@paulsm paulsm added the enhancement New feature or request label Mar 7, 2024
@paulsm
Copy link
Author

paulsm commented Mar 7, 2024

The error goes away if I upgrade the code that resolves this dependency version on @duckdb/duckdb-wasm to 1.28.1-dev159.0 instead of the current 1.28.0.

@mbostock
Copy link
Member

mbostock commented Mar 7, 2024

Note that you should be able to use the latest prerelease version of DuckDB by saying:

import * as duckdb from "npm:@duckdb/duckdb-wasm@next";

You can also initialize your npm cache to a specific version (an undocumented feature; you’ll need to restart the preview server after doing this):

rm -rf docs/.observablehq/cache/_npm/@duckdb
mkdir docs/.observablehq/cache/_npm/@duckdb/duckdb-wasm@1.28.1-dev159.0

We should offer a more explicit way to change the default resolution in the project config. I’ll file an issue for that. I also filed duckdb/duckdb-wasm#1661 to track the current issue with DuckDB’s latest tag which if fixed would mean we don’t need to pin to 1.28.0.

@Fil
Copy link
Contributor

Fil commented Mar 7, 2024

Additionally, when this becomes possible (probably with the next stable release of DuckDB), we would want sql: table: source.json to work out of the box (without requiring the user to specify INSTALL json). [Sorry this is clearly wrong, see Mike's post below] (I'd also want an option to specify (from front matter?) that I want to INSTALL spatial.)

@arnicas
Copy link

arnicas commented Mar 7, 2024

oh good to know, i just tried json and got very confused.

Additionally, when this becomes possible (probably with the next stable release of DuckDB), we would want sql: table: source.json to work out of the box (without requiring the user to specify INSTALL json). (I'd also want an option to specify (from front matter?) that I want to INSTALL spatial.)

@mbostock
Copy link
Member

mbostock commented Mar 7, 2024

Just to be clear, JSON already works out of the box with 1.28.0:

---
sql:
  foo: foo.json
---

```sql echo
SELECT * FROM foo
```

What doesn’t work is read_json because this requires the json extension, e.g.,

```js echo
const db = await DuckDBClient.of();
const url = await FileAttachment("foo.json").url();
await db.query(`CREATE VIEW foo AS SELECT * FROM read_json('${url}', format = 'array', columns = {foo: 'UBIGINT', bar: 'VARCHAR'});`);
const sql = db.sql.bind(db);
```

```sql echo
SELECT * FROM foo
```

But you shouldn’t need read_json if you just want to read a JSON file.

@Fil
Copy link
Contributor

Fil commented Mar 7, 2024

Should we close this issue? We don't really have to support read_json.

@mbostock
Copy link
Member

mbostock commented Mar 7, 2024

I would like to hear from @arnicas @paulsm how they encountered this issue. What were they trying to do with read_json?

@paulsm
Copy link
Author

paulsm commented Mar 7, 2024

@mbostock - my use case is to use the read_json feature of DuckDB to present information provided by a data source accessible to DuckDB at runtime, not to data loaders at build time. I think the issue can be closed in anticipation of the ability to use a more recent version of duckdb-wasm by default, or because you've already described how to use or pin to a specific version of it, which resolves the problem.

@mbostock
Copy link
Member

mbostock commented Mar 7, 2024

Thanks @paulsm.

information provided by a data source accessible to DuckDB at runtime

Meaning read_json from an externally-hosted URL, rather than a local file (whether static or generated by data loader)? Makes sense! You could also workaround it by fetching the file and passing it to DuckDBClient.of().

const db = await DuckDBClient.of({foo: fetch(url).then((r) => r.json())});
const sql = db.sql.bind(db);

We’re also working on adding support for external URLs in the front matter (but that won’t help if your URL is dynamic).

@paulsm
Copy link
Author

paulsm commented Mar 7, 2024

@mbostock - that's right: an external source, not an internal one, and one for which the read_json function does provide access when using DuckDB in any other context. I've got a number of workarounds, but the simple update to a more current release of duckdb-wasm in Observable Framework is sufficient to bring the support for the function to a state consistent with other read_xxx functions in DuckDB.

@mbostock
Copy link
Member

Externally-hosted sources for DuckDB were implemented in #1098. The remaining issues here are covered by DuckDB extensions #1070 and using the next prerelease version of DuckDB #1009.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants