duckdb 1.29.0; self-host extensions #1734

Merged: 47 commits, Nov 2, 2024
Changes from 1 commit
c70e1bc
explicit duckdb 1.29.0; self-host core extensions; document
Fil Oct 8, 2024
0029c8c
configure which extensions are self-hosted
Fil Oct 10, 2024
feeaad8
Merge branch 'main' into fil/duckdb-wasm-1.29
Fil Oct 10, 2024
33aa5cb
hash extensions
Fil Oct 10, 2024
543f823
better docs
Fil Oct 10, 2024
7475589
cleaner duckdb manifest — now works in scripts and embeds
Fil Oct 11, 2024
47b6bd0
restructure code, extensible manifest
Fil Oct 11, 2024
abd0380
test, documentation
Fil Oct 11, 2024
7ac5d1d
much nicer config
Fil Oct 11, 2024
0adcb36
document config
Fil Oct 11, 2024
5365371
add support for mvp, clean config & documentation
Fil Oct 11, 2024
1fdf717
parametrized the initial LOAD in DuckDBClient
Fil Oct 11, 2024
bc712c3
tests
Fil Oct 11, 2024
2fb2878
bake-in the extensions manifest
Fil Oct 11, 2024
bc49674
fix test
Fil Oct 11, 2024
9a13f2a
don't activate spatial on the documentation
Fil Oct 11, 2024
e2c8b6c
Merge branch 'main' into fil/duckdb-wasm-1.29
Fil Oct 14, 2024
4a5128d
refactor: hash individual extensions, include the list of platforms i…
Fil Oct 14, 2024
13f892c
don't copy extensions twice
Fil Oct 14, 2024
8bb2866
Merge branch 'main' into fil/duckdb-wasm-1.29
Fil Oct 18, 2024
43ef6eb
Merge branch 'main' into fil/duckdb-wasm-1.29
Fil Oct 19, 2024
6764969
Merge branch 'main' into fil/duckdb-wasm-1.29
mbostock Oct 20, 2024
d72f0c3
Update src/duckdb.ts
Fil Oct 20, 2024
d6fc020
remove DuckDBClientReport utility
Fil Oct 21, 2024
69f25a2
renames
Fil Oct 21, 2024
30788e3
p for platform
Fil Oct 21, 2024
710f36a
centralize DUCKDBWASMVERSION and DUCKDBVERSION
Fil Oct 21, 2024
4f58100
clearer
Fil Oct 21, 2024
a8cfdcd
better config; manifest.extensions now lists individual extensions on…
Fil Oct 21, 2024
490d969
validate extension names; centralize DUCKDBBUNDLES
Fil Oct 21, 2024
aaff8f8
fix tests
Fil Oct 21, 2024
bc39bbe
Merge branch 'main' into fil/duckdb-wasm-1.29
Fil Oct 30, 2024
8bd0972
copy edit
Fil Oct 30, 2024
b90c22a
support loading non-self-hosted extensions
Fil Oct 30, 2024
b37be07
test duckdb config normalization & defaults
Fil Oct 30, 2024
9abaf57
documentation
Fil Oct 30, 2024
ccc0073
typography
Fil Oct 30, 2024
26c7a6f
doc
Fil Oct 31, 2024
4416dd3
Merge branch 'main' into fil/duckdb-wasm-1.29
mbostock Nov 1, 2024
7704416
use view for <50MB
mbostock Nov 1, 2024
1dde616
docs, shorthand, etc.
mbostock Nov 1, 2024
0491966
annotate fixes
mbostock Nov 1, 2024
be26385
disable telemetry on annotate tests, too
mbostock Nov 1, 2024
a23d3e4
tidier duckdb manifest
mbostock Nov 1, 2024
c753728
Merge branch 'main' into fil/duckdb-wasm-1.29
mbostock Nov 1, 2024
6e828c9
remove todo
mbostock Nov 1, 2024
365dbe3
more robust duckdb: scheme
mbostock Nov 2, 2024
58 changes: 58 additions & 0 deletions docs/lib/duckdb.md
@@ -105,3 +105,61 @@ const sql = DuckDBClient.sql({quakes: `https://earthquake.usgs.gov/earthquakes/f
```sql echo
SELECT * FROM quakes ORDER BY updated DESC;
```

## Extensions

DuckDB has a flexible extension mechanism that allows extensions to be loaded dynamically. Extensions may add support for additional file formats, introduce new types, or provide domain-specific functionality.

### Built-in extensions

The built-in extensions are statically linked to the default bundle. In other words, they are immediately available to use. Currently this includes "httpfs" (and others?).

### Installing extensions

In DuckDB-Wasm, installing an extension records a reference to the source file or extensions repository that holds it. For example, you can specify:

```sql echo run=false
INSTALL h3 FROM community;
LOAD h3;
SELECT format('{:x}', h3_latlng_to_cell(37.77, -122.43, 9)) AS cell_id;
```

Beyond the official extensions repositories (with core extensions at `https://extensions.duckdb.org` and community extensions at `https://community.duckdb.org`), you can install an extension from an explicit URL:

```sql echo run=false
INSTALL custom FROM 'https://example.com/v1.1.1/wasm_mvp/custom.wasm';
```

### Self-hosted core extensions

Framework downloads a copy of the [core extensions](https://duckdb.org/2023/12/18/duckdb-extensions-in-wasm.html), and the DuckDBClient installs them by default. This ensures that all the common extensions ("json", "inet", "spatial", etc.) are self-hosted.

You can, however, override this (for example, to test something against a new version of an extension) by installing the extension explicitly:

```sql echo run=false
INSTALL json FROM core;
-- use JSON features
```

### Loading extensions

Loading an extension actually downloads the build and makes its features available in subsequent queries. You can load an extension explicitly like so:

```sql echo run=false
LOAD spatial;
SELECT ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY) as area;
```

Many of the core extensions are auto-loaded when their functions are used in a query. For example, the query below transparently loads the self-hosted "json" extension:

```sql echo run=false
SELECT bbox FROM read_json('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson');
```

Similarly, this query transparently loads the self-hosted "inet" extension:

```sql echo
SELECT '127.0.0.1'::INET AS ipv4, '2001:db8:3c4d::/48'::INET AS ipv6;
```

These features are tied to version 1.29 of DuckDB-Wasm and depend strongly on its development cycle.
2 changes: 2 additions & 0 deletions docs/sql.md
@@ -205,3 +205,5 @@ Inputs.table(await sql([`SELECT * FROM gaia WHERE source_id IN (${[source_ids]})
When interpolating values into SQL queries, be careful to avoid [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) by properly escaping or sanitizing user input. The example above is safe only because `source_ids` are known to be numeric.

</div>

For more information, see [DuckDB: extensions](./lib/duckdb#extensions).
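
The SQL-injection caution above applies equally when extension names or repository URLs are interpolated into `INSTALL` statements. The sketch below shows one way to guard that interpolation. `validateExtensionName`, `quoteLiteral`, and `installStatement` are hypothetical helpers written for illustration; they are not functions from Framework or DuckDB-Wasm.

```javascript
// Hypothetical helpers (assumptions for illustration, not Framework APIs).

// Accept only plain identifiers (letters, digits, underscore) as extension names.
function validateExtensionName(name) {
  if (!/^\w+$/.test(name)) throw new Error(`invalid extension name: ${name}`);
  return name;
}

// Quote a value as a SQL string literal by doubling embedded single quotes.
function quoteLiteral(value) {
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Build an INSTALL statement for one extension from a repository URL.
function installStatement(name, repo) {
  return `INSTALL ${validateExtensionName(name)} FROM ${quoteLiteral(repo)};`;
}

console.log(installStatement("json", "https://extensions.duckdb.org"));
```

Rejecting anything that is not a plain identifier is stricter than escaping, which seems reasonable here because extension names come from a short, known list.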
35 changes: 35 additions & 0 deletions src/client/stdlib/duckdb.js
@@ -182,6 +182,40 @@ Object.defineProperty(DuckDBClient.prototype, "dialect", {
value: "duckdb"
});

async function installLocalExtensions(database) {
const repo = new URL("../../_npm/extensions.duckdb.org", import.meta.url).href;
const connection = await database.connect();
await connection.query(
[
// "arrow",
"autocomplete",
// "aws",
// "azure",
// "delta",
// "excel",
"fts",
// "httpfs",
// "iceberg",
"icu",
"inet",
// "jmalloc",
"json",
// "motherduck",
"parquet",
// "postgres_scanner",
"spatial",
"sqlite_scanner",
"substrait",
"tpcds",
"tpch",
"vss"
]
.map((ext) => `INSTALL ${ext} FROM '${repo}';`)
.join("\n")
);
// await connection.query(`SET custom_extension_repository = '${repo}';`);
}

Member (review comment on the `.map(...)` line above): How do these paths get content-hashed and/or versioned (for immutable caching)?

Contributor Author: I believe they are versioned by the 1.1.1 in their path, but I'm not sure of what happens server-side in duckdb-land. I'm talking with @carlopi to understand this better.
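
On the review question about content-hashing these paths: one possible approach, sketched here under stated assumptions, is to place a hash of each extension's bytes in the directory that acts as its repository root, so that a changed build is served from a new URL. The hash goes in the directory rather than the file name because the `INSTALL ... FROM '${repo}'` statements in this diff pass a repository root, with the files living under `v1.1.1/wasm_<platform>/` below it. The `/_duckdb/` prefix and the function names are hypothetical, and FNV-1a is used only to keep the sketch dependency-free; a real build would more likely use a cryptographic hash such as SHA-256.

```javascript
// FNV-1a (32-bit) over a sequence of bytes, returned as 8 hex digits.
// Chosen only so the sketch needs no imports; not collision-resistant.
function fnv1a(bytes) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of bytes) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical layout: one repository directory per extension, named by content hash.
function hashedRepository(name, bytes) {
  return `/_duckdb/${name}-${fnv1a(bytes)}`;
}

// Path of an extension file below a repository root, following the layout in this diff.
function extensionPath(repo, platform, name) {
  return `${repo}/v1.1.1/wasm_${platform}/${name}.duckdb_extension.wasm`;
}

const bytes = new TextEncoder().encode("pretend these are the extension's bytes");
const repo = hashedRepository("json", bytes);
console.log(extensionPath(repo, "eh", "json"));
```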

async function insertSource(database, name, source) {
source = await source;
if (isFileAttachment(source)) return insertFile(database, name, source);
@@ -309,6 +343,7 @@ async function createDuckDB() {
const worker = await duckdb.createWorker(bundle.mainWorker);
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule);
await installLocalExtensions(db);
return db;
}

29 changes: 29 additions & 0 deletions src/libraries.ts
@@ -80,6 +80,35 @@ export function getImplicitDownloads(imports: Iterable<string>): Set<string> {
implicits.add("npm:@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js");
implicits.add("npm:@duckdb/duckdb-wasm/dist/duckdb-eh.wasm");
implicits.add("npm:@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js");
// Ref. https://github.com/duckdb/duckdb-wasm/releases/tag/v1.29.0
for (const extension of [
// "arrow",
"autocomplete",
// "aws",
// "azure",
// "delta",
// "excel",
"fts",
// "httpfs",
// "iceberg",
"icu",
"inet",
// "jmalloc",
"json",
// "motherduck",
"parquet",
// "postgres_scanner",
"spatial",
"sqlite_scanner",
"substrait",
"tpcds",
"tpch",
"vss"
]) {
for (const platform of ["eh", "mvp"]) {
implicits.add(`https://extensions.duckdb.org/v1.1.1/wasm_${platform}/${extension}.duckdb_extension.wasm`);
}
}
}
if (set.has("npm:@observablehq/sqlite")) {
implicits.add("npm:sql.js/dist/sql-wasm.js");
34 changes: 30 additions & 4 deletions src/npm.ts
@@ -162,7 +162,7 @@ export async function getDependencyResolver(
(name === "arquero" || name === "@uwdata/mosaic-core" || name === "@duckdb/duckdb-wasm") && depName === "apache-arrow" // prettier-ignore
? "latest" // force Arquero, Mosaic & DuckDB-Wasm to use the (same) latest version of Arrow
: name === "@uwdata/mosaic-core" && depName === "@duckdb/duckdb-wasm"
- ? "1.28.0" // force Mosaic to use the latest (stable) version of DuckDB-Wasm
+ ? "1.29.0" // force Mosaic to use the latest (stable) version of DuckDB-Wasm
mbostock marked this conversation as resolved.
: pkg.dependencies?.[depName] ??
pkg.devDependencies?.[depName] ??
pkg.peerDependencies?.[depName] ??
@@ -248,9 +248,7 @@ async function resolveNpmVersion(root: string, {name, range}: NpmSpecifier): Pro
export async function resolveNpmImport(root: string, specifier: string): Promise<string> {
const {
name,
- range = name === "@duckdb/duckdb-wasm"
- ? "1.28.0" // https://github.com/duckdb/duckdb-wasm/issues/1561
- : undefined,
+ range = name === "@duckdb/duckdb-wasm" ? "1.29.0" : undefined,
mbostock marked this conversation as resolved.
path = name === "mermaid"
? "dist/mermaid.esm.min.mjs/+esm"
: name === "echarts"
@@ -316,3 +314,31 @@ export function fromJsDelivrPath(path: string): string {
const subpath = parts.slice(i).join("/"); // "+esm" or "lite/+esm" or "lite.js/+esm"
return `/_npm/${namever}/${subpath === "+esm" ? "_esm.js" : subpath.replace(/\/\+esm$/, "._esm.js")}`;
}

const downloadRequests = new Map<string, Promise<string>>();

/**
* Given a URL such as
* https://extensions.duckdb.org/v1.1.1/wasm_eh/parquet.duckdb_extension.wasm,
* returns the corresponding local path such as
* _npm/extensions.duckdb.org/v1.1.1/wasm_eh/parquet.duckdb_extension.wasm
*/
export async function resolveDuckDBDownload(root: string, href: string): Promise<string> {
if (!href.startsWith("https://extensions.duckdb.org")) throw new Error(`invalid download path: ${href}`);
const path = "/_npm/" + href.slice("https://".length);
const outputPath = join(root, ".observablehq", "cache", "_npm", href.slice("https://".length));
if (existsSync(outputPath)) return path;
let promise = downloadRequests.get(outputPath);
if (promise) return promise; // coalesce concurrent requests
promise = (async () => {
console.log(`download: ${href} ${faint("→")} ${outputPath}`);
const response = await fetch(href);
if (!response.ok) throw new Error(`unable to fetch: ${href}`);
await mkdir(dirname(outputPath), {recursive: true});
await writeFile(outputPath, Buffer.from(await response.arrayBuffer()));
return path;
})();
promise.catch(console.error).then(() => downloadRequests.delete(outputPath));
downloadRequests.set(outputPath, promise);
return promise;
}
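
The two ideas in `resolveDuckDBDownload` can be seen in isolation: mapping a download URL to a local `/_npm/` path, and coalescing concurrent requests so that the same file is fetched only once. The following is a minimal sketch with the network and file system removed; `localPath` and `coalesce` are hypothetical names used for illustration, not exports of Framework.

```javascript
// Map an extensions URL to its local path by dropping the scheme
// and prefixing /_npm/, as described in the doc comment above.
function localPath(href) {
  if (!href.startsWith("https://extensions.duckdb.org/")) {
    throw new Error(`invalid download path: ${href}`);
  }
  return "/_npm/" + href.slice("https://".length);
}

const pending = new Map(); // key -> in-flight promise

// Run task() at most once per key while an earlier call for that key is in flight.
function coalesce(key, task) {
  let promise = pending.get(key);
  if (promise) return promise; // reuse the in-flight request
  promise = (async () => task())(); // task starts synchronously; sync throws become rejections
  const cleanup = () => pending.delete(key);
  promise.then(cleanup, cleanup); // forget the key once settled, on success or failure
  pending.set(key, promise);
  return promise;
}

// Usage: both calls share one promise, so the task body runs once.
const href = "https://extensions.duckdb.org/v1.1.1/wasm_eh/parquet.duckdb_extension.wasm";
const download = async () => localPath(href); // stand-in for fetch + writeFile
coalesce(href, download).then((path) => console.log(path));
coalesce(href, download);
```

Removing the key after the promise settles means a failed download can be retried by a later call, rather than caching the rejection.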
5 changes: 5 additions & 0 deletions src/resolvers.ts
@@ -12,6 +12,7 @@ import type {LoaderResolver} from "./loader.js";
import type {MarkdownPage} from "./markdown.js";
import {extractNodeSpecifier, resolveNodeImport, resolveNodeImports} from "./node.js";
import {extractNpmSpecifier, populateNpmCache, resolveNpmImport, resolveNpmImports} from "./npm.js";
import {resolveDuckDBDownload} from "./npm.js";
import {isAssetPath, isPathImport, parseRelativeUrl, relativePath, resolveLocalPath, resolvePath} from "./path.js";

export interface Resolvers {
@@ -367,6 +368,10 @@ async function resolveResolvers(
const path = await resolveNpmImport(root, specifier.slice("npm:".length));
resolutions.set(specifier, path);
await populateNpmCache(root, path);
} else if (specifier.startsWith("https://extensions.duckdb.org/")) {
const path = await resolveDuckDBDownload(root, specifier);
resolutions.set(specifier, path);
await populateNpmCache(root, path);
} else if (!specifier.startsWith("observablehq:")) {
throw new Error(`unhandled implicit download: ${specifier}`);
}
26 changes: 25 additions & 1 deletion test/libraries-test.ts
@@ -58,7 +58,31 @@ describe("getImplicitDownloads(imports)", () => {
"npm:@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm",
"npm:@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js",
"npm:@duckdb/duckdb-wasm/dist/duckdb-eh.wasm",
- "npm:@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js"
+ "npm:@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/autocomplete.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/fts.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/icu.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/inet.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/json.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/parquet.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/spatial.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/sqlite_scanner.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/substrait.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/tpcds.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/tpch.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_eh/vss.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/autocomplete.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/fts.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/icu.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/inet.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/json.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/parquet.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/spatial.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/sqlite_scanner.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/substrait.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/tpcds.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/tpch.duckdb_extension.wasm",
"https://extensions.duckdb.org/v1.1.1/wasm_mvp/vss.duckdb_extension.wasm"
])
);
assert.deepStrictEqual(
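
The 24 URLs expected by the test above are the cross product of the 12 enabled extensions and the 2 platforms. As a rough sketch, they could be generated instead of listed by hand; `extensionUrls` is a hypothetical helper, and the version string `v1.1.1` is copied from the URLs in this diff.

```javascript
// The extensions left uncommented in src/libraries.ts in this diff.
const extensions = [
  "autocomplete", "fts", "icu", "inet", "json", "parquet",
  "spatial", "sqlite_scanner", "substrait", "tpcds", "tpch", "vss"
];
const platforms = ["eh", "mvp"];

// Enumerate one download URL per (platform, extension) pair.
function extensionUrls(version = "v1.1.1") {
  const urls = [];
  for (const platform of platforms) {
    for (const extension of extensions) {
      urls.push(`https://extensions.duckdb.org/${version}/wasm_${platform}/${extension}.duckdb_extension.wasm`);
    }
  }
  return urls;
}

console.log(extensionUrls().length);
```

The test compares against a `Set`, so the iteration order (platform first here, extension first in `getImplicitDownloads`) does not affect the result.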