Export improvements #400

seidtgeist · 2020-05-28T14:15:53Z

Based on #302 I’ve been trying to improve export, and this has lead me on a path™. Have you considered using one of these options?

VACUUM INTO a temporary file that’s read, unlinked and returned
Using sqlite’s Online Backup API to create a copy that’s returned
Duplicating and returning the working memory via deserialize

In this PR, I’m giving (1) a shot. I’m very interested to hear what you have to say. Ultimately, option (2) might be better suited for a generic export function since it doesn’t incur a VACUUM, and I will most likely give that a try as well.

On a related note: Since the current filesystem is MEMFS, have you considered just using an in-memory database instead? Right now, sqlite believes it’s writing to a persistent filesystem, but MEMFS is backed by memory. Since everything’s happening in-memory anyway, why not use sqlite’s native memory-backed mechanics and drop the MEMFS indirection along the way?

I have no experience with IDBFS (#397), and boy would I prefer if we had something like NativeIO.

Some notes to self:

Add test for Exporting database resets pragmas to their default values #159
Test using bound statements after export
Test using JS function after export

seidtgeist · 2020-05-28T15:42:55Z

Regarding the test failure:

Only the sql-asm.js target fails, all other builds (wasm, asm.js debug, workers) pass
It’s breaking because getTempRet0 is undefined when called
AFAIK getTempRet0 should be provided by emcc since it’s being called

Looking through emscripten issues it looks like there have been various fun moments with getTempRet0/setTempRet0 over the years 😅

lovasoa · 2020-05-28T15:59:18Z

The issue55 test is the one in particular that tests this part of the code, I would like to make sure it passes before merging.

Since the current filesystem is MEMFS, have you considered just using an in-memory database instead?

👍 This seems like the most promising and logical thing to me. This should allow us to build with -s FILESYSTEM=0, and hopefully reduce the size of the distributed bundle, without loosing any functionality, just using sqlite3_serialize and sqlite3_deserialize.

seidtgeist · 2020-05-28T16:09:19Z

The issue55 test is the one in particular that tests this part of the code, I would like to make sure it passes before merging.

Yeah, I’m trying to figure out how to make emscripten include env.getTempRet0 in the sql-asm.js target output. If I copy the implementation from one of the other files it passes.

Also, VACUUM can take some time and possibly lock up UI or workers. Do you have any concerns with changing the implementation in such a way?

👍 This seems like the most promising and logical thing to me. This should allow us to build with -s FILESYSTEM=0, and hopefully reduce the size of the distributed bundle, without loosing any functionality, just using sqlite3_serialize and sqlite3_deserialize.

I’ll give this a try, too. Having way too much fun with things!

seidtgeist · 2020-05-28T18:03:41Z

@kripken Since this was/is your baby, do you have any opinions on exporting and in-memory vs filesystem? And if you have any getTempRet0 tricks up your sleeve... 😜

kripken · 2020-05-28T18:51:21Z

I think going directly to memory might be better than using a filesystem, yeah. Could be much faster to just malloc some room and use that.

For tempRet0, if you need it you can export it to make sure it exists

-s "EXTRA_EXPORTED_RUNTIME_METHODS=['getTempRet0']"

Taytay · 2020-07-16T14:39:08Z

For what it's worth, I would love to avoid the overhead of the virtual FS and just tell SQLite that it's writing to memory. Thank you for exploring this @seidtgeist!

Taytay · 2021-03-12T21:23:01Z

@seidtgeist, did you do any experimentation with the memory-based option that would allow us to move away from a virtual FS? I No pressure! I just didn't want to duplicate work if I were to dive in.

seidtgeist · 2021-03-12T21:37:02Z

@Taytay I have not, but there’s a custom VFS in deno-sqlite that might interest you: https://github.com/dyedgreen/deno-sqlite/blob/master/build/src/vfs.c

rhashimoto · 2021-04-10T16:11:33Z

Since the current filesystem is MEMFS, have you considered just using an in-memory database instead?

👍 This seems like the most promising and logical thing to me. This should allow us to build with -s FILESYSTEM=0, and hopefully reduce the size of the distributed bundle, without loosing any functionality, just using sqlite3_serialize and sqlite3_deserialize.

One drawback of that is export will use more RAM, potentially doubling peak WASM memory usage. An application could reach a point where the database can no longer be exported because there is not enough contiguous WASM memory for sqlite3_deserialize. I don't think this is an issue with the current implementation of export.

seidtgeist · 2021-04-12T13:53:42Z

One drawback of that is export will use more RAM, potentially doubling peak WASM memory usage. An application could reach a point where the database can no longer be exported because there is not enough contiguous WASM memory for sqlite3_deserialize. I don't think this is an issue with the current implementation of export.

I agree. I have since learned more about sqlite and changed my opinion on this topic:

There is great utility in having a filesystem (totally aware of the irony expressed) exposed to sqlite, such as attaching multiple databases, vacuuming into a new file and other useful features that rely on a filesystem being present.

Areas worth investigating:

Choosing VFS compile time flags accommodating an in-memory filesystem or IDB
A custom VFS accommodating the same

rhashimoto · 2021-04-12T17:01:58Z

Areas worth investigating:

Choosing VFS compile time flags accommodating an in-memory filesystem or IDB

A custom VFS accommodating the same

I recently built a SQLite VFS that uses IndexedDB as a block device. It doesn't support multiple connections so the locking calls are just stubs. It works, but I don't have good tests or performance measures for it so I would call it a proof of concept.

The main obstacle is that the VFS model is synchronous while IndexedDB is not, which I bridge with Asyncify. This creates two issues:

There's a performance hit. SQLite is effectively a worst case performance scenario for using Asyncify. I'm not sure whether the added slowdowns within SQLite itself will be dominated by IndexedDB throughput for typical usage, though.
It's incompatible with the existing SQL.js synchronous API (I wrote my own rudimentary API). Using asynchronous storage propagates the asynchronicity to the application code. It can't be a drop-in replacement unless you're just mirroring memory, like IDBFS.

Getting a synchronous Storage Foundation API would bypass both issues...but it isn't clear when that would arrive, if ever. Note that IndexedDB originally had a synchronous API that ultimately was removed. Also I don't know of any existing or proposed way to implement synchronous locks, which might mean asynchronicity would still be required to support multiple connections.

seidtgeist · 2021-04-12T17:35:44Z

I recently built a SQLite VFS that uses IndexedDB as a block device. It doesn't support multiple connections so the locking calls are just stubs. It works, but I don't have good tests or performance measures for it so I would call it a proof of concept.

That’s exciting! Would you be able to share it? It would help me with writing a memory VFS without needing to through the FS module.

The main obstacle is that the VFS model is synchronous while IndexedDB is not, which I bridge with Asyncify. This creates two issues:

There's a performance hit. SQLite is effectively a worst case performance scenario for using Asyncify. I'm not sure whether the added slowdowns within SQLite itself will be dominated by IndexedDB throughput for typical usage, though.

It's incompatible with the existing SQL.js synchronous API (I wrote my own rudimentary API). Using asynchronous storage propagates the asynchronicity to the application code. It can't be a drop-in replacement unless you're just mirroring memory, like IDBFS.

Getting a synchronous Storage Foundation API would bypass both issues...but it isn't clear when that would arrive, if ever. Note that IndexedDB originally had a synchronous API that ultimately was removed. Also I don't know of any existing or proposed way to implement synchronous locks, which might mean asynchronicity would still be required to support multiple connections.

Ok, I’ve been wondering about this. Since I don’t expect a sync filesystem API to arrive ever (persistence? sync? low level? lol), would it be more reasonable to keep files in memory and persist them to in a separate process?

rhashimoto · 2021-04-12T22:42:03Z

I recently built a SQLite VFS that uses IndexedDB as a block device.

That’s exciting! Would you be able to share it? It would help me with writing a memory VFS without needing to through the FS module.

I don't quite understand the advantages of a memory VFS over the current in-memory filesystem, can you explain? In any case, I can share what I have; I'll try to DM you.

Since I don’t expect a sync filesystem API to arrive ever (persistence? sync? low level? lol), would it be more reasonable to keep files in memory and persist them to in a separate process?

I wouldn't count a synchronous API in or out completely, at least for Worker contexts. There apparently is an actual implementation in the Chrome origin trial described here and mentioned in #302. I'm prepared for disappointment, but I do have hope for it.

I would like to see SQL.js either define its own official asynchronous API or make it easier to develop and use alternative APIs (including asynchronous ones). That's what I was getting at in my #377 comment.

seidtgeist · 2021-04-13T10:44:14Z

I don't quite understand the advantages of a memory VFS over the current in-memory filesystem, can you explain? In any case, I can share what I have; I'll try to DM you.

Based on your previous comments I assume you mean emscripten’s MEMFS and not sqlite’s :memory: databases, so that database files can be easily accessed without needing to serialize or copy.

I haven’t looked much into sqlite’s VFS interface yet, so I don’t know whether there’s additional information/parameters available to the VFS interface that might be useful in implementing something better. Looking at https://www.sqlite.org/src/doc/trunk/src/test_demovfs.c it doesn’t seem that way.

The performance gain of an in-memory VFS compared to the default VFS → posix shims → MEMFS path is probably minimal.
The VFS interface is very close to posix and there’s no additional information on that layer.

I’m interested in this area because I once considered streaming WAL pages to remote persistent storage.

I wouldn't count a synchronous API in or out completely, at least for Worker contexts. There apparently is an actual implementation in the Chrome origin trial described here and mentioned in #302. I'm prepared for disappointment, but I do have hope for it.

Fair enough, but it’s not something I’ll expect to be available universally within the next three years or so.

I would like to see SQL.js either define its own official asynchronous API or make it easier to develop and use alternative APIs (including asynchronous ones). That's what I was getting at in my #377 comment.

I agree with your points. I’ve started my own emscripten-compiled version of the sqlite amalgamation to learn more about the C API and optimize it for my use case. Since I‘m not banking on widely available and reliable persistent storage any time soon I’m considering IDB for local persistence and S3 as a remote key-value store for reliable persistence. I find it interesting to think about guaranteeing data consistency on the application level and minimizing deltas.

Taytay · 2021-04-21T19:23:56Z

One of the concerns I have with the VFS shims is that it appears to use exceptions for some unexceptional things. When I debug SQL.js and tell it to breaking on all exceptions (not just uncaught), it's constantly breaking due to some sort of exception-based flow control. I think it's due to SQLite performing stat calls on non-existent lock files, but it's been a long time since I looked.

rhashimoto · 2021-04-21T20:24:04Z

One of the concerns I have with the VFS shims is that it appears to use exceptions for some unexceptional things. When I debug SQL.js and tell it to breaking on all exceptions (not just uncaught), it's constantly breaking due to some sort of exception-based flow control. I think it's due to SQLite performing stat calls on non-existent lock files, but it's been a long time since I looked.

You're right that SQL.js throws and catches a bunch of exceptions looking for files on start up, but I think that's actually all in the Emscripten filesystem. When it sets up its initial directories and devices, it checks to see if creating them is allowed here:

    mayCreate: function(dir, name) {
      try {
        var node = FS.lookupNode(dir, name);
        return {{{ cDefine('EEXIST') }}};
      } catch (e) {
      }
      return FS.nodePermissions(dir, 'wx');
    },

And indeed the non-existence of a file is signaled with an exception. Amusingly, the error condition here occurs on the lack of an exception.

But I don't think this is related to SQLite VFS, other that the standard VFS uses the Emscripten FS. I don't see any exceptions, caught or uncaught, using my own VFS implementations (other than from my bugs).

Taytay · 2021-04-24T18:21:24Z

Ah yes - you're right. I wasn't clear in my statement. The built in Emcripten posix-emulating file system was what concerned me. It sounds like your implementation is avoiding those issues and that sounds promising.

Use VACUUM INTO tempfile for export

90ff568

seidtgeist changed the title ~~Persistence improvements~~ Export improvements May 28, 2020

lovasoa force-pushed the master branch from 351ffdd to 6e7ec33 Compare October 11, 2020 20:00

cepm-nate mentioned this pull request Dec 7, 2020

Anyway to save SQL DB to IndexedDB without calling export() for persistence? #302

Open

seidtgeist closed this Sep 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Export improvements #400

Export improvements #400

seidtgeist commented May 28, 2020 •

edited

Loading

seidtgeist commented May 28, 2020 •

edited

Loading

lovasoa commented May 28, 2020 •

edited

Loading

seidtgeist commented May 28, 2020

seidtgeist commented May 28, 2020

kripken commented May 28, 2020

Taytay commented Jul 16, 2020

Taytay commented Mar 12, 2021

seidtgeist commented Mar 12, 2021

rhashimoto commented Apr 10, 2021

seidtgeist commented Apr 12, 2021

rhashimoto commented Apr 12, 2021

seidtgeist commented Apr 12, 2021

rhashimoto commented Apr 12, 2021

seidtgeist commented Apr 13, 2021

Taytay commented Apr 21, 2021 •

edited

Loading

rhashimoto commented Apr 21, 2021 •

edited

Loading

Taytay commented Apr 24, 2021

Export improvements #400

Export improvements #400

Conversation

seidtgeist commented May 28, 2020 • edited Loading

seidtgeist commented May 28, 2020 • edited Loading

lovasoa commented May 28, 2020 • edited Loading

seidtgeist commented May 28, 2020

seidtgeist commented May 28, 2020

kripken commented May 28, 2020

Taytay commented Jul 16, 2020

Taytay commented Mar 12, 2021

seidtgeist commented Mar 12, 2021

rhashimoto commented Apr 10, 2021

seidtgeist commented Apr 12, 2021

rhashimoto commented Apr 12, 2021

seidtgeist commented Apr 12, 2021

rhashimoto commented Apr 12, 2021

seidtgeist commented Apr 13, 2021

Taytay commented Apr 21, 2021 • edited Loading

rhashimoto commented Apr 21, 2021 • edited Loading

Taytay commented Apr 24, 2021

seidtgeist commented May 28, 2020 •

edited

Loading

seidtgeist commented May 28, 2020 •

edited

Loading

lovasoa commented May 28, 2020 •

edited

Loading

Taytay commented Apr 21, 2021 •

edited

Loading

rhashimoto commented Apr 21, 2021 •

edited

Loading