This repository has been archived by the owner on Oct 25, 2023. It is now read-only.

Offline mode #351

Closed
7 of 8 tasks
akindyakov opened this issue Dec 4, 2022 · 24 comments

@akindyakov
Contributor

akindyakov commented Dec 4, 2022

Implementation plan: - ~50 hours total

@SergNikitin
Collaborator

Notes from brief research on IndexedDb:
- is a mature, stable, widely supported way to store data persistently in a web browser
- is supposedly very slow - without even looking for it, I stumbled into multiple articles about it
- doesn't have native SQL support, but has libraries that build it on top (although they supposedly are even slower)

@akindyakov
Contributor Author

doesn't have native SQL support

We have very limited needs for SQL in smuggler today: only access-by-key and iteration over records for search. Is this at least supported by IndexedDB?

@SergNikitin
Collaborator

Does IndexedDB support access-by-key and iteration over records for search?

Yes it does! Setup of "tables", indexes and query APIs are all completely alien to what we are used to, but the capabilities are there.
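
For reference, both operations map onto plain IndexedDB requests. Below is a minimal TypeScript sketch, not smuggler code. The `DbLike`, `StoreLike`, `RequestLike` and `CursorLike` interfaces are hypothetical stand-ins that mirror only the IndexedDB members used (`transaction()`, `objectStore()`, `get()`, `openCursor()`, `continue()`), and `getByKey`/`searchStore` are illustrative names. It assumes a store keyed by string.

```typescript
// Structural stand-ins for the few IndexedDB members used here, so the sketch
// type-checks without DOM typings. In a browser, a real IDBDatabase would be
// passed instead (possibly with a cast).
interface RequestLike<T> {
  result: T;
  error: unknown;
  onsuccess: ((ev?: any) => any) | null;
  onerror: ((ev?: any) => any) | null;
}
interface CursorLike {
  value: unknown;
  continue(): void;
}
interface StoreLike {
  get(key: string): RequestLike<unknown>;
  openCursor(): RequestLike<CursorLike | null>;
}
interface DbLike {
  transaction(storeName: string, mode: "readonly"): {
    objectStore(name: string): StoreLike;
  };
}

// Access-by-key: one request, one result (undefined when the key is absent).
function getByKey(db: DbLike, storeName: string, key: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const req = db.transaction(storeName, "readonly").objectStore(storeName).get(key);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Iteration for search: a cursor request fires onsuccess once per record, and
// once more with a null result when the store is exhausted.
function searchStore(
  db: DbLike,
  storeName: string,
  matches: (value: unknown) => boolean,
): Promise<unknown[]> {
  return new Promise((resolve, reject) => {
    const found: unknown[] = [];
    const req = db.transaction(storeName, "readonly").objectStore(storeName).openCursor();
    req.onsuccess = () => {
      const cursor = req.result;
      if (cursor === null) {
        resolve(found);
        return;
      }
      if (matches(cursor.value)) found.push(cursor.value);
      cursor.continue(); // asks for the next record, which re-triggers onsuccess
    };
    req.onerror = () => reject(req.error);
  });
}
```

The callback-per-record cursor is the part that feels most "alien" compared to SQL: there is no query language, only key lookups, key ranges and manual filtering.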

More notes:

  • ⚠️⚠️ data stored inside a browser is persistent, but not really (see "Browser storage is not really persistent" section)
    • absurd-sql + sql.js looks like the most "bare-bones" solution to this issue, but it relies on some changes that haven't been merged to upstream for more than a year; usage of patched sql.js fork is required
    • supposedly (1, 2) we should expect many more persistent solutions that are both simple to integrate and performant once browsers implement "The Origin Private File System" part of File System Access API (on desktop it seems Firefox is the only one left out ATM)
  • if an app wants offline support AND replication to backend then there are some sophisticated out-of-the-box solutions available
    • apparently, out of the box solutions are all NoSQL (see "There is no relational data" section) because

creating replication for an SQL offline first database is way more work than just adding some network protocols on top of PostgreSQL

  • ⚠️using IndexedDb directly (instead of using some abstraction on top of it) means a future rewrite if support for a mobile application is needed

@akindyakov
Contributor Author

This is concerning. Are the limitations on size and persistence of IndexedDB equally strict on Chrome and Edge?

Do we know how much disk space a Mazed node takes on average?

As far as I remember we do not use relations between tables in smuggler, so we should be safe in that regard. Am I missing anything?

@SergNikitin
Collaborator

Are the limitations on size and persistence of IndexedDB equally strict on Chrome and Edge?

  • persistence part: if I understood your question correctly, see "on automatic data purging" note below!
  • size part: I haven't found any alarming restrictions about size, only statements that IndexedDb becomes slower as the size grows. There are good solutions for this however -- see the note on "OPFS" at the bottom!

Do we know how much disk space a Mazed node takes on average?

No, we do not. At the moment performance and size are secondary concerns from my perspective - but see the note about them at the bottom, search for "OPFS"!

As far as I remember we do not use relations between tables in smuggler, so we should be safe in that regard. Am I missing anything?

I believe you are correct.


More notes

  1. on automatic data purging:
    • Safari: as mentioned in an article linked above, purges all data (including IndexedDb) after 7 days of inactivity
    • Firefox & Chromium-based: the behaviour is less aggressive
      • a call to StorageManager.persist() marks data of an active website as "persistent" and after that the browser won't touch it unless explicitly requested by a user; shaky 🤔
      • I haven't found any info on the persist() API from within background, will have to experiment 🥼 See below why background's APIs are likely to be the most important ones
  2. ⚠️⚠️ IndexedDb adheres to "same-origin policy", which seemingly means that it is NOT possible to share a database between archaeologist and truthsayer natively.
    • looks like there are ways around this, but if we are to keep truthsayer in offline mode at all then it'll at the very least mean increased implementation cost and complexity.
    • in all our workflows we can get access to two databases: 1) archaeologist's database, accessed from background and 2) mazed.se's database, natively available to truthsayer's code AND to content script
    • since truthsayer is not expected to be open all the time, only one option is viable: archaeologist's database, accessed from background
    • the above means
      • either truthsayer itself has to become a mostly empty shell, with all UI elements that work with data (like indexed cards, search box etc) injected into it by content (like we do for browser history import in Import fragments page #297 )
      • or content injects a database accessor into truthsayer, for truthsayer to use (content can't see javascript of the page itself, but perhaps the opposite is possible)
  3. ⭐ In the previous notes I said that a new set of web APIs is expected to unleash much better storage solutions. Turns out they are already here -- we can embed sqlite into our web app that uses these APIs under the hood (search for "OPFS"), released only a month ago! This
    • supposedly solves the issues with size and performance
    • retains the same persistence issues and "same-origin policy" complications, just like IndexedDb and all other storage options
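
The `persist()` call from note 1 can be sketched as below. This is a minimal illustration, assuming it runs in a context where `navigator.storage` is available (whether background qualifies is exactly the open question above). `PersistableStorage` is a hypothetical structural subset of the browser's `StorageManager`, and `ensurePersistent` is an illustrative name.

```typescript
// Structural subset of the browser's StorageManager (navigator.storage),
// declared here so the sketch does not depend on DOM typings.
interface PersistableStorage {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

// Resolves to true when the origin's storage is already persistent or has
// just been granted persistence; false means the browser may still evict it.
async function ensurePersistent(storage: PersistableStorage): Promise<boolean> {
  if (await storage.persisted()) {
    return true; // already marked persistent, nothing to request
  }
  return storage.persist(); // the browser is free to deny this request
}

// In a web page this would be called as: ensurePersistent(navigator.storage)
```

A `false` result is not an error, so the caller has to decide what to do with it (for example warn the user that local data may be purged).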

Next steps

Issues with data persistence look manageable on Chrome and Firefox. On Safari they appear unacceptable long-term, but short-term we have no Safari support plans anyway. So persistence doesn't block our immediate goals.

"Same-origin policy" complications on the other hand are less clear to me. It seems bad, but docs across the web are not very clear on what can and can't be done, so as a next step 🦵 I'll need to run some experiments.

@SergNikitin
Collaborator

On sharing a data store between truthsayer and archaeologist:

  1. extensions don't have an "origin" and APIs like IndexedDb and OPFS don't expose anything that could allow archaeologist to say "fetch me a store for mazed.se" (that's sort of what we rely on for sharing cookies between the two)
  2. archaeologist can't inject a "callable" into truthsayer because extensions live in an isolated world
  3. ⭐ but luckily for us there are two other approaches
    • on Chromium-based browsers it's dead easy for truthsayer to send requests to archaeologist and receive responses - there is a special externally_connectable manifest permission for that (if we knew about it earlier we might have implemented browser history import controls very differently)
    • on Firefox externally_connectable is not supported, but there is an escape hatch - although webpages and content scripts don't share callables/variables/etc they do share events, so truthsayer can post an event saying "query a database" and archaeologist can post an event with a response
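
The event-based escape hatch can be sketched as a small request/response protocol. This is an illustration under assumptions, not the implemented code: `MessageBus` is a hypothetical subset of `window` (which page and content script both see), the `Envelope` fields and the `queryDatabase`/`serveDatabase` names are made up, and a real implementation would also pass a target origin to `window.postMessage` and validate `ev.source`/`ev.origin`.

```typescript
// Hypothetical subset of `window` used for page <-> content script messaging.
interface MessageBus {
  postMessage(data: unknown): void;
  addEventListener(type: "message", listener: (ev: { data: unknown }) => void): void;
  removeEventListener(type: "message", listener: (ev: { data: unknown }) => void): void;
}

// Both sides listen on the same bus, so every message is tagged with a
// direction and an id that pairs a response with its request.
interface Envelope {
  direction: "to-archaeologist" | "from-archaeologist";
  id: number;
  payload: unknown;
}

function isEnvelope(data: unknown): data is Envelope {
  return typeof data === "object" && data !== null && "direction" in data && "id" in data;
}

let nextRequestId = 0;

// Page side (truthsayer): post a request, resolve on the matching response.
function queryDatabase(bus: MessageBus, query: unknown): Promise<unknown> {
  const id = nextRequestId++;
  return new Promise((resolve) => {
    const onMessage = (ev: { data: unknown }): void => {
      const data = ev.data;
      if (!isEnvelope(data) || data.direction !== "from-archaeologist" || data.id !== id) {
        return; // someone else's message, or our own request echoed back
      }
      bus.removeEventListener("message", onMessage);
      resolve(data.payload);
    };
    bus.addEventListener("message", onMessage);
    bus.postMessage({ direction: "to-archaeologist", id, payload: query });
  });
}

// Content script side (archaeologist): answer every request. In the real
// extension `handle` would forward the query to background.
function serveDatabase(bus: MessageBus, handle: (query: unknown) => Promise<unknown>): void {
  bus.addEventListener("message", async (ev) => {
    const data = ev.data;
    if (!isEnvelope(data) || data.direction !== "to-archaeologist") return;
    const payload = await handle(data.payload);
    bus.postMessage({ direction: "from-archaeologist", id: data.id, payload });
  });
}
```

On Chromium-based browsers none of this is needed, since `externally_connectable` lets the page message the extension directly.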

@SergNikitin
Collaborator

SergNikitin commented Dec 15, 2022

<estimates moved to the top>

@akindyakov
Contributor Author

Thanks a lot for the detailed write-up!

A few questions:

  1. Is "Plug in SQLite with trivial proof that it works" the biggest risk of failure? If so, can we start from it then?
  2. With this setup, offline mode will be unavailable in Truthsayer when Archaeologist is not installed, is this correct?
  3. What corners can we cut to speed it up?

@akindyakov
Contributor Author

What corners can we cut to speed it up?

We can push implementation for Firefox to the end of this list. The goal is to make it work for engineers in tech companies with a special policy on installed software, and I know of none who use Firefox. We can get to that point without Firefox at first.

@akindyakov
Contributor Author

Plug in SQLite

As far as I remember, you wanted to use WASM to do it, didn't you? Can we add some small piece of code with WASM first and try to release an extension to the Chrome web store? Just to make sure they won't block/ban us for this.

@SergNikitin
Collaborator

  1. Is "Plug in SQLite with trivial proof that it works" the biggest risk of failure? If so, can we start from it then?

The biggest in my opinion was communication between truthsayer and archaeologist, but I made it work. SQLite comes second and will be my next step!

  2. With this setup, offline mode will be unavailable in Truthsayer when Archaeologist is not installed, is this correct?

Yes, archaeologist will become a mandatory requirement to use it.

  3. What corners can we cut to speed it up?

The most obvious one is to "push implementation for Firefox to the end of this list", I agree that we should use it. Just thinking at the level of big chunks of work estimated above, I don't see any obvious ones that can be removed. But once we start working on them we'll probably see sub-parts that aren't needed.

We can push implementation for Firefox to the end of this list.

Yes, good one to cut! (see the previous answer)

As far as I remember, you wanted to use WASM to do it, didn't you? Can we add some small piece of code with WASM first and try to release an extension to the Chrome web store?

Yes, sqlite itself is WASM-only in JS as far as I understand. It's a good idea to trial a WASM release, let's use the outcome of "Plug in SQLite with a trivial proof that it works" step for that.

@SergNikitin
Collaborator

Minor update on how to pull sqlite into our code: prior to the current "official" (as in "by maintainers of sqlite itself") effort to provide WASM builds there have been a number of projects that experimented in a similar territory (see the "Attribution" section). All of them have been published to npm and can be consumed easily:

  • sql.js (in our case +absurd-sql) - more notes on why this combo is not very desirable in earlier comments
  • wa-sqlite - apparently the first unofficial project to experiment with WASM sqlite + OPFS. Probably would work for our case, but with the recent release of the "official" WASM sqlite the maintainer posted this discussion with the reasons why wa-sqlite is unlikely to stay attractive

So based on the above, it's appealing to use the "official" offerings. I was surprised to find that although WASM is officially supported, there are currently no official NPM packages - we are expected to download the binaries manually from their website. This is obviously inconvenient, and luckily some folks have created a paper-thin NPM-compatible wrapper over the official build -- sqlite-wasm-esm; they also have an ongoing conversation with sqlite maintainers to help them eventually offer the same convenience out of the box. That's the package I'll experiment with first ⭐

@akindyakov
Contributor Author

let's use the outcome of "Plug in SQLite with a trivial proof that it works" step for that.

Let's experiment with something smaller than sqlite and less "experimental", something that works 💯. Any stable package with wasm inside would do. My concern is that Chrome Web Store might block Archaeologist entirely for using WASM - for some reason they are afraid of it.

@SergNikitin
Collaborator

SergNikitin commented Dec 26, 2022

I managed to get our build process to pick up WASM dependencies properly and, as we expected, after that their usage is the same as regular JS - just do an import. The PR showcases use of a default DB of type 'memory' - that's an in-memory database, contents get lost as soon as the browser gets closed. So WASM-wise we are good, can write familiar SQL without a problem. 🎉

We are, however, not good on the persistence front and as a result will not proceed with SQLite at this time. See the second part of the comment for more context if curious.

What's next?

In summary,

  1. messaging experiment - success ✅, will be used in final solution ✅
  2. wasm experiment - success ✅, won't be used in final solution ❌
  3. sqlite experiment - failure ❌

Next - implementation of something with direct usage of IndexedDb or browser.storage.

SQLite storage options

Aside from the in-memory DB, SQLite comes with two more types of storage:

Storage option 1 - OPFS

This is the one I was hoping to use, but it won't work right now. It does work "within workers", but turns out there are multiple kinds of workers. There are

  1. "web workers" (that's what you get when you run new Worker()). These are general-purpose, can be used to do whatever the application needs. They are further divided into
    1. "dedicated workers"
    2. "shared workers"
  2. "service workers" - these are workers with a "specific purpose", they are not general

OPFS docs say:

The createSyncAccessHandle() [...] Note that it is only usable inside dedicated Web Workers.

SQLite's OPFS implementation requires this API and their docs say

[OPFS via sqlite3_vfs] support is only available when sqlite3.js is loaded from a [...] dedicated worker [...].

As you might have guessed by this point, background.js is a "service worker" in Manifest V3, so the API is unavailable and by extension OPFS support is unavailable. 👎
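
The distinction between worker kinds can be checked at runtime, because each kind exposes its own global-scope constructor. A minimal sketch, with hypothetical names `detectWorkerKind` and `canUseOpfsSyncAccess`; the scope is passed in as a parameter (normally `globalThis`) rather than read directly.

```typescript
type WorkerKind = "dedicated" | "shared" | "service" | "not-a-worker";

// `scope` is the global object of the current context (globalThis / self).
// Each worker kind defines its own *GlobalScope constructor, and the global
// object is an instance of it; a regular page defines none of the three.
function detectWorkerKind(scope: object): WorkerKind {
  const globals = scope as Record<string, unknown>;
  const isInstanceOf = (ctorName: string): boolean => {
    const ctor = globals[ctorName];
    return typeof ctor === "function" && scope instanceof ctor;
  };
  if (isInstanceOf("DedicatedWorkerGlobalScope")) return "dedicated";
  if (isInstanceOf("SharedWorkerGlobalScope")) return "shared";
  if (isInstanceOf("ServiceWorkerGlobalScope")) return "service";
  return "not-a-worker";
}

// Per the docs quoted above, createSyncAccessHandle() (and therefore SQLite's
// OPFS storage) is only usable from a dedicated worker.
function canUseOpfsSyncAccess(scope: object): boolean {
  return detectWorkerKind(scope) === "dedicated";
}

// In real code: canUseOpfsSyncAccess(globalThis)
```

Called from a Manifest V3 background script this would report "service", which matches the conclusion above.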

Is it possible to spawn a "dedicated worker" from background?

In Manifest V2 - yes. In Manifest V3 - no, but it has been identified as an undesired regression. Chrome expressed intent to fix this, that's being tracked here. So at some point we'll be able to use OPFS in our usecase 👍, but that day is not today.

Storage option 2 - localStorage/sessionStorage (nicknamed kvvfs)

That's what is available via window.localStorage and such. These specific globals are only available in a web page environment, so background is out of luck again.

What's interesting is that background has access to a very similar API -- browser.storage and, unlike window.localStorage, there is no limit on how much data can be stored (if "unlimitedStorage" permission is requested). It's also a key-value store, just with a differently named getter and setter. We probably don't have a chance to get past the OPFS blocker, but this one I believe we could overcome with manageable effort if we really wanted:

  1. The glue code which uses localStorage/sessionStorage API's getItem()/setItem() would have to change to browser.storage's get()/set()
  2. a couple of other decision-making places

Although a prospect of contributing to sqlite itself to make localStorage work in background is very exciting, we'll keep this treat for "future us" if and when we get dissatisfied with a hand-written persistence. In parallel, I have submitted a question to sqlite maintainers to better understand if this is possible at all.

@akindyakov
Contributor Author

Wow, you are digging really deep! Read it like an adventure story😲

Just as an idea, is there some polyfill package or dependency injection trick to replace 'window.localStorage' with 'browser.storage' without changing the code of SQLite at all?

@akindyakov
Contributor Author

So what's your plan now then?

@SergNikitin
Collaborator

It's as described in "What's next?" section: instead of smuggler-api backed by sqlite I'm working on

implementation of smuggler-api with direct usage of IndexedDb or browser.storage.local

@SergNikitin
Collaborator

SergNikitin commented Dec 27, 2022

is there some polyfill package or dependency injection trick to replace 'window.localStorage' with 'browser.storage' without changing the code of SQLite at all?

Probably not -- these APIs have one fundamental difference which I missed originally: one is synchronous and the other is asynchronous. Maintainers of sqlite were kind enough to look into this very quickly with the intention to patch their code in the next release, but identified this as a severe complication.
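
The mismatch is easy to see in a small sketch. `SyncKv` mirrors the `getItem()`/`setItem()` shape of localStorage, while `AsyncKv` is a simplified, hypothetical stand-in for browser.storage (the real `get()` resolves to an object keyed by name, not a bare string). `makeAsyncKv` and `naivePolyfill` are illustrative names only.

```typescript
// Shape of the Web Storage API that the kvvfs glue code calls: synchronous.
interface SyncKv {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Simplified stand-in for browser.storage: every call returns a Promise.
interface AsyncKv {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory AsyncKv so the sketch runs outside a browser.
function makeAsyncKv(): AsyncKv {
  const data = new Map<string, string>();
  return {
    get: async (key: string) => {
      const stored = data.get(key);
      return stored === undefined ? null : stored;
    },
    set: async (key: string, value: string) => {
      data.set(key, value);
    },
  };
}

// A naive "polyfill" can only hand back the Promise. A synchronous caller has
// no way to wait for it, so the returned object does not satisfy SyncKv.
function naivePolyfill(store: AsyncKv) {
  return {
    getItem: (key: string): Promise<string | null> => store.get(key),
    setItem: (key: string, value: string): Promise<void> => store.set(key, value),
  };
}

const polyfilled = naivePolyfill(makeAsyncKv());
polyfilled.setItem("page-1", "bytes");
const readResult = polyfilled.getItem("page-1");
// `readResult` is a pending Promise here, not the string a caller of
// localStorage.getItem() would expect.
const isUsableSynchronously = typeof readResult === "string";
```

Any fix therefore has to change the calling code (or keep a synchronous in-memory copy in front of the async store), which is why it is more than a rename of getter and setter.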

This means we can do our hand-written implementation without sqlite with peace of mind -- we have ruled out all the possibilities where it wouldn't cost us an arm and a leg to get sqlite to work.

@SergNikitin
Collaborator

SergNikitin commented Jan 11, 2023

First pass of a background implementation of StorageApi is close to done in #396, at least for most of the core endpoints. The one that presented a challenge that is difficult to address just within the bounds of StorageApi is node search - currently the only endpoint available is node.slice which is used in two different contexts:

  • to iterate over nodes (called like node.slice({}))
  • to look up a very specific list of nodes (used only in steroid.node.lookup)

The API is inherently time-based and yet neither of the usecases actually cares about time-related parameters of GetNodeSliceArgs - first one just ignores them, second specifies parameters that say "give me a single range from 0 to now".

Since time-based lookups are very unattractive in a KV-storage I think I may need to split these into two distinct APIs, otherwise I'll either make steroid.node.lookup very slow or node.slice({}) very slow -- #397
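
A rough sketch of what the split could look like on top of a key-value store. Everything here is hypothetical: the `StoredNode` shape, the `KvNodeStore` class and its method names are for illustration only and are not the API proposed in #397. An in-memory `Map` stands in for the actual storage.

```typescript
// Hypothetical node shape; only `nid` matters for this sketch.
interface StoredNode {
  nid: string;
  createdAt: number;
  text: string;
}

class KvNodeStore {
  // Stand-in for the real key-value storage, keyed by node id.
  private readonly byNid = new Map<string, StoredNode>();

  put(node: StoredNode): void {
    this.byNid.set(node.nid, node);
  }

  // Covers the steroid.node.lookup use case: one direct key read per
  // requested id, no scan and no time range involved.
  getByIds(nids: string[]): StoredNode[] {
    const found: StoredNode[] = [];
    for (const nid of nids) {
      const node = this.byNid.get(nid);
      if (node !== undefined) found.push(node);
    }
    return found;
  }

  // Covers the node.slice({}) use case: visit every node, again without
  // any time-related parameters.
  iterate(visit: (node: StoredNode) => void): void {
    this.byNid.forEach((node) => visit(node));
  }
}
```

With two entry points, each one can use the access pattern a KV storage is good at instead of both paying for a time-based query.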

@SergNikitin
Collaborator

SergNikitin commented Jan 21, 2023

Core functionality is in place for Chromium-based browsers. Bugs to fix (mostly extracted from #404 comments):

@akindyakov
Contributor Author

file upload doesn't work

Only image uploads fail; text file uploads work fine.

@akindyakov
Contributor Author

akindyakov commented Feb 5, 2023

@akindyakov
Contributor Author

akindyakov commented Feb 5, 2023

cards get shown in reverse order, oldest first newest last #410

Update: I looked closer and it looks like I found the issue. @SergNikitin, if you are not looking at it already, I'll take it.

@SergNikitin
Collaborator

Done!
