Custom Index File
The general idea here is that if the remote storage service doesn't provide adequate methods of searching for, and ordering the results of, oplog entry files by HLC timestamp, then the plugin itself manually maintains some separate data structure. This data structure can then be downloaded and used by the plugin to figure out which oplog entry files to download. It seems likely that this type of solution would be slow (e.g., possibly only downloading one oplog entry file at a time) and would involve a lot of work. It should only be pursued in the interest of offering a new remote storage service that app users want to use as a home for their data (e.g., someone really doesn't want to keep their data in Google Drive and prefers to use Dropbox instead, for whatever reason). It seems like using an HTTP-based email API would be easier if that API makes it possible to do more advanced searching and ordering by custom times.
- each client updates and uploads an "index" file: an ordered list of all the HLC timestamps (i.e., filenames) for the oplog entries it has created
- these files should all have a standard extension so they can be easily discovered via filename pattern searching (e.g., `{client ID}.index.txt`)
- each line in the file has the HLC timestamp for an entry created by that client, plus whatever additional data is needed to retrieve the corresponding oplog entry file. For example:
  - `2021-04-05T12:29:02.790Z 0000`: this is enough info to download the file from Dropbox by filename.
  - `2021-04-05T12:29:02.790Z 0000 | { dropboxFileId: 'a4ayc_80_OEAAAAAAAAAY' }`: this basically maps the HLC timestamp to some structured metadata, in this case the Dropbox file ID.
- the file lines are sorted by HLC time
- this file should be uploaded to the server on each sync (overwriting the existing file if one exists)
- other clients can download this (pre-sorted) list, jump to a specific HLC time (e.g., by iterating over lines), and then go through all following lines to get the info about which files need to be requested, one by one.
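As a sketch of how another client might consume such an index file (this assumes the `<ISO timestamp> <counter>` line format shown above; the function and variable names are illustrative, not from an actual plugin):

```typescript
// Minimal sketch: given the downloaded index file contents and the HLC
// timestamp of the last oplog entry this client has already seen, return
// the timestamps of the entries that still need to be fetched.
//
// Because ISO-8601 timestamps (plus a fixed-width counter suffix) sort
// lexicographically in chronological order, and the index lines are
// pre-sorted, a binary search can find the resume point without scanning
// every line.
function entriesAfter(indexFileText: string, afterTimestamp: string): string[] {
  const lines = indexFileText.split('\n').filter((l) => l.trim().length > 0);

  // Binary search for the first line whose timestamp is > afterTimestamp.
  let lo = 0;
  let hi = lines.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    // Each line is "<timestamp>" or "<timestamp> | <metadata>".
    const ts = lines[mid].split(' | ')[0];
    if (ts <= afterTimestamp) lo = mid + 1;
    else hi = mid;
  }

  return lines.slice(lo).map((l) => l.split(' | ')[0]);
}

const index = [
  '2021-04-05T12:29:02.790Z 0000',
  "2021-04-05T12:29:02.790Z 0001 | { dropboxFileId: 'a4ayc_80_OEAAAAAAAAAY' }",
  '2021-04-06T09:00:00.000Z 0000',
].join('\n');

console.log(entriesAfter(index, '2021-04-05T12:29:02.790Z 0000'));
// → ['2021-04-05T12:29:02.790Z 0001', '2021-04-06T09:00:00.000Z 0000']
```

Note that the "jump to a specific HLC time" step becomes a string comparison here, with no timestamp parsing needed, precisely because the HLC timestamps are stored in a lexicographically sortable form.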
Custom index downsides, ideas for improvement
Using a manually maintained oplog index like this has some real downsides and should be avoided if the remote storage service offers a way to filter and order files. Index files could get BIG (e.g., adding 1M lines of timestamps to a text file resulted in a 45 MB file). Some thoughts on ways to reduce the data transfer and/or speed up the "search by time" operations:
- compression could help; this could be deferred to the server and browser (assuming server compression is enabled), or maybe done in the browser (JSZip compressed a realistic 45 MB text file to 134 KB in 1.2 sec)
- partition the indices into separate files using a file name format that allows the client to list the files and decide which part(s) of the index to download. For example, maybe put timestamp ranges in index filenames: `{nodeId}.{firstTimeStamp__lastTimeStamp}.index.txt`
- storing the index as some data structure other than "one timestamp per line" could make the "get all entries after time X" operation faster
  - a tree where the paths to leaves are an encoded version of the times (similar to the way merkle trees are used in crdt-example-app) might be an option
  - maybe store the file as a serialized, compressed SQLite database? this seems like it would have all kinds of problems and would require the client to do something like use a WASM module to run SQLite in the browser...
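The partition-selection step from the second idea above could be sketched like this (assuming the `{nodeId}.{firstTimeStamp__lastTimeStamp}.index.txt` filename format proposed above; the function name is made up, and real filenames would likely need a filename-safe encoding for the timestamps):

```typescript
// Sketch: given the list of index-partition filenames from the remote
// folder and the HLC timestamp we last synced at, pick only the partitions
// that could contain newer entries. Assumes the proposed filename format:
//   {nodeId}.{firstTimeStamp__lastTimeStamp}.index.txt
// ISO-8601 timestamps compare chronologically as plain strings, so no date
// parsing is needed for the range check.
function partitionsToDownload(filenames: string[], afterTimestamp: string): string[] {
  return filenames.filter((name) => {
    const match = name.match(/^[^.]+\.(.+)__(.+)\.index\.txt$/);
    if (!match) return false; // not an index partition; ignore
    const lastTimestampInFile = match[2];
    // Skip partitions whose newest entry is not newer than what we have.
    return lastTimestampInFile > afterTimestamp;
  });
}

const files = [
  'node123.2021-01-01T00:00:00.000Z__2021-02-01T00:00:00.000Z.index.txt',
  'node123.2021-02-01T00:00:01.000Z__2021-03-01T00:00:00.000Z.index.txt',
  'node123.2021-03-01T00:00:01.000Z__2021-04-01T00:00:00.000Z.index.txt',
];

console.log(partitionsToDownload(files, '2021-02-15T00:00:00.000Z'));
// → the second and third filenames (the first partition's newest entry
//   predates the sync point, so it never needs to be downloaded)
```

With this scheme a client downloads only the tail of the index it hasn't seen, rather than the whole (potentially tens of MB) file.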
Add a new plugin that allows oplog entry data to be sync'ed with a user's Dropbox account.

Summary:
Dropbox API

Unfortunately the Dropbox API is pretty limited in terms of search and retrieval of files.

`POST files/search`

- `path`: specifies the path of the folder to search
- `query`: can be used for very primitive filename matching (e.g., `bat c` matches "bat cave" but not "batman car")
- `mode: 'filename'`: to limit the search to file names and not content

`POST files/search_v2`: it's primitive

- `query`: string to search for (file name or contents). no boolean logic, regex patterns, etc.
- `options.path`: e.g., `/Folder`
- `options.filename_only: true`: to limit the `query` search to file names
- results can only be ordered by `relevance` or `last_modified_time` (a custom sort key cannot be specified by the client)

`POST /properties/search` allows very basic/limited searching of files by metadata props. Doesn't seem like there's any advantage to this over the `files/search` endpoints, given that it's the same basic string matching and doesn't offer any way to sort the results.
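To make the limitations concrete, here is a rough sketch of a `files/search_v2` request for discovering index files by name. The endpoint URL, body shape, and option names come from the Dropbox HTTP API; the folder path, query string, and function name are placeholders for illustration:

```typescript
// Sketch: build a files/search_v2 request for finding index files by
// filename pattern. The endpoint and option names are from the Dropbox
// HTTP API; '/oplog' and the query string are placeholder values.
interface SearchRequest {
  url: string;
  body: {
    query: string;
    options: {
      path: string;
      filename_only: boolean;
      max_results: number;
      order_by: string;
    };
  };
}

function buildIndexSearchRequest(folder: string): SearchRequest {
  return {
    url: 'https://api.dropboxapi.com/2/files/search_v2',
    body: {
      query: '.index.txt', // primitive substring-style matching only
      options: {
        path: folder,
        filename_only: true, // don't search file contents
        max_results: 100,
        // Results can only be ordered by relevance or last_modified_time;
        // ordering by a custom field (e.g., an HLC timestamp embedded in
        // the filename) is not possible.
        order_by: 'last_modified_time',
      },
    },
  };
}

// Usage in the plugin's sync code would look something like:
//   const { url, body } = buildIndexSearchRequest('/oplog');
//   const res = await fetch(url, {
//     method: 'POST',
//     headers: {
//       Authorization: `Bearer ${accessToken}`,
//       'Content-Type': 'application/json',
//     },
//     body: JSON.stringify(body),
//   });
```

The `order_by` restriction is exactly why the custom index file above is needed: there is no way to ask Dropbox for "all files whose names encode an HLC timestamp greater than X, in timestamp order."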