-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add chel migrate and use pinned esm.sh deps #7
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done @snowteamer! Thanks also for the pinned esh.sh deps! Left my review below.
@@ -19,3 +32,22 @@ export function isDir (path: string | URL): boolean { | |||
return false | |||
} | |||
} | |||
|
|||
export function isFile (path: string | URL): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should we treat symlinks?
Whatever decision you pick, can you document it in English here in a comment above this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt symlinks should be accepted as valid keys in a key-value store, especially since they would likely point to somewhere outside the data folder. So I'd rather skip them when migrating from fs
.
Besides, if a symlink gets passed in as command argument e.g. --out
, I'm not sure but I guess the link should be followed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah if the output directory includes a symlink we should follow it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @snowteamer! Left a few comments (minor things).
Main thing is that I'm seeing annoying red deprecation warnings in the console when I try this out locally:
-> % deno task chel migrate
Task chel deno task build && deno run --import-map=./vendor/import_map.json --allow-read=. --allow-write build/main.js "migrate"
Task build deno run --import-map=./vendor/import_map.json --no-remote --allow-run --allow-read --allow-env --allow-write=./build --allow-net scripts/build.ts
[npm] deprecated multibase@4.0.6: This module has been superseded by the multiformats module
built: build
[npm] deprecated multibase@4.0.6: This module has been superseded by the multiformats module
[chel] Error: missing argument: --from
Think you could figure out how to get rid of these?
++numVisitedKeys | ||
// Prints a message roughly every 10% of progress. | ||
if (numVisitedKeys % (numKeys / 10) < 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could combine these into one line: if (++numVisitedKeys ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I've read some people can get confused by such syntax, because if (++numVisitedKeys ...
and if (numVisitedKeys++ ...
behave differently, so I refrained from using it
@@ -0,0 +1,77 @@ | |||
"use strict" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update help.ts
with an entry for this new command?
// deno-lint-ignore require-await | ||
export async function readData (key: string): Promise<Uint8Array | string | void> { | ||
// For some reason `[null]` is returned when the value is an empty Uint8Array. | ||
const maybeRow = readStatement.first([key]) | ||
return maybeRow === undefined ? undefined : maybeRow[0] ?? new Uint8Array() | ||
} | ||
|
||
export function writeData (key: string, value: Buffer | string): Promise<void> { | ||
export async function* iterKeys (): AsyncGenerator<string> { | ||
for (const row of iterKeysStatement.iter()) { | ||
yield row[0] | ||
} | ||
} | ||
|
||
// deno-lint-ignore require-await | ||
export async function writeData (key: string, value: Uint8Array | string): Promise<void> { | ||
checkKey(key) | ||
writeStatement.execute([key, value]) | ||
} | ||
|
||
// deno-lint-ignore require-await | ||
export async function writeDataOnce (key: string, value: Uint8Array | string): Promise<void> { | ||
checkKey(key) | ||
writeOnceStatement.execute([key, value]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions readData
, writeData
, and writeDataOnce
are marked as async
but don't call await
.
If this because they are using a synchronous API and we want to ensure API compatibility across backends, can you add a comment to that effect? Without an explanation, it will look weird, especially if we remove the fs
backend.
BTW, I think you can call await
on a synchronous function, so they might not even need the async
in front of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that's mainly for API compatibility across backends, and also as a visual clue that the function returns a Promise. Maybe I'm wrong, but I'm not fan of promise-returning functions not being explicitly marked as async
.
I think you can call await on a synchronous function
Yes, which means the same API function could either return a promise or a plain value according to different backends, and it would work the same as long as the caller doesn't forget to await the result. But in the case of e.g. writeData()
, it's conceptually an IO function and it has no -Sync
suffix, so I think it should always return a promise not a plain value (that would be for a writeDataSync()
function).
Actually the count()
function is not yet consistent in that regard. I don't know yet if it would be sync or async in most backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the
count()
function is not yet consistent in that regard. I don't know yet if it would be sync or async in most backends.
I think we should assume on some backends it might be asynchronous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work upgrading the library and getting rid of that warning! However I think there's an issue with how the library is being used.
src/utils.ts
Outdated
// Note: Could also use `@multiformats/blake2`: | ||
// import blake from "https://esm.sh/@multiformats/blake2" | ||
// const uint8array = blake.blake2b.blake2b256.encode(data) | ||
// Output is 32 byte long. | ||
const uint8array = blake.blake2b(data, undefined, 32) | ||
// if you need Buffer: https://doc.deno.land/https://deno.land/std@0.141.0/io/mod.ts/~/Buffer | ||
// const buf = Buffer.from(uint8array.buffer) | ||
return multihash.toB58String(multihash.encode(uint8array, 'blake2b-32', 32)) | ||
// Previously we had to pass this 32 byte array to `multihash.encode()` to prepend | ||
// four bytes of metadata before base-encoding to base58. | ||
return base58btc.encode(uint8array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is being used incorrectly, and because of that is generating the wrong prefix.
Here's proof:
-> % deno run --import-map=./vendor/import_map.json --allow-read=. --allow-write build/main.js hash build/main.js
blake32Hash(build/main.js): zHnggvswR6hwn3qEpD7VXLzZWQXgu2JVuLFzdDEHj1mRV
-> % deno run --import-map=./vendor/import_map.json --allow-read=. --allow-write build/main.js hash deno.json
blake32Hash(deno.json): zG5pBxvgCzEwrAJmYRAgWc2SgWB8AVSZqVYUTPz6Cmbta
Notice these two hashes have a different prefix.
If this was being correctly used they would have the same prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix is just a 'z', hence the Multibase<'z'>
return type for the blake32Hash()
function. Is this incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but I thought the prefix had to be several bytes long? I can look into this more deeply...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think you're right, let me submit a change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snowteamer Here are two links that might be helpful:
- https://github.com/multiformats/js-multiformats#multihash-hashers
- https://github.com/multiformats/js-blake2
It's a little unclear to me though:
- Why they are doing different things
- What a "Block" is
- What a "CID" is
Can you investigate?
export function blake32Hash (data: string | Uint8Array): Multibase<'z'> { | ||
// TODO: for node/electron, switch to: https://github.com/ludios/node-blake2 | ||
const uint8array = blake.blake2b(data, undefined, 32) | ||
// if you need Buffer: https://doc.deno.land/https://deno.land/std@0.141.0/io/mod.ts/~/Buffer | ||
// const buf = Buffer.from(uint8array.buffer) | ||
return multihash.toB58String(multihash.encode(uint8array, 'blake2b-32', 32)) | ||
const uint8array = typeof data === 'string' ? new TextEncoder().encode(data) : data | ||
const digest = blake.blake2b.blake2b256.digest(uint8array) | ||
// While `digest.digest` is only 32 bytes long in this case, | ||
// `digest.bytes` is 36 bytes because it includes a multiformat prefix. | ||
return base58btc.encode(digest.bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, now it seems to be returning a consistent prefix. But ... why is it a different prefix from the one we had before...? 21XWnNK
vs z2Drjgb
Is it because this is a "CID"? Or is it because something in the prefix itself is different (like the type of hash or the length of the hash?)
This prefix wasn't supposed to change, and now it has, so ... I feel like we should at least figure out why. And if we can't figure it out, maybe we should say screw multihash completely and just use a text-based prefix or suffix like SSB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because we we're tacking a blake2b-32 prefix code onto 32-byte hashes, as I tried to explain earlier on Slack. So I couldn't reproduce the same hashes as before using only the new library functions. Then I investigated a bit and if I read correctly, blake2b-32 is meant for 32-bit long hashes, not 32-bytes, so I started using blake2b-256 instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @snowteamer!
The code looks good to me. Now I went about testing it and here's what I found:
fs -> sqlite
- Migrates hidden files like
.gitkeep
. By default we should skip all hidden files I think. - Actually migrated the
groupincome.db
into the groupincome.db!
sqlite -> fs
😄 👍
✅ All good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving because we agreed that keys beginning with .
could be valid key names. And there's no clear way to filter for whether or not keys ending in .db
should be skipped.
Great job @snowteamer!
No description provided.