Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chel migrate and use pinned esm.sh deps #7

Merged
merged 8 commits into from
May 12, 2023

Conversation

snowteamer
Copy link
Collaborator

No description provided.

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @snowteamer! Thanks also for the pinned esh.sh deps! Left my review below.

src/utils.ts Outdated Show resolved Hide resolved
@@ -19,3 +32,22 @@ export function isDir (path: string | URL): boolean {
return false
}
}

export function isFile (path: string | URL): boolean {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we treat symlinks?

Whatever decision you pick, can you document it in English here in a comment above this function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt symlinks should be accepted as valid keys in a key-value store, especially since they would likely point to somewhere outside the data folder. So I'd rather skip them when migrating from fs.
Besides, if a symlink gets passed in as command argument e.g. --out, I'm not sure but I guess the link should be followed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if the output directory includes a symlink we should follow it

src/migrate.ts Outdated Show resolved Hide resolved
Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @snowteamer! Left a few comments (minor things).

Main thing is that I'm seeing annoying red deprecation warnings in the console when I try this out locally:

-> % deno task chel migrate     
Task chel deno task build && deno run --import-map=./vendor/import_map.json --allow-read=. --allow-write build/main.js "migrate"
Task build deno run --import-map=./vendor/import_map.json --no-remote --allow-run --allow-read --allow-env --allow-write=./build --allow-net scripts/build.ts
[npm] deprecated multibase@4.0.6: This module has been superseded by the multiformats module
built: build
[npm] deprecated multibase@4.0.6: This module has been superseded by the multiformats module
[chel] Error: missing argument: --from

Think you could figure out how to get rid of these?

Comment on lines +70 to +72
++numVisitedKeys
// Prints a message roughly every 10% of progress.
if (numVisitedKeys % (numKeys / 10) < 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could combine these into one line: if (++numVisitedKeys ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I've read some people can get confused by such syntax, because if (++numVisitedKeys ... and if (numVisitedKeys++ ... behave differently, so I refrained from using it

@@ -0,0 +1,77 @@
"use strict"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update help.ts with an entry for this new command?

Comment on lines +44 to +67
// deno-lint-ignore require-await
export async function readData (key: string): Promise<Uint8Array | string | void> {
// For some reason `[null]` is returned when the value is an empty Uint8Array.
const maybeRow = readStatement.first([key])
return maybeRow === undefined ? undefined : maybeRow[0] ?? new Uint8Array()
}

export function writeData (key: string, value: Buffer | string): Promise<void> {
export async function* iterKeys (): AsyncGenerator<string> {
for (const row of iterKeysStatement.iter()) {
yield row[0]
}
}

// deno-lint-ignore require-await
export async function writeData (key: string, value: Uint8Array | string): Promise<void> {
checkKey(key)
writeStatement.execute([key, value])
}

// deno-lint-ignore require-await
export async function writeDataOnce (key: string, value: Uint8Array | string): Promise<void> {
checkKey(key)
writeOnceStatement.execute([key, value])
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions readData, writeData, and writeDataOnce are marked as async but don't call await.

If this because they are using a synchronous API and we want to ensure API compatibility across backends, can you add a comment to that effect? Without an explanation, it will look weird, especially if we remove the fs backend.

BTW, I think you can call await on a synchronous function, so they might not even need the async in front of them.

Copy link
Collaborator Author

@snowteamer snowteamer Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that's mainly for API compatibility across backends, and also as a visual clue that the function returns a Promise. Maybe I'm wrong, but I'm not fan of promise-returning functions not being explicitly marked as async.

I think you can call await on a synchronous function

Yes, which means the same API function could either return a promise or a plain value according to different backends, and it would work the same as long as the caller doesn't forget to await the result. But in the case of e.g. writeData(), it's conceptually an IO function and it has no -Sync suffix, so I think it should always return a promise not a plain value (that would be for a writeDataSync() function).

Actually the count() function is not yet consistent in that regard. I don't know yet if it would be sync or async in most backends.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the count() function is not yet consistent in that regard. I don't know yet if it would be sync or async in most backends.

I think we should assume on some backends it might be asynchronous

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work upgrading the library and getting rid of that warning! However I think there's an issue with how the library is being used.

src/utils.ts Outdated
Comment on lines 9 to 16
// Note: Could also use `@multiformats/blake2`:
// import blake from "https://esm.sh/@multiformats/blake2"
// const uint8array = blake.blake2b.blake2b256.encode(data)
// Output is 32 byte long.
const uint8array = blake.blake2b(data, undefined, 32)
// if you need Buffer: https://doc.deno.land/https://deno.land/std@0.141.0/io/mod.ts/~/Buffer
// const buf = Buffer.from(uint8array.buffer)
return multihash.toB58String(multihash.encode(uint8array, 'blake2b-32', 32))
// Previously we had to pass this 32 byte array to `multihash.encode()` to prepend
// four bytes of metadata before base-encoding to base58.
return base58btc.encode(uint8array)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is being used incorrectly, and because of that is generating the wrong prefix.

Here's proof:

-> % deno run --import-map=./vendor/import_map.json --allow-read=. --allow-write build/main.js hash build/main.js 
blake32Hash(build/main.js): zHnggvswR6hwn3qEpD7VXLzZWQXgu2JVuLFzdDEHj1mRV
-> % deno run --import-map=./vendor/import_map.json --allow-read=. --allow-write build/main.js hash deno.json 
blake32Hash(deno.json): zG5pBxvgCzEwrAJmYRAgWc2SgWB8AVSZqVYUTPz6Cmbta

Notice these two hashes have a different prefix.

If this was being correctly used they would have the same prefix.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix is just a 'z', hence the Multibase<'z'> return type for the blake32Hash() function. Is this incorrect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I thought the prefix had to be several bytes long? I can look into this more deeply...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think you're right, let me submit a change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowteamer Here are two links that might be helpful:

It's a little unclear to me though:

  • Why they are doing different things
  • What a "Block" is
  • What a "CID" is

Can you investigate?

Comment on lines +7 to +13
export function blake32Hash (data: string | Uint8Array): Multibase<'z'> {
// TODO: for node/electron, switch to: https://github.com/ludios/node-blake2
const uint8array = blake.blake2b(data, undefined, 32)
// if you need Buffer: https://doc.deno.land/https://deno.land/std@0.141.0/io/mod.ts/~/Buffer
// const buf = Buffer.from(uint8array.buffer)
return multihash.toB58String(multihash.encode(uint8array, 'blake2b-32', 32))
const uint8array = typeof data === 'string' ? new TextEncoder().encode(data) : data
const digest = blake.blake2b.blake2b256.digest(uint8array)
// While `digest.digest` is only 32 bytes long in this case,
// `digest.bytes` is 36 bytes because it includes a multiformat prefix.
return base58btc.encode(digest.bytes)
Copy link
Member

@taoeffect taoeffect May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, now it seems to be returning a consistent prefix. But ... why is it a different prefix from the one we had before...? 21XWnNK vs z2Drjgb

Is it because this is a "CID"? Or is it because something in the prefix itself is different (like the type of hash or the length of the hash?)

This prefix wasn't supposed to change, and now it has, so ... I feel like we should at least figure out why. And if we can't figure it out, maybe we should say screw multihash completely and just use a text-based prefix or suffix like SSB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because we we're tacking a blake2b-32 prefix code onto 32-byte hashes, as I tried to explain earlier on Slack. So I couldn't reproduce the same hashes as before using only the new library functions. Then I investigated a bit and if I read correctly, blake2b-32 is meant for 32-bit long hashes, not 32-bytes, so I started using blake2b-256 instead.

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @snowteamer!

The code looks good to me. Now I went about testing it and here's what I found:

fs -> sqlite

  • Migrates hidden files like .gitkeep. By default we should skip all hidden files I think.
  • Actually migrated the groupincome.db into the groupincome.db!

Screen Shot 2023-05-12 at 12 00 08 PM

sqlite -> fs

😄 👍

✅ All good!

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving because we agreed that keys beginning with . could be valid key names. And there's no clear way to filter for whether or not keys ending in .db should be skipped.

Great job @snowteamer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants