
Fix: json and jsonb parsing in postgres-js #1785

Merged
10 commits merged into drizzle-team:beta on Aug 6, 2024

Conversation

Angelelz
Collaborator

Second (and better) attempt to close #724 and close #1511

This PR depends on #1659 which has already been merged to beta.
After figuring out how to bypass parsers and serializers in the postgres.js driver, the same technique can be applied to the json and jsonb issue.

  • Added json and jsonb to the list of bypassed types in the postgres.js driver
  • Added simple tests to pg and postgres-js integration tests for json and jsonb columns.

There are additional tests in #1560. @AndriiSherman Please merge that one after merging this.
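For readers landing here: the underlying bug amounts to one extra level of JSON encoding. A conceptual TypeScript sketch (not the actual driver code) of what the old postgres-js path effectively did:

```typescript
// Conceptual sketch of the bug (not the actual driver code): Drizzle
// stringified the value, then the postgres.js serializer stringified that
// string again, so the column ended up holding a JSON *string*.
const value = { tags: ["a", "b"] };
const columnText = JSON.stringify(JSON.stringify(value));

// Reading back, one parse yields a string, not the object:
const firstParse = JSON.parse(columnText);
const recovered = JSON.parse(firstParse);
```

This is why affected columns hold values whose `jsonb_typeof` is `"string"` rather than `"object"` or `"array"`.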

@AndriiSherman AndriiSherman changed the base branch from main to beta January 11, 2024 14:48
@aseemk

aseemk commented Jan 11, 2024

Thank you @Angelelz & @AndriiSherman for your thorough work here! 😃 🙏

One question I've been wondering: when this is fixed, will there be any easy way to migrate all existing data/columns we persisted through Drizzle that was incorrectly persisted as JSON strings instead of JSON objects directly?

I'm not aware of any "JSON parse" type function in Postgres, so I'm not sure this'll be possible with a simple SQL migration. So this might require a Node/TypeScript migration (which drizzle-kit doesn't support, right?) to read existing data with the old parser (parsing JSON strings to JSON objects) and write it back with the new parser.

I can easily write such a migration myself, but my main question is: once this fix is merged, will there be any way to read data through the old parser with the new Drizzle code? Perhaps I should update my Drizzle schema to change the type of the existing column to text rather than json or jsonb, and explicitly add a new column typed as json or jsonb?

Lemme know if my q is unclear. Thank you very much!

@AndriiSherman
Member

> Thank you @Angelelz & @AndriiSherman for your thorough work here! 😃 🙏
>
> One question I've been wondering: when this is fixed, will there be any easy way to migrate all existing data/columns we persisted through Drizzle that was incorrectly persisted as JSON strings instead of JSON objects directly? […]

That's a really good question. We will think this through before merging this one. I guess we will write a small guide on how you can migrate this data to proper JSON.

@aseemk

aseemk commented Jan 11, 2024

That'd be great, thank you.

I'm remembering that Drizzle doesn't actually know or ask about the JSON schema/type of JSON columns. The $type<T>() function only takes a compile-time TypeScript type, not e.g. a runtime Zod schema. (Feature request btw: that would be nice!) So detecting, handling, and migrating incorrectly vs. correctly persisted JSON data will prob need to be an application-level concern, not something Drizzle can automatically encapsulate.

In our case, every JSON(B) data type we have is an array or object at the top-level, not a string. So we can easily differentiate between "incorrectly" persisted JSON today (SELECT jsonb_typeof(column) returns "string") and "correctly" persisted JSON (SELECT jsonb_typeof(column) would return "object" or "array").
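That detection logic can be sanity-checked locally; here is a rough TypeScript analogue of Postgres's jsonb_typeof (an assumption-laden sketch, operating on values already parsed from the column):

```typescript
// Rough local analogue of Postgres's jsonb_typeof(), useful for reasoning
// about which rows were persisted incorrectly (top-level "string") versus
// correctly ("object" / "array").
function jsonbTypeof(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (typeof value === "string") return "string";
  if (typeof value === "number") return "number";
  if (typeof value === "boolean") return "boolean";
  return "object";
}
```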

Given this, we can achieve an in-place (not needing to add a new column) and zero-downtime migration with the following approach, provided here in case it's helpful for y'all or anyone else:

    1. I update Drizzle to the new version, so it'll now return incorrectly persisted JSON (JSON strings) as strings and no longer as objects or arrays.
    2. I keep my Drizzle column schema typed as json or jsonb in my TypeScript code, but I expand the TypeScript type I pass into $type<T>() to T | string. This forces my app to handle both old/incorrect and new/correct cases.
       • My app can call JSON.parse() on the values if they're old/incorrect strings (and maybe even run them through our own Zod schemas for validation) to get the correct objects or arrays.
       • However, Drizzle will now write JSON data correctly (as objects or arrays) on all inserts and updates going forward.
       • Our DB will now have a mix of old/incorrect and new/correct JSON, but our app will continue working bug-free. 🤞
    3. I write and manually run a script (conceptually a TypeScript DB migration) to convert all existing data from JSON strings to JSON objects or arrays.
       • Old data can be detected/selected through WHERE jsonb_typeof(column) = 'string'.
       • Now our DB has only correct JSON! 🙌
    4. I update my Drizzle schema to be just $type<T>() again, no longer T | string, and remove my temporary code to handle old/incorrectly persisted JSON strings.
       • Now my code is back to how it was (simple), but the DB is correct now and going forward. 🎉

^ Those are just my theoretical thoughts. Feedback welcome if you think I'm missing anything. Thanks again!
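The transition window described in that plan could use a small helper like this (hypothetical names, not Drizzle API — a sketch only):

```typescript
// Hypothetical helper for the transition window: accept both the old
// double-encoded string and the new object/array shape.
// Caveat: not safe if T itself can legitimately be a string.
function normalizeJson<T>(value: T | string): T {
  return typeof value === "string" ? (JSON.parse(value) as T) : value;
}

const fromOldRow = normalizeJson<string[]>('["a","b"]'); // old/incorrect row
const fromNewRow = normalizeJson<string[]>(["a", "b"]);  // new/correct row
```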

@Hebilicious

@aseemk If you can afford a short downtime/maintenance window on the app, you could write a migration script that reads every table with jsonb columns and fixes their values, then run it as a one-time operation on all environments directly after the regular drizzle-kit migrations. Nothing in your application code has to change, since you'd update the dependencies and deploy immediately after.

@aseemk

aseemk commented Jan 14, 2024

Yeah, you're right that brief downtime/maintenance may be reasonable.

I also realized that I think a migration might be entirely doable in SQL! There's no json_parse function or similar in Postgres, but Postgres effectively already implements a JSON parser since it needs to support a SQL query setting a JSON(B) column to a string representation of some JSON. So since Drizzle incorrectly stringified JSON twice, and persisted a JSON string (of another JSON value), I think a simple query that "unescapes" one level of JSON stringification should work!

It appears to work on my own data that has both top-level objects and top-level arrays as JSON values (both containing nested objects and arrays). E.g. for a single JSONB column named tags in a single table named users:

```sql
select id

, tags::text as bad_jsonb_str          -- should see wrapper quotes and backslash escapes
, jsonb_typeof(tags) as bad_jsonb_type -- should be "string"

, left(tags::text, 2) as starting_chars -- should be `"{` or `"[` for top-level objects or arrays respectively
, right(tags::text, 2) as ending_chars  -- should be `}"` or `]"` for top-level objects or arrays respectively

-- Remove the starting and ending quotes:
, substring(tags::text from 2 for length(tags::text) - 2) as trimmed

-- Remove any backslash escape characters:
, regexp_replace(tags::text, '\\(.)', '\1', 'g') as unescaped

-- Combine the two above!
, regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g') as good_jsonb_str
, regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb as good_jsonb
, jsonb_typeof(
    regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb
  ) as good_jsonb_type -- should now be object or array!

from users
limit 10
```

Can make this into a migration by changing that into an UPDATE ... SET ... WHERE ... (or updating in batches with a WITH, etc.).

Hope that helps others!
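That unescape step can be modeled in TypeScript to sanity-check it locally (a sketch; `unescapeOneLevel` is a hypothetical helper, not part of Drizzle):

```typescript
// TypeScript model of the SQL unescape above: drop the wrapper quotes,
// then collapse one level of backslash escaping (\" -> ", \\ -> \).
function unescapeOneLevel(badJsonbText: string): string {
  const inner = badJsonbText.slice(1, -1); // strip leading/trailing "
  return inner.replace(/\\(.)/g, "$1");
}

// A double-stringified value, as the jsonb column's ::text form would show it:
const bad = JSON.stringify(JSON.stringify({ a: "x" }));
const good = unescapeOneLevel(bad);
```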

@Hebilicious

This doesn't affect pg users right?

@Angelelz
Collaborator Author

> This doesn't affect pg users right?

If you can insert and retrieve your json data without workarounds or issues, this doesn't affect you.

@aseemk

aseemk commented Jan 15, 2024

To clarify: we use pg (edit: nope, we use Postgres.js/postgres; see below), and this affects us.

We're able to insert and retrieve our JSON data through Drizzle without any issues today. But we're unable to "reach into" that JSON data in SQL, e.g. whether in migrations (backfills) or for atomic, field-wise updates. This has been a source of inconvenience (but not downtime or anything like that) several times for us, so I'm excited for this fix! Thanks again. =)

@Hebilicious

Hebilicious commented Jan 16, 2024

> To clarify: we use pg, and this affects us. […]

Thanks for the heads-up, I've never tried accessing that data without drizzle (yet)

@Angelelz this PR only addresses postgres.js. Is there another bug that affects pg, then?

@Angelelz
Collaborator Author

Angelelz commented Jan 17, 2024

> To clarify: we use pg, and this affects us. […]

Can you clarify here? What do you mean by not being able to reach into the JSON data with SQL? Which SQL statement doesn't work for you on json?

@Angelelz
Collaborator Author

> @Angelelz this PR only addresses postgres.js. Is there another bug that affects pg, then?

I don't think pg has an issue. All the tests are passing.

@aseemk

aseemk commented Jan 17, 2024

I see my confusion here:

  • We indeed use postgres (Postgres.js), not pg (node-postgres)
  • I mixed these two up because 95% of our code imports from drizzle-orm/pg-core (postgres and drizzle-orm/postgres-js are only imported for the DB connection), and I mistakenly thought pg-core here was the same thing as pg, but it's not

Sorry for the mixup! So it sounds like yes, this bug only affects us because we're using Postgres.js (postgres). The bug exhibits the same behavior as everyone's talking about in #724 and #1511.

Thanks for clarifying!

@juliomuhlbauer

juliomuhlbauer commented Feb 5, 2024

Any update on this? Is there a workaround?

@hawkett

hawkett commented Feb 16, 2024

Is this fix waiting on a migration path for existing jsonb data? If so, would it be possible to decouple these problems so it could be released? A couple of possibilities:

  1. Release as a breaking change with config setting that causes jsonb to use the old behaviour, and then folks can migrate their data independently of this release. Maybe risky, as there will no doubt be people who upgrade without realising they need to set the config, and end up storing both formats.
  2. Release as a non-breaking change with a config setting that causes jsonb to use the correct behaviour. Independently of this PR, the default jsonb behaviour could be documented as deprecated, to be replaced by the correct behaviour in some future breaking version, which would also remove the config. This option seems to create the least resistance to getting the fix available as soon as possible.

If this isn't likely to land soon, would it be possible to identify a recommended workaround?

Thanks for the help!

@Hansenq

Hansenq commented Feb 16, 2024

I didn't really see a big difference between Postgres.js and node-postgres (pg), so I switched to node-postgres, which doesn't have this issue (it also has the benefit of giving better error logs). It's not the most satisfying workaround, but it is a workaround.

The only downside is that you'll have to write a data migration that translates the old escaped JSON values into the new ones yourself.

@niko-gardenanet

> I didn't really see a big difference between Postgres.js and node-postgres (pg)

I tried switching to node-postgres and had major performance issues. Not sure why. Really hope this issue gets resolved soon

@niko-gardenanet

niko-gardenanet commented Feb 23, 2024

I also tried passing the JSON with the sql helper, but it doesn't work in my case because I need to insert a top-level JSON array ([]).

```ts
sql`${[]}::jsonb` // does not work
```

Produces the following SQL with a syntax error...

```sql
insert into
  "my_table" ("my_column")
values
  (()::jsonb)
```

...and an empty params array:

```ts
// Params
[]
```

```ts
sql`${JSON.stringify([])}::jsonb` // also does not work
```

Produces the correct SQL...

```sql
insert into
  "my_table" ("my_column")
values
  ($1::jsonb)
```

...but passes the JSON array as a string, which gets turned into a JSON string:

```ts
// Params
[
  "[]", // gets turned into a JSON string i.e. "\"[]\""
]
```

Anyone know a workaround for that?

Edit: I found a solution.

```ts
sql`${new Param([])}::jsonb` // wrap JSON array in new Param
```

Credit: #724 (comment)

@Hebilicious

> Is this fix waiting on a migration path for existing jsonb data? If so, would it be possible to decouple these problems so it could be released? […]
>
> If this isn't likely to land soon, would it be possible to identify a recommended workaround?

I'm looking into migrating from pg to postgres-js, so this is important for me as well. I think #1511 (comment) might be a good enough workaround: create a custom jsonB type instead of using the Drizzle-provided one.

@Angelelz could you confirm that this works, please?

@LeonAlvarez

LeonAlvarez commented Apr 3, 2024

> I'm looking into migrating from pg to postgres-js so this is important for me as well. I think #1511 (comment) might be a good enough workaround: create a custom jsonB type instead of using the drizzle provided one. @Angelelz could you confirm that this works please? […]

It works — I was using it a while back. You can check #666 (comment).

@arjunyel
Contributor

Here's a patch-package diff for anyone who needs it:

```diff
diff --git a/node_modules/drizzle-orm/postgres-js/driver.js b/node_modules/drizzle-orm/postgres-js/driver.js
index 7e48e8c..219e0a0 100644
--- a/node_modules/drizzle-orm/postgres-js/driver.js
+++ b/node_modules/drizzle-orm/postgres-js/driver.js
@@ -12,6 +12,8 @@ function drizzle(client, config = {}) {
     client.options.parsers[type] = transparentParser;
     client.options.serializers[type] = transparentParser;
   }
+  client.options.serializers['114'] = transparentParser;
+  client.options.serializers['3802'] = transparentParser;
   const dialect = new PgDialect();
   let logger;
   if (config.logger === true) {
```
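Conceptually, the patch registers identity serializers for the json/jsonb type OIDs so postgres.js passes Drizzle's already-stringified text through untouched. A standalone sketch of the idea (the real override lives on `client.options`, as in the diff):

```typescript
// Standalone sketch: postgres.js accepts per-OID parser/serializer
// overrides; an identity function makes it pass json (OID 114) and
// jsonb (OID 3802) text through untouched, leaving (de)serialization
// to Drizzle.
const transparentParser = (val: string) => val;

const serializers: Record<string, (v: string) => string> = {};
for (const oid of ["114", "3802"]) {
  serializers[oid] = transparentParser;
}
```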

@SiNONiMiTY

Bumping this PR. The custom-type workaround works like a charm, but this issue needs to be resolved.

@alin-plamadeala

alin-plamadeala commented May 27, 2024

As for the migration guide, wouldn't this query fix the issue?

```sql
UPDATE <table_name>
SET <col_1_name> = (<col_1_name> #>> '{}')::jsonb,
    <col_2_name> = (<col_2_name> #>> '{}')::jsonb,
    ...
    <col_n_name> = (<col_n_name> #>> '{}')::jsonb;
```
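For intuition: `#>> '{}'` extracts the root element as text, so for a double-encoded value it yields the inner JSON document, which `::jsonb` then parses. The same round trip modeled in TypeScript:

```typescript
// `columnValue` stands in for what the driver returns for a double-encoded
// jsonb cell: a JSON string rather than an object.
const columnValue = JSON.parse(JSON.stringify(JSON.stringify({ a: 1 })));

// `#>> '{}'` returns that string's contents as text; `::jsonb` re-parses it:
const repaired = JSON.parse(columnValue as string);
```

A `WHERE jsonb_typeof(<col>) = 'string'` guard (as suggested elsewhere in this thread) keeps such an UPDATE from touching rows that are already correct.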

@deepsingh132

Please merge the patch if possible

@eddienubes

@AndriiSherman Is there any update? It's been quite some time

@DougSchmidt-Leeward

I've been playing with an UPDATE statement that will correct existing JSONB columns that have had the extra JSON.stringify(data) encoding applied.

Here is what is working for our project:

```sql
UPDATE {tableName} SET {columnName} = ({columnName}->>0)::jsonb
WHERE {columnName}::text LIKE '"%"';
```

Running that statement will correct bad columns, but leave good columns untouched, so it is safe to run multiple times.

Caveats

This works when JSON.stringify(data) was applied to data that is an object or array, but does not work if data is a primitive JSON type: string, number, boolean, or NULL.

I'm not sure how much real world application data will be using a JSONB column to store a raw JSON primitive type (that kinda defeats the purpose of JSONB), but I wanted to call attention to this limitation.

@andreial

Any updates on this PR?

@nikosgram

> I also tried passing the JSON with the sql helper, but it doesn't work in my case because i need to insert a top level JSON Array ([]). […] Anyone knows a workaround for that?

I found a workaround by executing raw SQL that includes the JSON or JSONB data, pre-stringified. Here's an example:

```ts
const json_data = JSON.stringify(data);

db.execute(sql.raw(`INSERT INTO table ("json_data") VALUES ($$${json_data}$$::jsonb)`));
```

I'm using dollar-quoted constants instead of single or double quotes because the json_data string might contain single quotes and will definitely contain double quotes.

For more information on dollar-quoted constants, refer to the PostgreSQL documentation: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

Because the code above looks a bit messy, here is a cleaner example:

```ts
const json_data = JSON.stringify(data);

db.execute(
  sql.raw(
    'INSERT INTO table ("json_data") VALUES ($$' + json_data + "$$::jsonb)"
  )
);
```
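One caution with raw interpolation: dollar quoting is only safe if the payload can never contain the chosen delimiter itself. A guarded version of the trick (a sketch; `dollarQuote` is a hypothetical helper, not a Drizzle API):

```typescript
// Guarded dollar-quoting: refuse payloads that contain the delimiter,
// since raw interpolation offers no other escaping.
function dollarQuote(jsonText: string, tag = "json"): string {
  const delim = `$${tag}$`;
  if (jsonText.includes(delim)) {
    throw new Error(`payload contains the ${delim} delimiter`);
  }
  return `${delim}${jsonText}${delim}::jsonb`;
}
```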

@AndriiSherman
Member

Fixing conflicts and merging this into the beta tag. I will prepare a guide on how to fix an existing database and will reuse and mention some of the comments from this PR

@AndriiSherman AndriiSherman merged commit 6234cbf into drizzle-team:beta Aug 6, 2024
6 checks passed
@AndriiSherman
Member

Merged. It will be available in drizzle-orm@beta in about 30 minutes and will remain in beta for some time (possibly a few days) before being released as the latest version

@tonyxiao

tonyxiao commented Aug 7, 2024

Finally, thank you!! Which specific version does this one correspond to?

@tposch

tposch commented Aug 20, 2024

FYI: This appears to break previous workarounds like payload: sql`${params}::jsonb` after updating to v0.33. I get the error TypeError: The "string" argument must be of type string or an instance of Buffer or ArrayBuffer. Received an instance of Object.

Now I can simply write payload: params in my update statement and it works fine, but I do need to go and make all those changes.
