Consider adding `lenient()` option for string inputs #804

nfantone · 2021-12-01T13:31:56Z

I was in the process of migrating my existing yup schemas to zod and found that the main sticking point is handling the parsing/validation of request path and query string parameters. Since they are typically considered raw strings and the parsing is left to the application level, zod doesn't really provide a great DX in these scenarios. Especially when compared to how yup does it out-of-the-box.

// My existing `yup` schema
const schema = yup.object({ params: yup.object({ id: yup.number().required() }) })

// GET /user/42
// req -> { params: { id: '42' } }
schema.validateSync(req) // { params: { id: 42 } } -> Passes

// `zod` equivalent
const schema = z.object({ params: z.object({ id: z.number() }) })

// GET /user/42
// req -> { params: { id: '42' } }

schema.parse(req) // -> Fails

/*
Uncaught:
[
  {
    "code": "invalid_type",
    "expected": "number",
    "received": "string",
    "path": [
      "params",
      "id"
    ],
    "message": "Expected number, received string"
  }
]
*/

Which leads me to write custom preprocess() validators, along with custom error messages, for each expected type, every time. Here's an example for validating numerical strings.

const DEFAULT_ZOD_NUMERICAL_PARAMS = Object.freeze({
  errorMap: (issue, _ctx) => {
    const message =
      issue.code === z.ZodIssueCode.invalid_type
        ? `Expected ${issue.expected}, received ${issue.received === 'string' ? `'${_ctx.data}'` : issue.received}`
        : _ctx.defaultError

    return { message }
  },
})

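// `isNilOrEmpty` is a helper that is not shown in this thread.
function numerical(params) {
  return z.preprocess(value => {
    // Leave nil/empty values unconverted so that z.number() rejects them.
    const num = isNilOrEmpty(value) ? NaN : Number(value)
    return Number.isNaN(num) ? value : num
  }, z.number(params ?? DEFAULT_ZOD_NUMERICAL_PARAMS))
}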

export default numerical

I know this has been discussed before, but having something close to a .lenient() parsing option, allowing for values to be internally coerced, would be great.

z.lenient(z.number()).parse('42') 
// 42

IMHO, this is such a common scenario when dealing with serialized data that it only makes sense for a library such as this to support it without extra hassle. In addition, while the .preprocess() method above works, it transfers the responsibility of the parsing, which is arguably the main use case of zod, to the user.

scotttrinh · 2021-12-01T14:31:09Z

Thanks for the write up and the thoughtful suggestion @nfantone !

Speaking for myself, I do not view Zod's main use case as parsing JSON or query strings but as safely turning an unknown to a T. Assuming that the input data is of any particular serialized type (query strings, JSON, form data, protobuf, csv, etc) is a level of abstraction above what I feel Zod should be focused on. However, all of those use cases are important and making it easier to build the right thing for each use case is definitely something I think we should try to make easier.

Your proposal (lenient) is ergonomic, but I'm not sure that we want to hard-code this into core since there are so many ways you might want to coerce the input. A library (or a module in your own application) seems like a good place to have wrapped versions of most types that attempt to coerce their input from strings (or whatever else: Date, null -> 0, false -> 0, etc). This is the approach that I've personally taken and it gives us the flexibility that we absolutely require when doing these kinds of transformations.

Definitely open to continue discussing this and trying to support making this as easy as possible.

FWIW, this is our numeric string schema: CodeSandbox

const numericString = z
  .string()
  .refine((s) => {
    const n = Number(s);

    return Number.isFinite(n) && !Number.isNaN(n);
  })
  .transform(Number);

nfantone · 2021-12-01T15:41:51Z

Hi @scotttrinh. Thanks for your reply! You raise good points. Let me see if I can expand on them inline.

> Speaking for myself, I do not view Zod's main use case as parsing JSON or query strings but as safely turning an unknown to a T.

Well, at the risk of sounding a bit cheeky, saying that while having your core/main library function be named .parse is a tough sell 😛.

In all seriousness, I get what you mean here - but I really didn't want to circumscribe the uses of zod to "parsing query strings". More like parsing arbitrary typed data. It just so happens that, this being JavaScript, you can expect plenty of very valid use cases dealing with web servers where that data is pretty much always represented as either strings or byte streams.

> A library (or a module in your own application) seems like a good place to have wrapped versions of most types that attempt to coerce their input from strings

It would be a good place, absolutely. A better place? zod. Or maybe a companion library? Obviously, your mileage may vary, but to me, without this concept of being able to coerce string values without effort, because I always end up writing boilerplate code, zod comes in second when deciding which validation library to use for most projects.

> FWIW, this is our numeric string schema

Thanks for sharing that! That's really good. Doesn't quite fit my needs (i.e., the semantics of the lenient() approach), though:

- doesn't work with numbers;
- parses empty/blank strings as 0;
- default error message doesn't provide any hints on what is actually expected (reads "Invalid input").
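
The blank-string point does not depend on Zod; it follows from how JavaScript's `Number()` treats whitespace. A minimal sketch, assuming `passesNumericRefine` as a hypothetical stand-in for the predicate passed to `.refine()` in the schema quoted above (the sample inputs are chosen only for illustration):

```typescript
// Hypothetical stand-in for the predicate given to `.refine()` in the
// numeric string schema quoted above; it is not part of Zod.
function passesNumericRefine(s: string): boolean {
  const n = Number(s);
  return Number.isFinite(n) && !Number.isNaN(n);
}

// Inputs chosen for illustration.
const samples: string[] = ["42", "", "   ", "abc", "Infinity"];

const results = samples.map((s) => ({
  input: s,
  accepted: passesNumericRefine(s),
  // What a subsequent `Number` conversion would yield for this input.
  value: Number(s),
}));

console.log(results);
```

Empty and whitespace-only strings are accepted because `Number("")` and `Number("   ")` both evaluate to 0, which is finite.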

Of course, you can go ahead and try to fix those things.

z.number().or(
  z.preprocess(
    value => (isNil(value) ? value : String(value).trim()),
    z
      .string()
      .min(1, 'Expected number or numeric string, received empty string')
      .refine(
        s => {
          const n = Number(s)
          return Number.isFinite(n) && !Number.isNaN(n)
        },
        value => ({ message: `Expected number or numeric string, received '${value}'` })
      )
      .transform(Number)
  )
)

But I guess this kinda further proves the point I am trying to raise.

scotttrinh · 2021-12-01T16:08:44Z

> Well, at the risk of sounding a bit cheeky, saying that while having your core/main library function be named .parse is a tough sell 😛.

😆

> More like parsing arbitrary typed data. It just so happens that, this being JavaScript, you can expect plenty of very valid use cases dealing with web servers where that data is pretty much always represented as either strings or byte streams.

Yeah, I 100% agree with you that dealing with serialized data is something you have to do often. I guess my feeling is that Zod is an abstraction below that as a runtime type system. In a similar way to TypeScript, a z.number should be a number and if it's not, you should have to be explicit about that. TypeScript doesn't let you be loose about it, and I feel pretty strongly that Zod should be as explicit as possible about what it is doing and not offer coercion or transformation as a hidden or implicit effect. Now, I know you're proposing something explicit (lenient) here, but what lenient does seems pretty sensitive to case-by-case variation.

> It would be a good place, absolutely. A better place? zod. Or maybe a companion library? Obviously, your mileage may vary, but to me, without this concept of being able to coerce string values without effort, because I always end up writing boilerplate code, zod comes in second when deciding which validation library to use for most projects.

Totally fair! If Zod isn't the style of runtime type system that works for you, I think it's totally fine to say something like yup is a better fit. As it is, I appreciate that Zod is very close to TypeScript and doesn't come with any (many?) assumptions about your use cases so you can adapt it to your domain. Which brings me to the point you make here:

> Doesn't quite fit my needs (i.e., the semantics of the lenient() approach), though:
> But I guess this kinda further proves the point I am trying to raise.

On the contrary, I think that points to the point I'm trying to make: For us, we want to be absolutely certain that if something is a numeric string, it's a numeric string, not a date or null or something else that can be coercible to a number. And parsing empty strings as 0 is precisely what coercion should do, in my opinion, but as you've pointed out, you feel differently (it should return an error?). So, if we bring your (or my) opinion into core about how to convert between types, we risk increasing our maintenance burden, introducing more complexity (bugs), and still only properly serving a portion of users.

I think this is why I feel like building a library (or having an internal module/library) is really the best way forward here: It allows Zod to focus on just representing and narrowing TypeScript types at runtime while providing affordances to do transformations/refinements/coercions as needed. This approach seems to be successful for io-ts which similarly has a "core" and libraries that you can opt into depending on your needs. Superstruct has taken a similar core vs library approach as well. I absolutely support the idea of an ecosystem building up around Zod that supports these use cases in opinionated and ergonomic ways that align to the values and opinions of those library maintainers.

As a separate note, here is another wrinkle in the proposed solution:

And think of all of the other types beyond the primitives. What is a lenient(intersection)? Or do we need to distinguish between primitives and complex types?

nfantone · 2021-12-01T16:38:32Z

Ok, so I'm 100% behind everything you are saying here, in principle 👍🏼.

Except (there's always a "but"), I have a small issue with the implicit implication that the original use case I provided for something like lenient() is "opinionated" and/or can be dismissed as being "sensitive to case-by-case variation". I think it's fair to say that no library will cover every use case. That's a given. But I frankly can't remember the last time I had to work on a node web service and did not need to parse/validate string data. It's so quintessential to web development that, to me, it feels like a glaring omission on zod's part. Whether this logic should be housed within zod core or not is not really the point, I believe. The argument is more centered around developer experience.

Since you brought up the topic of libraries, after seeing what other libraries closer to zod exist in the ecosystem, I think it's safe to say that a very important function of zod is to provide safe typings and parsing to web APIs (tRPC, json-schema-to-zod, etc.). With that in mind, I can't help but wonder: why is it not straightforward to express the type for something like GET /users/:id, with id being a number, with a zod schema?

> And think of all of the other types beyond the primitives. What is a lenient(intersection)? Or do we need to distinguish between primitives and complex types?

I admit I didn't go deep into the implementation details of my proposal. But I suppose that lenient() should provide a "best effort" approach at coercing your input value (think JSON.parse). The next parser in the pipeline shouldn't really matter.

scotttrinh · 2021-12-01T17:14:14Z

> I have a small issue with the implicit implication that the original use case I provided for something like lenient() is "opinionated" and/or can be dismissed as being "sensitive to case-by-case variation". I think it's fair to say that no library will cover every use case. That's a given. But I frankly can't remember the last time I had to work on a node web service and did not need to parse/validate string data.

Right, but as your example pointed out, you think the parser should accept numbers also, and if the string is empty it should throw an error. I think that's perfectly valid, but that's not at all how I would want something similar to act. I think that's what I'm trying to say when I say that I think each developer (or team) needs to make decisions about how, when, and in what way serialized data is converted, and that providing functionality that picks a way is necessarily opinionated. I don't mean to be dismissive: I think your approach is a good one that makes sense for some use cases!

> The argument is more centered around developer experience.

I 100% agree, and a lot of people have brought up other such use cases: especially forms. In each case, you might want to make different decisions about how to cast. For another example, pg converts some "serialized" data already for you, but leaves some types as strings since they can be round-trip lossy without BigInt. Making it easy to write the layer on-top of Zod that is appropriate for each team and use case is absolutely a part of what I see as Zod's responsibility (preprocess, transform, refine, etc). Providing implementations for each use case is something I wouldn't want to see Zod take on, and as the link you posted in your first comment attests to, I don't believe @colinhacks wants there to be multiple ways to transform data for certain cases like number -> string, etc.

> Since you brought up the topic of libraries, after seeing what other libraries closer to zod exist in the ecosystem, I think it's safe to say that a very important function of zod is to provide safe typings and parsing to web APIs (tRPC, json-schema-to-zod, etc.). With that in mind, I can't help but wonder: why is it not straightforward to express the type for something like GET /users/:id, with id being a number, with a zod schema?

From my perspective, the schema for that (z.string().transform(Number)) is perfectly straightforward. And if you're writing a library like tRPC maybe you have some more robust schemas that check for NaN and undefined and return some appropriate error, but I think that kind of logic belongs in that library rather than in Zod. I think it makes sense that Zod treats types in a similar way to TypeScript (TypeScript treats that "type" as string also) while giving the affordances for transformation such that the input type and the output type might be different.

I don't mean to come across as confrontational, and I very much appreciate your perspective and thoughtful answers and suggestions here. My hope is that users with the right vantage point based on their expertise and opinions can provide the layer that you feel we're missing, and I very much agree with you that the ecosystem is missing these sorts of developer-friendly and use-case specific libraries. I am also frustrated that I have to write these transforms by hand, but even if we provided your specific solution, I would still write them by hand since they do not align with my team's specific viewpoint on the proper way to specify these schemas for the multitude of use cases we have (json, query strings, form data, database data, etc.). I hope my comments help to situate my opinion (and that's all this is: my opinion!) about the direction I'd like to see Zod take and don't dissuade you from continuing to advocate for your own perspective.

nfantone · 2021-12-02T09:52:42Z

You're not coming across as confrontational - quite the contrary. Don't stress about it! And many thanks for taking the time to reply thoughtfully.

And again, I do agree with your points, even if we don't see things exactly the same colour. Perhaps I'm expecting things from zod that it's just not meant to be providing. That's on me and it's absolutely fine.

The one comment I would like to address is:

> From my perspective, the schema for that (z.string().transform(Number)) is perfectly straightforward.

I would like to challenge that. IMHO, there are (several) issues with your suggestion. These are the ones I can think of off the top of my head.

- Unlike z.number(), produces NaN for most inputs.
- Unlike z.number(), produces unexpected results for some inputs (e.g., z.string().transform(Number).parse(' ') // 0) **.
- The error messages arising from failing to parse that are completely useless/misleading.
- Can't chain extra number validators, such as gt, int, etc.
- ...and more importantly, semantics and documentation. yup.number() explicitly states intention and unequivocally conveys information about the expected input type. z.string().transform(Number) doesn't.

So, no - sadly, I don't think it's straightforward in zod (or useful, at all, in my perspective, for that matter). As I mentioned before: yes - you can work around (some) of these limitations, currently. The natural question is: why?

** I appreciate your comments above on how this is "my own use case" and it points to zod covering "team needs". I get it. But I assure you, most teams in the world (I don't feel confident about saying "all", but it should be pretty darn close) would not expect curl 'https://my.company.io/api/users/%20%20%20' to gracefully and willingly fetch user 0. Making this the default behaviour makes little sense.
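
The `%20%20%20` case can be traced in plain TypeScript. This is only a sketch: it assumes the path segment is percent-decoded before validation, and `toStrictNumber` is a hypothetical helper added for contrast, not something Zod provides.

```typescript
// Assumption: the path segment after /api/users/ arrives percent-encoded
// and is decoded before it reaches the schema.
const rawSegment = "%20%20%20";
const decoded = decodeURIComponent(rawSegment); // three spaces

// The same conversion that `.transform(Number)` applies to a string.
const userId = Number(decoded);

// Hypothetical stricter conversion, shown only for contrast: reject blank
// strings and anything that does not convert to a finite number.
function toStrictNumber(s: string): number | undefined {
  if (s.trim() === "") return undefined;
  const n = Number(s);
  return Number.isFinite(n) ? n : undefined;
}

console.log({
  decoded,
  userId,
  strict: toStrictNumber(decoded),
  nonNumeric: Number("abc"),
});
```

With plain `Number`, the blank segment becomes user id 0 and a non-numeric segment becomes `NaN`; the stricter helper returns `undefined` for both, leaving the caller to report an error.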

RichiCoder1 · 2021-12-28T01:59:57Z

I ran into this for a rather weird use case where we're currently using a system that "stringifies" all the properties passed in. So we get a correctly shaped object, but all the booleans/numbers/etc... are stringified. It's another edge case I'll admit, but it is a case.

Of the above the most "significant" issue personally is the inability to chain. So even if I wanted to write my own "type" I end up having to go through contortions to make it look like a normal type.

nfantone · 2021-12-28T12:16:44Z

> So, if we bring your (or my) opinion into core about how to convert between types, we risk increasing our maintenance burden, introducing more complexity (bugs), and still only properly serving a portion of users.

This is true for every design decision on every project, ever. Any and all additions to an existing stack bear forward a certain opinion, dismissing (voluntarily or not) others. I frankly don't see this being an argument against implementing new stuff.

Also, I'm not convinced that the fact that there might be "different opinions" on how to convert data prevents us from giving users the option. There are also different opinions on how to parse functions, objects and every other type out there and still zod provides ways to handle those. Like any other library on npm, zod is already very opinionated.

const z = require('zod')

typeof null // 'object'
z.object().parse(null) // 'Expected object, received null' <--- Opinion 👀

markandrus · 2022-01-02T11:42:32Z

I'd like to share what I've been experimenting with while working on a project to convert the outputs of openapi-typescript to Zod in order to implement a strongly typed REST API server.

Parsing JSON bodies is straightforward. But, as already mentioned in the thread, parsing path, query, and header parameters is trickier because, although the Open API spec and Zod schemas may specify a parameter to be boolean-valued, numeric, etc., these parameters always arrive as strings.

I've taken the approach of preprocessing my Zod schemas (ZodObjects) for handling these parameters. I do this by preprocessing each ZodObject property:

- Detect whether the property could be boolean-valued or numeric. This requires recursing through ZodLazy and ZodUnion, looking for instances of ZodBoolean and ZodNumber. (I may need to handle other types, too, but I haven't gotten there yet.)
- Wrap the property's schema in z.preprocess:
1. If the property could be boolean-valued, arrives as a string, and its trimmed lowercase representation equals "true" or "false", return the corresponding boolean.
2. If the property could be numeric, arrives as a non-empty string, and the result of parsing it as a Number is neither NaN nor Infinity, return the number.
3. Otherwise, return the unparsed input.
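
The three rules above could look roughly like the following as a standalone function. This is a sketch, not the code from the project described: `coerceParam` and `ParamHints` are hypothetical names, and the schema inspection (the ZodLazy/ZodUnion walk) is replaced by two flags passed in by the caller.

```typescript
// Which primitive types the property's schema could accept. In the comment
// above this is discovered by walking the schema; here it is passed in.
interface ParamHints {
  couldBeBoolean: boolean;
  couldBeNumeric: boolean;
}

// Hypothetical preprocessing function; name and signature are assumptions.
function coerceParam(value: unknown, hints: ParamHints): unknown {
  // Only string inputs are candidates for coercion.
  if (typeof value !== "string") return value;

  // 1. Booleans take precedence.
  if (hints.couldBeBoolean) {
    const lowered = value.trim().toLowerCase();
    if (lowered === "true") return true;
    if (lowered === "false") return false;
  }

  // 2. Then numbers: non-empty input whose conversion is finite
  //    (this excludes NaN and both infinities).
  if (hints.couldBeNumeric && value !== "") {
    const n = Number(value);
    if (Number.isFinite(n)) return n;
  }

  // 3. Otherwise hand the unparsed input to the wrapped schema.
  return value;
}

const both: ParamHints = { couldBeBoolean: true, couldBeNumeric: true };
console.log(coerceParam(" TRUE ", both), coerceParam("42", both), coerceParam("abc", both));
```

Under this literal reading of rule 2, a whitespace-only string counts as non-empty and is converted to 0; trimming before the emptiness check would be one way to avoid that, if it is not wanted.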

I give precedence to boolean values and then numeric values before falling back to whatever the ZodSchema is looking for (which hopefully can be parsed from a string). Although it's a bit annoying to do this, I am forced to take some decisions that may not be applicable to other use cases. I think I generally agree with @scotttrinh's comment here: #804 (comment)

stale · 2022-04-28T19:11:57Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

alavkx · 2022-07-28T20:18:26Z

FWIW I'm encountering similar struggles when attempting to work with number inputs, using zod as a react-hook-form resolver. It is.....pretty challenging to figure out.
https://codesandbox.io/s/stupefied-moser-0fpq94?file=/src/App.tsx

Given...

- a strongly typed endpoint
- a matching zod schema (consider tRPC)
- a form design for HTTP PATCH; a partial update
- the need to represent NO CHANGE as an empty input
- HTML's native behavior to represent EMPTY as empty string ('')

How do you represent a number input?
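
One possible way to frame the question, independent of Zod or react-hook-form, is to make "no change" an explicit outcome of reading the field instead of letting the empty string be coerced. A minimal sketch, where `readNumberField` and `NumberFieldResult` are hypothetical names and the trimming policy is an assumption:

```typescript
// Hypothetical result type for a PATCH form field: an empty input means
// "leave this property unchanged".
type NumberFieldResult =
  | { kind: "unchanged" }
  | { kind: "value"; value: number }
  | { kind: "invalid"; raw: string };

// Hypothetical helper; treating whitespace-only input as empty is an assumption.
function readNumberField(raw: string): NumberFieldResult {
  const trimmed = raw.trim();
  if (trimmed === "") return { kind: "unchanged" };
  const n = Number(trimmed);
  return Number.isFinite(n)
    ? { kind: "value", value: n }
    : { kind: "invalid", raw };
}

console.log(readNumberField(""), readNumberField("12.5"), readNumberField("abc"));
```

A caller building the PATCH body would then omit the property for `unchanged`, send the number for `value`, and show a validation error for `invalid`.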

ryanhaticus · 2023-01-17T22:08:16Z

Please see coercion: https://github.com/colinhacks/zod#coercion-for-primitives

JacobWeisenburger added the enhancement New feature or request label Feb 27, 2022

selimb mentioned this issue Mar 23, 2022

Throwing errors in preprocess #696

Closed

rottmann mentioned this issue Mar 30, 2022

Shorter way to validate boolean from string "true" / "false" / overwrite a schema #1055

Closed

stale bot added the wontfix This will not be worked on label Apr 28, 2022

stale bot closed this as completed May 5, 2022

alavkx mentioned this issue Jul 28, 2022

How can I allow null as input value of a schema, but not as a valid output? #1206

Closed
