Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle null value without quotes #335

Closed
wants to merge 4 commits into from
Closed

Handle null value without quotes #335

wants to merge 4 commits into from

Conversation

gwukelic
Copy link
Contributor

@gwukelic gwukelic commented Jul 25, 2023

This code change will allow null values to be used without be surrounded by quotation marks, i.e. "null".

Before this change, type: null was not acceptable but type: "null" was. With these changes, both null and "null" will be acceptable syntax. This change will also cover null values inside of union types.

Acceptable ✅

type: union
types:
  - type: null
  - type: bool

@criccomini
Copy link
Contributor

A type_dict of None isn't a valid entry for Recap's concrete syntax tree. For details, see:

https://recap.build/specs/type/0.1.1/

I don't feel converting the invalid type_dict to a null type is a good idea; it's not part of the spec. It should raise an error,

@criccomini
Copy link
Contributor

@gwukelic LMK if you've got a specific need for this change. Otherwise, I'll close it off in a few days.

@gwukelic
Copy link
Contributor Author

Hi @criccomini,

I am running into issues with contracts that have unions. The union type only works if the null type is written as a string.

This works ✅

  - name: location
    type: union
    types:
      - type: "null"
      - type: bool

This does not ❌

  - name: location
    type: union
    types:
      - type: null
      - type: bool

But I think null without strings should be acceptable. Many people will write contracts with the null value like that. Plus the recap spec shows null without strings as valid syntax.
Screenshot 2023-07-26 at 11 39 12 AM

Lastly, I don't expect all these changes to be added. This PR is a WIP and I'm currently testing my proposed changes against different contracts. The exact error I am seeing without these changes is 'NoneType' object has no attribute 'copy'. I propose converting null to "null" in the clean_dict method. I would love to hear your thoughts. Thanks!

@gwukelic gwukelic changed the title Handle null type WIP: Handle null type in unions Jul 26, 2023
@gwukelic gwukelic changed the title WIP: Handle null type in unions Handle null value without quotes Jul 26, 2023
@criccomini
Copy link
Contributor

criccomini commented Jul 27, 2023

EDIT: Oof, you're right, the spec DOES show that as valid. That's an oversight on my part! 🤦 It's not meant to be valid syntax--only "null" is supported. Lol, even I made the mistake. Anyway, read-on for my thoughts.

This issue is an artifact of YAML, which is coercing text into a string for "boolean" since it's not a literal, but it's leaving null as a null type, since null is a literal in YAML. You can see this in action here:

>>> import yaml
>>> 
>>> # Your YAML document stored as a string variable
>>> yaml_string = """
... type: union
... types:
...   - type: null
...   - type: boolean
...     bits: 32
... """
>>> 
>>> parsed_dict = yaml.safe_load(yaml_string)
>>> 
>>> print(parsed_dict)
{'type': 'union', 'types': [{'type': None}, {'type': 'boolean', 'bits': 32}]}

The 'null' is required to tell YAML that you want the string, not the null literal--as you've discovered.

TOML, for example, doesn't behave this way:

>>> import tomli
>>> 
>>> toml_string = """
... type = "union"
... types = [
...   { type = "null" },
...   { type = "string", bits = 32 }
... ]
... """
>>> 
>>> parsed_dict = tomli.loads(toml_string)
>>> 
>>> print(parsed_dict)
{'type': 'union', 'types': [{'type': 'null'}, {'type': 'string', 'bits': 32}]}

If you were to put null instead of "null" in the TOML, you'd get a parse error. In fact, TOML has no null literal at all.

Similarly, JSON does not coerce as YAML does. You can set "type": null in JSON. It would parse, but would not adhere to the Recap type spec. And you can't specify "type": boolean, as this would be a parse error--again JSON doesn't coerce unknown literals as strings the way YAML does.

I'm wondering if the real question here is whether we should revisit how we represent nulls/nullable fields. I had a somewhat in-depth conversation with @cpard last week about this very subject. I'm going to paste my comment from that discussion below (it was buried in a private repo). Comically, even I made the null mistake in my YAML example below. 🤣


Intro

I'm glad you're calling this out, because it's worth some discussion.

For the AST, I feel strongly we should keep the nullable design as it is now (UnionType of NullType and other types).

I think what's jamming you up is the CST, which right now is a mirror of the AST. In YAML, it's something like:

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: ["null", "string32"]
    default: null

This mirrors how Avro handles nullable fields. It's always been a bit goofy as developer ergonomics go.

If you don't use the verbose syntax rather than aliases it's even uglier:

type: union
types:
    - type: null
    - type: string
      bytes: 128
      variable: false
default: null

I'm open to optimizing how the CST handles nulls.

I can think of three approaches:

  1. nullable: true
  2. optional: true
  3. default: null with a non-"null" type

nullable: true

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: "string32"
    nullable: true

One thing to call out with something like nullable is that there's NULLABLE and then there's DEFAULT NULL. In the above example, the field is still technically required, it can just support null values. To emulate NULLABLE DEFAULT NULL, you'd need:

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: "string32"
    nullable: true
    default: null

optional: true

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: "string32"
    optional: true

This would convert to a union of null/string32 with default null set.

default: null with a non-"null" type

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: "string32"
    default: null

In this style, we simply interpret any default: null for a non-null type to implicitly mean union of that null and that type. So the above is seen as:

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: ["null", "string32"]
    default: null

Conclusion

I feel like the right answer is probably 2 or 3? I haven't given a ton of thought to implementation, yet. Some might be easier than others. But I'll deal with that later. For now, let's converge on the syntax we want.

WDYT?

@cpard
Copy link

cpard commented Jul 27, 2023

@criccomini @gwukelic I think we agree here that the problem is with what syntactic sugar or shorthand to provide for representing optional types. I say optional types because I think that's a more general definition of what we are dealing here with nulls.

Also, my understanding here is that optionality at the AST is always a union between a type and a null, empty, unit, (whatever we want to call it) type. But these are implementation details that shouldn't bother the user of Recap.

When it comes to what syntactic sugar can be the best, I suggest to consider two things:

  1. The familiarity of the potential Recap user with similar problems. For example, if everyone is coming from python land, it makes total sense to try and add syntax that is close to the Optional[type] syntax of python.
  2. Learn from the evolution of syntax on this from other systems.

Regarding (1), as was primarily working with Recap from the relational angle, I was leaning more into the "nullable", "optional" approach. This comes naturally mainly because of how databases have historically handled the ergonomics of working with null values.

But, I don't think that this is the best way to do it. There has been a lot of progress in PL on that stuff and databases are encouraging you to almost instinctively think in terms of data and concrete values when it comes to that stuff and not the semantics of the types. So, my suggestion here is to avoid this path.

I took a look on some of the languages out there and how they handle optionality in their type systems. Some examples,

  1. Typescript, Zig, CUE: use the "?" symbol for optional types.
  2. Python: Optional[type] = None.
  3. Rust: Option but this is an enum between (union) between None and Some(T).

A. Using Optional<> or Optional[] etc. would be the closest to Python and Rust so it has that added benefit if we assume that users will be more familiar with python.

B. Languages from group (1) are treating optionality more like an attribute of the type than a different type, (when it comes to syntactic sugar). Hear the "?" is like a shorthand to optional/nullable=true.

A feels more "formal" as it exposes syntactically something closer to a type with a parameter and it's also closer to the way Python does it.

B is much cleaner in my opinion and offers better ergonomics.

At the end it's all about developer ergonomics when it comes to syntactic sugar so there's no right or wrong, just trade offs!

All the above are about syntax, but I have a question about semantics too. What's the meaning of a type being optional in Recap?

to me there are two definitions of being optional:

  1. If a type is optional, the whole field might be missing completely.
  2. If a type is optional, it means that the field exists but the value is null or whatever represents emptiness.

What's Recap's definition?

@criccomini
Copy link
Contributor

What's Recap's definition?

Per-spec (here):

The default value for a reader if the field is not set in the struct.

We can of course change this definition, but this is in-line with what other serdes such as Avro/Proto do.

@gwukelic
Copy link
Contributor Author

@criccomini @cpard thank you for your thoughts and some background. @adrianisk and I talked and we vote for Option 2 (optional: true). We think it'll be straightforward for our users.

@adrianisk
Copy link
Contributor

Just catching up on all of this... Optional seems like the way to go to me, with

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: "string32"
    optional: true

just being syntactic sugar for

type: struct
fields:
  - name: secondary_phone
    type: union
    types: ["null", "string32"]
    default: null

I'm guessing we'd also want to support

# A struct with an optional string field called "non_null_default"
type: struct
fields:
  - name: non_null_default
    type: "string"
    optional: true
    default: "some_default"

?

The problem I see with the default: null with a non-"null" type option is that it doesn't let you define a nullable type with a non-null default, though I'm not sure how much that actually matters.

@criccomini
Copy link
Contributor

criccomini commented Jul 27, 2023

I'm leaning toward optional: true as well. It feels most intuitive to me, coming from a Python/Java/Scala background. It also maps more closely to how DBs work (as @cpard pointed out).

Regarding the ? notation, I feel like it's mixing syntaxes between what the CST is written in (JSON, YAML, etc) and Recap's structure. One of the nice things about Recap right now is that it really doesn't pay attention to string literals. Having {"type": "string32?"} feels weird to me. Just a personal opinion.

So, I'm thinking:

type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true

Would be valid. When no default is defined, null is used. Non-null defaults would also be allowed:

type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
    default: "1 (555) 555-5555"

Optional unions are also allowed:

type: struct
fields:
  - name: some_union
    type: union
    types: ["int32", "float32"]
    optional: true

In the above case, the union would be converted to:

type: struct
fields:
  - name: some_union
    type: union
    types: ["null", "int32", "float32"]
    default: null

Aliases would continue to support attribute overrides.

type: struct
fields:
  - name: phone
    alias: phone_type
    type: string32
  - name: secondary_phone
    type: phone_type
    optional: true

In this example, phone is required but secondary_phone is optional. Similarly:

type: struct
fields:
  - name: phone
    alias: phone_type
    type: string32
    optional: true
  - name: secondary_phone
    type: phone_type

In this example, secondary_phone is required but phone is optional.

This would all be implemented in from_dict (when going from CST to Recap AST) and clean_dict (when going from Recap AST back to CST).

WDYT? Is this tolerable to y'all?

@gwukelic
Copy link
Contributor Author

Thanks @criccomini! @adrianisk and I think this is great! Feel free to close this PR.

criccomini added a commit that referenced this pull request Jul 28, 2023
The current Recap null-field handling is a bit cumbersome. I discussed with
@gwukelic, @adrianisk, and @cpard. We decided to add support for an `optional`
field to the CST. This will allow us to specify that a field is optional.

See the conversation here:

#335

Users can now specify an optional field as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
```

This will get treated the same as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: "union",
    types: ["null", "string32"],
    default: null
```

I will update the Recap spec accordingly, as well.

Closes #335
criccomini added a commit to gabledata/recap-website that referenced this pull request Jul 29, 2023
The current Recap null-field handling is a bit cumbersome. I discussed with
@gwukelic, @adrianisk, and @cpard. We decided to add support for an `optional`
field to the CST. This will allow us to specify that a field is optional.

See the conversation here:

gabledata/recap#335

Users can now specify an optional field as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
```

This will get treated the same as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: "union",
    types: ["null", "string32"],
    default: null
```
@criccomini
Copy link
Contributor

I've updated the spec here:

gabledata/recap-website#6

@gwukelic mind taking a quick peak to make sure it makes sense?

criccomini added a commit that referenced this pull request Jul 29, 2023
The current Recap null-field handling is a bit cumbersome. I discussed with
@gwukelic, @adrianisk, and @cpard. We decided to add support for an `optional`
field to the CST. This will allow us to specify that a field is optional.

See the conversation here:

#335

Users can now specify an optional field as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
```

This will get treated the same as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: "union",
    types: ["null", "string32"],
    default: null
```

I will update the Recap spec accordingly, as well.

Closes #335, #338

a
criccomini added a commit that referenced this pull request Jul 29, 2023
The current Recap null-field handling is a bit cumbersome. I discussed with
@gwukelic, @adrianisk, and @cpard. We decided to add support for an `optional`
field to the CST. This will allow us to specify that a field is optional.

See the conversation here:

#335

Users can now specify an optional field as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
```

This will get treated the same as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: "union",
    types: ["null", "string32"],
    default: null
```

I will update the Recap spec accordingly, as well.

Closes #335, #338
@criccomini
Copy link
Contributor

PR is up here:

#340

@gwukelic @cpard @adrianisk PTAL

criccomini added a commit that referenced this pull request Jul 30, 2023
The current Recap null-field handling is a bit cumbersome. I discussed with
@gwukelic, @adrianisk, and @cpard. We decided to add support for an `optional`
field to the CST. This will allow us to specify that a field is optional.

See the conversation here:

#335

Users can now specify an optional field as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
```

This will get treated the same as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: "union",
    types: ["null", "string32"],
    default: null
```

I will update the Recap spec accordingly, as well.

Closes #335, #338
@gwukelic gwukelic deleted the gwukelic/null_union_fix branch July 31, 2023 21:48
criccomini added a commit to gabledata/recap-website that referenced this pull request Aug 1, 2023
The current Recap null-field handling is a bit cumbersome. I discussed with
@gwukelic, @adrianisk, and @cpard. We decided to add support for an `optional`
field to the CST. This will allow us to specify that a field is optional.

See the conversation here:

gabledata/recap#335

Users can now specify an optional field as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
```

This will get treated the same as:

```yaml
type: struct
fields:
  - name: secondary_phone
    type: "union",
    types: ["null", "string32"],
    default: null
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants