Change JSONFields to more specific Fields #3653

pedro-psb · 2024-07-01T15:20:33Z

Closes: #3657

Previous Problem this was trying to solve

Problem

As mentioned in #3639, the DRF JSONFields are too generic and can be any JSON primitive (str, bool, etc), not just dicts or lists (as our codebase appears to be interpreting it).

After drf-spectacular fixed that interpretation on their side, our openapi schema broke because Object from JSONField became Any, which matches what JSON Fields are in DRF but not what we think it meant before.

Approaches considered

Change all JSONField to some other fields that represents better what the field actually is. Example, repo_config is a DictField, some listy object should be ListField, etc.
Use extend_schema to modify how the openapi schema will be generated. It affect only schema generation, AFAIK.
Force JSONField to be a OA 'object', as suggested here. The downside is that the representation of our fields would continue to be inaccurate.

dralley · 2024-07-01T16:39:02Z

What do you mean by "The downside is that the representation of our fields would continue to be inaccurate."?

pedro-psb · 2024-07-01T16:48:13Z

For example, the Repository.repo_config is a JsonFIeld. We want it be a json "Object Type" (or dict), but the JSONField serializer will happily accept a list or any json primitives, such as ints, booleans.

Edit: but given all those test failures, I'm thinking about doing the alternative fix and create another issue about reviewing the JSONFields, since using them has worked so far. Wdyt?

mdellweg · 2024-07-02T10:36:49Z

IMHO Option 3. is a non starter, because it leaks into the whole codebase and affects other plugins too.
Have you considered updating the bindings container? Maybe there is a newer version that knows how to deal with Any.
But anyway, if you know that repo-config is supposed to be a dictionary, than that is exactly the type of change we should do to the schema here.

pedro-psb · 2024-07-02T13:44:04Z

IMHO Option 3. is a non starter, because it leaks into the whole codebase and affects other plugins too.

Hmm, I didnt know about that.

Have you considered updating the bindings container? Maybe there is a newer version that knows how to deal with Any.

Will check that out.

But anyway, if you know that repo-config is supposed to be a dictionary, than that is exactly the type of change we should do to the schema here.

Yes, it is. I started by naively changing all the fields, but that broke the tests. It'll require a more thoughtful approach.

dralley · 2024-07-05T02:44:18Z

For the sake of documenting it on Github, re: the discussion we had on Slack, is it correct to say that upgrading openapi-generator-cli to >=5.30 will resolve this issue?

(but with some risk of breakage it sounds like - in terms of backports)

mdellweg · 2024-07-05T10:00:03Z

For the sake of documenting it on Github, re: the discussion we had on Slack, is it correct to say that upgrading openapi-generator-cli to >=5.30 will resolve this issue?

(but with some risk of breakage it sounds like - in terms of backports)

From the looks of it, yes.
The risk with updating the openapi-generator templates is, that it immediately affects all new and old versions of all of Pulp plugins. I'm unsure if we are ready to do that.

pedro-psb · 2024-07-05T16:22:16Z

I'm working on getting it pass our basic tests here: pulp/pulp-openapi-generator#101

About the break risk, we could ask katello to run some tests with the updated bindings before we do anything.

dralley · 2024-07-05T22:26:57Z

The risk with updating the openapi-generator templates is, that it immediately affects all new and old versions of all of Pulp plugins. I'm unsure if we are ready to do that.

Well that would seem to be a long-term problem, surely we need to update the templates eventually.

mdellweg · 2024-07-08T06:27:30Z

The risk with updating the openapi-generator templates is, that it immediately affects all new and old versions of all of Pulp plugins. I'm unsure if we are ready to do that.

Well that would seem to be a long-term problem, surely we need to update the templates eventually.

Absolutely. And actually I think we may need to take different versions of it for different versions of pulpcore.
My take is, improving the annotations of the individual fields here and now is a fast solution that is also future proof and worth doing in any scenario.

pedro-psb · 2024-07-08T20:34:59Z

Given those considerations, I'll back to work on this as an immediate solution to the broken bindings.

I made progress on the work with fixing the bump regressions, but I reached a deadline there where I don't know where to go (without getting really deep into openapi stuff).

pedro-psb · 2024-07-09T19:29:50Z

pulp_rpm/app/serializers/package.py

The test are passing, but it doesnt look right (see snippet below).

We have a clear definition of what this "json" object should be (here). Is there any reason not to create/add serializers that precisely describe its structure?

That's how it looks now with only the List specification and without knowing anything about its items:
(I've tried going a step further and using child=DictField(), but that broke stuff).

"files": [ [ null, "/tmp/", "duck.txt" ] ], "requires": [ [ "cockateel", null, null, null, null, false ], [ "lion", null, null, null, null, false ] ], "provides": [ [ "duck", "EQ", "0", "0.6", "1", false ] ],

The schema is roughly speaking exactly as it is represented by createrepo_c, as that is what we use for parsing the metadata and also for publishing it.

Unfortunately that's not documented super well upstream either, but obviously you've seen the comments and here's an example w/ createrepo_c: https://github.com/rpm-software-management/createrepo_c/blob/master/examples/python/manual_repodata_parsing.py#L13C1-L35C63

It's not the ideal format for display to the user, but doing the minimum amount of work to transform the data during sync and publish was fairly important for performance. That's partly why we used JSONField also - it preserves order without needing to do sorts and joins during query-time. Coming from Pulp 2 where publish performance sucked (for different reasons but nonetheless), we were pretty sensitive to that concern.

To be honest that may or may not be an issue in practice nowadays. Ergonomically it is still a bit more painful though.

As for why not use sub-serializers - I think it was just simpler at the time, and survived from the prototype phase. If we can use sub-serializers and experience benefits from that without unacceptable drawbacks then I'd merge that patch.

What exactly do you mean by "doesn't look right"?

Thanks for the info, makes sense these are not db fields.

About the benefit of sub-serializers, maybe some of the fields could benefit more and be less risky of bringing significant drawbacks, such as the config ones (repo_config and copy config).

pedro-psb · 2024-07-09T19:33:19Z

pulp_rpm/app/serializers/repository.py

Here I've added an example of the config on the field help, but I actually think the right approach is defining a precise serializer (in this example, its this spec), so a correct sample will be generated by Redoc/OpenApi themselves.

That schema isn't used by OpenAPI at all, fwiw. It's validated by our own code in the serializer.

Unless you just mean that's an example of the sort of thing we'd need to provide for OpenAPI.

I mean that the serializer and its fields are used by openapi/redoc to generate the api docs page.

Notice the config: { } in Request Samples, and how it says its an Object type.

stale · 2024-10-12T01:01:31Z

This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!

stale · 2024-11-13T23:11:01Z

This pull request has been closed due to inactivity. If you feel this is in error, please reopen the pull request or file a new PR with the relevant details.

pedro-psb marked this pull request as draft July 1, 2024 16:53

github-actions bot added the multi-commit Added when a PR consists of more than one commit label Jul 1, 2024

pedro-psb force-pushed the fix/json-fields-breaking-bindings branch 3 times, most recently from 117cb50 to 74f6803 Compare July 1, 2024 18:33

github-actions bot removed the multi-commit Added when a PR consists of more than one commit label Jul 1, 2024

pedro-psb marked this pull request as ready for review July 1, 2024 18:54

dralley marked this pull request as draft July 1, 2024 21:04

dralley marked this pull request as ready for review July 1, 2024 21:04

pedro-psb marked this pull request as draft July 2, 2024 18:35

Fix Package fields

300683f

pedro-psb force-pushed the fix/json-fields-breaking-bindings branch from 74f6803 to 300683f Compare July 9, 2024 16:54

Fix Repository json fields

d3f5344

github-actions bot added the multi-commit Added when a PR consists of more than one commit label Jul 9, 2024

pedro-psb mentioned this pull request Jul 9, 2024

Quick fix for AnyType ruby bindings #3679

Merged

pedro-psb commented Jul 9, 2024

View reviewed changes

patchback bot mentioned this pull request Jul 10, 2024

[PR #3679/2e0c2933 backport][3.27] Quick fix for AnyType ruby bindings #3680

Merged

stale bot added the stale label Oct 12, 2024

stale bot closed this Nov 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change JSONFields to more specific Fields #3653

Change JSONFields to more specific Fields #3653

pedro-psb commented Jul 1, 2024 •

edited

Loading

dralley commented Jul 1, 2024

pedro-psb commented Jul 1, 2024 •

edited

Loading

mdellweg commented Jul 2, 2024

pedro-psb commented Jul 2, 2024

dralley commented Jul 5, 2024 •

edited

Loading

mdellweg commented Jul 5, 2024

pedro-psb commented Jul 5, 2024

dralley commented Jul 5, 2024

mdellweg commented Jul 8, 2024

pedro-psb commented Jul 8, 2024

pedro-psb Jul 9, 2024

dralley Jul 10, 2024 •

edited

Loading

dralley Jul 10, 2024

pedro-psb Jul 10, 2024

pedro-psb Jul 9, 2024

dralley Jul 10, 2024

pedro-psb Jul 10, 2024

stale bot commented Oct 12, 2024

stale bot commented Nov 13, 2024

Change JSONFields to more specific Fields #3653

Change JSONFields to more specific Fields #3653

Conversation

pedro-psb commented Jul 1, 2024 • edited Loading

Problem

Approaches considered

dralley commented Jul 1, 2024

pedro-psb commented Jul 1, 2024 • edited Loading

mdellweg commented Jul 2, 2024

pedro-psb commented Jul 2, 2024

dralley commented Jul 5, 2024 • edited Loading

mdellweg commented Jul 5, 2024

pedro-psb commented Jul 5, 2024

dralley commented Jul 5, 2024

mdellweg commented Jul 8, 2024

pedro-psb commented Jul 8, 2024

pedro-psb Jul 9, 2024

Choose a reason for hiding this comment

dralley Jul 10, 2024 • edited Loading

Choose a reason for hiding this comment

dralley Jul 10, 2024

Choose a reason for hiding this comment

pedro-psb Jul 10, 2024

Choose a reason for hiding this comment

pedro-psb Jul 9, 2024

Choose a reason for hiding this comment

dralley Jul 10, 2024

Choose a reason for hiding this comment

pedro-psb Jul 10, 2024

Choose a reason for hiding this comment

stale bot commented Oct 12, 2024

stale bot commented Nov 13, 2024

pedro-psb commented Jul 1, 2024 •

edited

Loading

pedro-psb commented Jul 1, 2024 •

edited

Loading

dralley commented Jul 5, 2024 •

edited

Loading

dralley Jul 10, 2024 •

edited

Loading