Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Need more robust handling of URL operation 'path' segment character encoding #5117

Open
4 of 6 tasks
sebastien-rosset opened this issue Jan 26, 2020 · 3 comments
Open
4 of 6 tasks

Comments

@sebastien-rosset
Copy link
Contributor

sebastien-rosset commented Jan 26, 2020

Bug Report Checklist

  • Have you provided a full/minimal spec to reproduce the issue?
  • Have you validated the input using an OpenAPI validator (example)?
  • What's the version of OpenAPI Generator used?
  • Have you search for related issues/PRs?
  • What's the actual output vs expected output?
  • [Optional] Bounty to sponsor the fix (example)
Description

The generated Python code has problems when the operation "path" segment includes every possible allowed character per RFC 3986, but I suspect this might be a more general problem. I also noticed the OAS spec 3.0.2 is ambiguous about the "path" encoding, so I opened OAI/OpenAPI-Specification#2119.

I think the OpenAPITool makes the assumption that URL paths in the OAS spec are encoded according to RFC 3986 section 3.3, though this may be by accident, not intentional. When I look at the Python implementation, I see the "query" part is URL encoded, but not the "path" part of the URL.

Per RFC 3986, a path segment (the parts in a path separated by /) in an absolute URI path can contain zero or more of pchar that is defined as follows:

  pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
  pct-encoded = "%" HEXDIG HEXDIG
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

This means this is a valid "path" segment:

/store/orderAZa–z09-._~!$&'(abc)*+ , ;=:@%C5%92%C3%AB%F0%9F%8D%87%E2%84%AC:

Without percent encoding, the path is as follows. One interpretation of the OAS spec is that the OAS author is allowed to write the path without percent-encoding, and it's the tooling responsibility to URL encode. Another interpretation is that the OAS author must write the path with URL encoding.

/store/orderAZa–z09-._~!$&'(abc)*+ , ;=:@Œë🍇ℬ:

I have typed these paths in the online swagger validator, no exception is raised.

openapi-generator version

master, as of January 26th, 2020

OpenAPI declaration file content or url
  /store/orderAZa–z09-._~!$abc*+,;=:@%C5%92%C3%AB%F0%9F%8D%87%E2%84%AC:
    post:
      tags:
        - store
      summary: Place an order for a pet
      description: ''
      operationId: placeOrder
Command line used for generation
Steps to reproduce

Given the following path in the OAS:

/store/orderAZa–z09-._~!$&'(abc)*+ , ;=:@Œë🍇ℬ:

The python generator generates samples/openapi3/client/petstore/python-experimental/petstore_api/api/store_api.py:

        self.place_order = Endpoint(
            settings={
                'response_type': (order.Order,),
                'auth': [],
                'endpoint_path': '/store/orderAZa–z09-._~!$&'(abc)*+ , ;=:@Œë🍇ℬ',
                'operation_id': 'place_order',
                'http_method': 'POST',
                'servers': [],
            },

And looking at the implementation, there is no URL encoding of the path.

Related issues/PRs
Suggest a fix

I suggest that:

  • The OpenAPITool performs validation of the "path" segment when parsing the OAS spec, BEFORE invoking the mustache templates.
  • If the "path" segment contains characters that do not comply with RFC 3986, the tool either fails OAS validation (and the OAS author must fix the spec), or codegen invokes URL encoding to the "path" before processing of the mustache templates.

We should also agree on what is a proper encoding of the path. There are many ways to encode a path. I don't have any of these characters in my spec, but I can see how the use of these characters (such as "+", which could be %2db or %20, and "/") could cause problems.

URIs that differ only by whether an unreserved character is percent-encoded or appears literally are equivalent by definition, but URI processors, in practice, may not always recognize this equivalence. For example, URI consumers shouldn't treat %41 differently from A or %7E differently from ~, but some do. For maximum interoperability, URI producers are discouraged from percent-encoding unreserved characters.

@sebastien-rosset
Copy link
Contributor Author

I see AbstractJavaCodeGen modifies the path with a call to sanitizePath(), but it's done in a very limited way (only the double quotes are changed). IMO, this should be done in a canonicalized way across all languages.

@sebastien-rosset
Copy link
Contributor Author

sebastien-rosset commented Jan 26, 2020

Here is a golang program that does proper path escaping for a complicated URL. It does not escape the curly braces to retain them for OAS path templating. But the URL may have curly braces in the path as long as they are URL encoded. The implementation is more complicated that it seems at first. Even the standard go library does not properly escape the path.

Http://SomeUrl.com:8080//store/curlybraces-%7B%7D/AZaz-09._~!$&'(abc)*+, ;=:@Œë🍇ℬ/order/template-{orderId}/.././c///g?c=3&a=1&b=9&c=0#target

https://play.golang.org/p/X-sX8CwwlnM

@spacether
Copy link
Contributor

spacether commented Jun 13, 2022

python-experimental now encodes query param values and path params values which was added in #12561

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants