Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check the content type before returning the json value. #2177

Closed
wants to merge 6 commits into from

Conversation

fduxiao
Copy link

@fduxiao fduxiao commented Aug 8, 2017

What do these changes do?

Check the content type before returning the json value.

Are there changes in behavior for the user?

When getting json requests one could directly await request.json().
If one'd like to ignore the Content-Type, he/she should specify if_check manually i.e. await request.json(if_check=False).

Related issue number

referencing #2174

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is Xiao Tan.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the changes folder
    • name it 2174.bugfix for example (588.bug)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@fduxiao fduxiao force-pushed the content_type_check branch 2 times, most recently from 88f254c to a4abca2 Compare August 8, 2017 03:54
@codecov-io
Copy link

codecov-io commented Aug 8, 2017

Codecov Report

Merging #2177 into master will decrease coverage by 0.04%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2177      +/-   ##
==========================================
- Coverage    97.2%   97.15%   -0.05%     
==========================================
  Files          39       39              
  Lines        7945     7956      +11     
  Branches     1378     1382       +4     
==========================================
+ Hits         7723     7730       +7     
- Misses         98      100       +2     
- Partials      124      126       +2
Impacted Files Coverage Δ
aiohttp/web_request.py 98.46% <71.42%> (-1.22%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4c1237...99068d8. Read the comment docs.

You can also handle it manually by passing the
`wrong_format` parameter.
"""
if if_check and self.content_type != 'application/json':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would I use, say, jsonapi if it will never match this hardcoded content-type?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable. I will add it.

@@ -510,8 +510,19 @@ def text(self):
return bytes_body.decode(encoding)

@asyncio.coroutine
def json(self, *, loads=json.loads):
"""Return BODY as JSON."""
def json(self, *, loads=json.loads, wrong_format=None, if_check=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think signature is quite overloaded. I suggest just to allow to pass what exact content-type should be considered as valid. No match - throw an exception, nothing more. For BC, default value could be None until some release when we change it to 'application/json'. Thoughts?

wrong_format_flag = False
if isinstance(allowed_types, str):
wrong_format_flag = allowed_types != self.content_type
elif isinstance(allowed_types, list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if allowed_types be a set or tuple or frosenset or whatever container?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So will iterable make sense? I mean how about hasattr(alowed_types, '__iter__')?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be do like it was don't in ClientResponse.json() ? It's quite abstract from what kind of type was passed and the behaviour would be consistent between client and server.

if if_check and self.content_type != 'application/json':
if wrong_format is None:
wrong_format_flag = False
if isinstance(allowed_types, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I receive application/json in legacy form when it will have a charset option defined explicitly like application/json;charset=utf-8?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just checking the type, subtype and suffix? And should json be decoded according to the charset?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest just supporting a container (list, tuple, whatever) only and raising explicit TypeError on single string.
By default it should be None for backward compatibility. That's sad but we cannot change it.
Application might have a parameter for enabling checking json types in the application by default though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just checking the type, subtype and suffix?

That's correct, but will make things too much complicated.

And should json be decoded according to the charset?

In modern RFC's it's mandatory should be UTF-8. However, there are lot of legacy around which forces charset via params. I don't thing we should care about non-UTF8 JSON, but if client asks about it explicitly, there is no reason to deny him, imho.

I suggest just supporting a container (list, tuple, whatever) only and raising explicit TypeError on single string.

It's good idea, but it conflicts with how client's json() acts. I don't think we should have different behaviour of a similar things - that would annoy people all around.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we can use either Container or str, and None is for wildcard (or should I use a '*' as in add_route, since allowed_types=None looks weird for wildcard); otherwise a TypeError is raised

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like global switches. You have argument and it's default. None disables the check. For awhile it will be default to preserve BC. After some release it becomes application/json in defaults and may be we could prohibit None usage, but that's another story.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I suggest a function(mapping a string to boolean) be passed as an argument to filter the Content-Type and str, None are also accepted and nothing more is supported?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to avoid callback style. This is the road to make aiohttp acts weird depending on what functions user passed around. There is pretty simple rules to find out if MIME type is json-like. In the rest cases user could override that behavior by explicit string value.

Copy link
Author

@fduxiao fduxiao Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that means the user won't want to behavior weirdly? Then I just need to check whether the type is str or tuple or list or set and compare them directly, or that it is something that means json-like or wildcard. Also I need to derive the charset from Content-type and pass it to loads.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why he should? .json() provides stable interface and guarantee about what it does. User may customize default expected content-type via string argument. Why there is need to do more?

If user need to do more complex content type check, it should make it at middleware level, not in .json() call.

raise HTTPNotAcceptable()
elif callable(wrong_format):
yield from wrong_format()
elif callable(err_handler):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You expect callable, but below err_handler is a coroutine actually.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only use module inspect to decide whether an object is a coroutine. Should I decide whether it is a coroutine or not or even something else(says a string) and make calling correspondingly. I think more details should be given.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to not pass exc_handler at all. No argument - no problems (:

elif callable(wrong_format):
yield from wrong_format()
elif callable(err_handler):
yield from err_handler()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to pass here err_handler after all? You can wrap req.json() call with try/except and do the rest on user code side.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to have reduced the use of try/except so one could get rid of the fact that if you want to handle the error yourself, you need the try/except block i.e. one doesn't need to raise the error, catch it and handle it since catch an error is costly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to worry about try/except usage. If you do it, you explicitly sign about that you're aware about that things here could go wrong. Some custom argument that does the same makes it not so oblivious.

As about speed, you shouldn't worry about catch performance. I bet, your app business logic will be million times slower than that.

then the request is rejected.
You can also handle it manually by passing the
`wrong_format` parameter.
You can also handle it manually by passing the `err_handler` parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is need to note what will happens if content-type won't match.

"""
wrong_format_flag = False
if isinstance(allowed_types, str):
wrong_format_flag = allowed_types != self.content_type
Copy link
Contributor

@socketpair socketpair Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self.content_type may be application/json; charset=UTF-8. Am I right ? So, use startswith or handle that in a different way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. So how do you think of these codes:

content_type = self.content_type
(mime_type, subtype, suffix, parameters) = parse_mimetype(content_type)
content_type = mime_type + '/' + subtype
if suffix != '':
    content_type += '+' + suffix

charset = parameters.get('charset', None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK, I want to review this part in context of whole path.

wrong_format_flag = False
if isinstance(allowed_types, str):
wrong_format_flag = allowed_types != self.content_type
elif isinstance(allowed_types, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about tuple and set ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe all Containers should be supported, says

if isinstance(allowed_types, str):
    wrong_format_flag = allowed_types != self.content_type
elif isinstance(allowed_types, collections.Container):
    wrong_format_flag = self.content_type not in allowed_types
elif allowed_types is None:  # do not check the content_type
    wrong_format_flag = False
else:  # for other type
    raise TypeError("Unsupported container of `Content-Type`")

Copy link
Contributor

@socketpair socketpair Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 for any containers, though. Even list.

@@ -510,8 +510,26 @@ def text(self):
return bytes_body.decode(encoding)

@asyncio.coroutine
def json(self, *, loads=json.loads):
"""Return BODY as JSON."""
def json(self, *, loads=json.loads, allowed_types=None, err_handler=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for allowed_types='application/json'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be breaking change. See previous discussions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we directly check if the mime type starts with 'application/' and ends with '+json' i.e. flask's way to deal with it? Hence no need to consider about how should allowed_type behavior

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we stay with string only argument as @socketpair suggested, that's would be friendly default behaviour with almost no cost.

if err_handler is None:
raise HTTPNotAcceptable()
elif callable(err_handler):
yield from err_handler()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please do not check if it is callable. Instead, check return value of call. if it is awaitable -- await it. (see signal handlers in aiohttp)
  2. In current code, with err_handler=42 -- error will be just ignored, which is a bug.
  3. errors must not be handled as callbacks or so. There are exceptions for that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed above, there's no need to give callbacks. Instead, only the 406 error is raised.

charset = parameters.get('charset', None)

if isinstance(allowed_types, str):
if allowed_types == 'json':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless. Users must supply full contenttype like xxxx/yyyyy

Copy link
Author

@fduxiao fduxiao Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means 'json-like'. View my code after that. I will then check whether the Content-Type is 'application/json' or starts with 'application' and ends with '+json'. If it is a string and is not 'json', I will compare it with Content-Type directly. This may be convenient i.e. one won't need to always specify the argument for different json format

raise HTTPNotAcceptable()
elif allowed_types != content_type:
raise HTTPNotAcceptable()
elif isinstance(allowed_types, set):
Copy link
Contributor

@socketpair socketpair Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-2 for containers. Why they are intended for ? Which user-story (case) do you suggest ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we discussed what if one sends 'application/json; charset=utf-8' and 'application/json' is hard-coded and now I strip the charset. My concern is that will there be something else the frontend will send fail to pass the examination, says the handler need to handle both application/json and application/javascript or some uncontrollable i.e. for compatibility of an old frontend framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep code of aiohttp simple and clear. Simple thing should be done in a simple way, and difficult ones should be possible.

If an specific application needs to handle such very obscure situation, it may control content-type by it's own before decoding of json. And also, I think, situation you have provided is just an imagination, not the real one. If no, please describe deeper.

body = yield from self.text()
return loads(body)
return loads(body, encoding=charset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other arguments have the same meaning as in load(), except encoding which is ignored and deprecated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Charset should go as additional argument of the function. And should be passed to .text() if defined.

Copy link
Author

@fduxiao fduxiao Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the frontend sends a request specifying charset? And I think a json object should be parsed with loads, with encoding supported by python itself. Should .text() decode something like "\\u4e16\\u754c\\u4f60\\u597d"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLEASE look at the aiohttp sources before asking and before changing the code. .text() will analyze conten-type header (regarding charset) before converting bytes to text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON-decoding is two-step process. First, it needs to convert from bytes to characters (symbols). Thi is done by UTF-8 (typically) decoder in .text(). Second-step -- is to decode slash-escapes according to JSON-format. this is done using loads(). So, encoding you are talking about is for first step only. As you can see, letter u after every slash signifies that every item is UNICODE symbol (as opposed to some byte value). And so encoding is not applicable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I did a search and found it so I removed these codes.

Copy link
Contributor

@socketpair socketpair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please remove containers
  2. Make Support of either None or full content type
  3. Remove encoding from loads

""" Return BODY as JSON. If the `Content-Type` is not in allowed_types,
then the request is rejected.
"""
content_type = self.content_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not required to parse self.content_type if allowed_type is None

if suffix != '':
content_type += '+' + suffix

if isinstance(allowed_type, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much simplier:

if allowed_type is not None:
    parse content type
    if parsed_ct != allowed_type:
        raise ...

without any else

def json(self, *, loads=json.loads):
"""Return BODY as JSON."""
def json(self, *, loads=json.loads, allowed_type=None):
""" Return BODY as JSON. If the `Content-Type` is not in allowed_types,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not in is wrong now.

allowed_type, then the request is rejected.
"""

if allowed_type is None: # do not check the content_type
Copy link
Contributor

@socketpair socketpair Aug 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for replacing this chain with only one condition like:

if allowed_type is not None:
   ...

content_type += '+' + suffix

if allowed_type != content_type:
raise HTTPNotAcceptable()
Copy link
Contributor

@socketpair socketpair Aug 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not sure if this concrete exception should be raised. Also please provide an message in exception.

@asvetlov @fafhrd91

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what do you guys think now? Should the message be passed as an argument? I think the error should be raised directly since that's exactly what I want for a JSON API in most cases and I think we've reached the agreement that for special requirements one should write a middleware or catch the exception as we discussed before.

Copy link
Member

@samuelcolvin samuelcolvin Sep 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be wrong with just

def json(self, *, loads=json.loads, allowed_type='application/json'):
    if allowed_type and self.content_type != allow_type:
        raise HTTPNotAcceptable(text='invalid content-type: "{}" not "{}"'.format(self.content_type, allowed_type))
    ...

?

If users really want more sophisticated behaviour than this they can add their own logic, there's nothing very complicated here that couldn't be replicated in a custom coroutine.

@samuelcolvin
Copy link
Member

also is 406 really the correct response?

from wikipedia:

The requested resource is capable of generating only content not acceptable according to the Accept headers sent in the request.

That doesn't sound quite right. 400 is very generic but is technically more correct I think.

@asvetlov
Copy link
Member

Fixed by another PR #3889

@asvetlov asvetlov closed this Jul 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants