Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix all mypy typechecking errors, add a lot of type annotations #1399

Closed
wants to merge 7 commits into from

Conversation

oprypin
Copy link
Contributor

@oprypin oprypin commented Nov 1, 2023

Functions that don't have any type annotations aren't checked by mypy, they are skipped for now in this commit
Copy link
Member

@waylan waylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got much less that half way through this and keep seeing the same issues reoccurring and gave up. See my comments below and please make the adjustments everywhere, not just where I've noted them.

markdown/core.py Outdated Show resolved Hide resolved
@@ -426,7 +426,9 @@ def get_items(self, block: str) -> list[str]:
if not items and self.TAG == 'ol':
# Detect the integer value of first list item
INTEGER_RE = re.compile(r'(\d+)')
self.STARTSWITH = INTEGER_RE.match(m.group(1)).group()
int_match = INTEGER_RE.match(m.group(1))
assert int_match is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you are trying to accomplish here. Does this make it possible for Markdown content to cause an error? If so, that would be unacceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With lines 429-431 reverted:

$ mypy markdown
markdown/blockprocessors.py:429: error: Item "None" of "Match[str] | None" has no attribute "group"  [union-attr]
Found 1 error in 1 file (checked 33 source files)

  • The code before:

    self.STARTSWITH = RE.match(string).group()

    where RE.match(string) can be None, and None.group() is an error

  • The code after:

    int_match = RE.match(string)
    assert int_match is not None
    self.STARTSWITH = int_match.group()

The old code doesn't care to check for the None case where it would violently error with AttributeError. mypy doesn't let it slide and exposes a potential bug.

The new code directly checks for it and errors with a clearer message.

There is no change regarding which situations an error does or does not happen.

TBD: It is worth checking whether this bug case can actually happen (in current code already). In any case we should be happy that mypy flagged this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, now we know about a bug which we didn't know about before. That is good. But this is not the way to address it. Under no circumstances should source text cause Markdown to raise an error. In fact, that is one of the primary goals of the project as documented on the home page. Therefore, this is not the proper way to fix the bug. The error needs to be silenced and some reasonable text needs to be included in the output (depending on the conditions under which the issue arises).

Comment on lines -445 to +449
output_file = writer(output, errors="xmlcharrefreplace")
output_file.write(html)
output_writer = writer(output, errors="xmlcharrefreplace")
output_writer.write(html)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this change was made at all. Note that the Contributing Guide states:

Legacy code which does not follow the guidelines should only be updated if and when other changes (bug fix, feature addition, etc.) are being made to that section of code. While new features should be given names that follow modern Python naming conventions, existing names should be preserved to avoid backward incompatible changes.

I realize that in this instance the variable name is not exposed outside of this method so there is no concern over a backward incompatible change, but the general principle remains. Please, let's refrain from making unnecessary changes just for personal preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% of code body changes in this pull request are in response to error messages produced by mypy. Don't need to tell me this regarding unrelated changes because I'm always the first one to say this as well.

This particular change is because mypy doesn't like that these two differently-typed values share the same variable name.


With lines 425-429 reverted:

$ mypy markdown                                             
markdown/core.py:427: error: Incompatible types in assignment (expression has type "StreamReader", variable has type "StreamReaderWriter")  [assignment]
Found 1 error in 1 file (checked 33 source files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I embrace Python not being strongly typed. If that means making changes like this, then no thank you.

markdown/core.py Outdated Show resolved Hide resolved
Comment on lines -492 to +497
def markdownFromFile(**kwargs: Any):
def markdownFromFile(
*,
input: str | BinaryIO | None = None,
output: str | BinaryIO | None = None,
encoding: str | None = None,
**kwargs: Any
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall why the specific keyword parameters were removed some years ago, but why did you feel the need to add them back in? I'm not opposed to it if there is a good reason related to this scope of this PR, but I would like to here the reason.

However, I am more concerned about why you added position arguments (*)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function's signature doesn't change, I'm just formalizing the de-facto signature. The * is to start named-only parameters, not positional-only parameters. They were also named-only before.

Comment on lines -73 to +82
sibling = self.current_sibling
prev_sibling = self.current_sibling
block, the_rest = self.detab(block, self.content_indent)
self.current_sibling = None
self.content_indent = 0
return sibling, block, the_rest
return prev_sibling, block, the_rest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we are making unnecessary out-of-scope changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was made in response to mypy errors. Again this is because mypy doesn't like that a variable name is reused with different types in different places. Which is really a limitation of mypy, most other type checkers would deal with this perfectly fine.


With lines 78-82 reverted:

$ mypy markdown
markdown/extensions/admonition.py:84: error: Incompatible types in assignment (expression has type "Element | None", variable has type "Element")  [assignment]
markdown/extensions/admonition.py:87: error: Incompatible types in assignment (expression has type "None", variable has type "Element")  [assignment]
markdown/extensions/admonition.py:101: error: Incompatible types in assignment (expression has type "Element | None", variable has type "Element")  [assignment]
markdown/extensions/admonition.py:113: error: Incompatible types in assignment (expression has type "None", variable has type "Element")  [assignment]

@@ -143,6 +148,7 @@ def run(self, parent, blocks):
p.text = title
p.set('class', self.CLASSNAME_TITLE)
else:
assert sibling is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this raise an error based on Markdown input? If so, this is unacceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code already has a problem where sibling might be None and sibling.tag will be an error. mypy correctly detects and reports it, and requires us to make an explicit assertion, which indicates that we either know that this will never actually happen, or perhaps don't care if it does but then the error will be more explicit.

TBD: It is worth checking whether this bug case can actually happen (in current code already). In any case we should be happy that mypy flagged this.


With line 151 reverted:

$ mypy markdown
markdown/extensions/admonition.py:152: error: Item "None" of "Element | None" has no attribute "tag"  [union-attr]
markdown/extensions/admonition.py:152: error: Item "None" of "Element | None" has no attribute "text"  [union-attr]
markdown/extensions/admonition.py:153: error: Item "None" of "Element | None" has no attribute "text"  [union-attr]
markdown/extensions/admonition.py:154: error: Item "None" of "Element | None" has no attribute "text"  [union-attr]
markdown/extensions/admonition.py:155: error: Argument 1 to "SubElement" has incompatible type "Element | None"; expected "Element"  [arg-type]
markdown/extensions/admonition.py:158: error: Incompatible types in assignment (expression has type "Element | None", variable has type "Element")  [assignment]

""" Find code blocks and store in `htmlStash`. """
blocks = root.iter('pre')
for block in blocks:
if len(block) == 1 and block[0].tag == 'code':
local_config = self.config.copy()
text = block[0].text
assert text is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another assert....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place where the code currently assumes that something is present there and violently errors if there's no match. mypy doesn't let it slide.

TBD: It is worth checking whether this bug case can actually happen (in current code already). In any case we should be happy that mypy flagged this.


With lines 275-278 reverted:

$ mypy markdown
markdown/extensions/codehilite.py:276: error: Argument 1 to "code_unescape" of "HiliteTreeprocessor" has incompatible type "str | None"; expected "str"  [arg-type]
Found 1 error in 1 file (checked 33 source files)


raw_block = blocks.pop(0)
m = self.RE.search(raw_block)
assert m is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another assert...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place where the code currently assumes that the match will happen. mypy doesn't let it slide. I see here that it is 100% safe to assume that the match actually happens, based on the outcome of test. But it is OK that mypy forces us to write out this assumption explicitly, both so the writer double-checks this and that the reader doesn't need to think that maybe this is a bug.


With line 47 reverted:

$ mypy markdown
markdown/extensions/def_list.py:48: error: Item "None" of "Match[str] | None" has no attribute "start"  [union-attr]
markdown/extensions/def_list.py:49: error: Item "None" of "Match[str] | None" has no attribute "end"  [union-attr]
markdown/extensions/def_list.py:56: error: Item "None" of "Match[str] | None" has no attribute "group"  [union-attr]
markdown/extensions/def_list.py:58: error: Item "None" of "Match[str] | None" has no attribute "group"  [union-attr]
Found 4 errors in 1 file (checked 33 source files)

Comment on lines -56 to +75
if not terms and sibling is None:
# This is not a definition item. Most likely a paragraph that
# starts with a colon at the beginning of a document or list.
blocks.insert(0, raw_block)
return False
if not terms and sibling.tag == 'p':
# The previous paragraph contains the terms
state = 'looselist'
terms = sibling.text.split('\n')
parent.remove(sibling)
# Acquire new sibling
sibling = self.lastChild(parent)
else:
state = 'list'
state = 'list'
if not terms:
if sibling is None:
# This is not a definition item. Most likely a paragraph that
# starts with a colon at the beginning of a document or list.
blocks.insert(0, raw_block)
return False
if sibling.tag == 'p':
# The previous paragraph contains the terms
state = 'looselist'
assert sibling.text is not None
terms = sibling.text.split('\n')
parent.remove(sibling)
# Acquire new sibling
sibling = self.lastChild(parent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be out-of-scope as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again in response to mypy errors, unfortunately these kind of gymnastics are necessary sometimes to make it happy.


With lines 61-75 reverted:


$ mypy markdown
markdown/extensions/def_list.py:66: error: Item "None" of "Element | None" has no attribute "tag"  [union-attr]
markdown/extensions/def_list.py:69: error: Item "None" of "Element | None" has no attribute "text"  [union-attr]
markdown/extensions/def_list.py:69: error: Item "None" of "str | Any | None" has no attribute "split"  [union-attr]
markdown/extensions/def_list.py:70: error: Argument 1 to "remove" of "Element" has incompatible type "Element | None"; expected "Element"  [arg-type]

@oprypin
Copy link
Contributor Author

oprypin commented Nov 1, 2023

See my comments below and please make the adjustments everywhere, not just where I've noted them.

The adjustments would basically mean reverting the changes, but almost none of them can be reverted, because then mypy errors appear.

So this would just mean giving up trying to check the code with mypy.
Maybe there is some misconception, but the point of running the actual mypy checks is that it also checks function bodies and can complain about potential problems until you write code in a way that makes it clear to mypy that there is no problem.

@oprypin
Copy link
Contributor Author

oprypin commented Nov 1, 2023

@waylan
Copy link
Member

waylan commented Nov 1, 2023

Maybe there is some misconception

Python not being strongly typed if a feature which I embrace. I will continue to write my code that way. Anything which forces me to write strongly typed code in a language which is not strongly typed is unwelcome.

I see type annotations as documentation. The code dictates the annotations, not the other way around. Of course, over time as updates and changes are made to the code, the annotations could get out-of-sync with the code. Therefore, it is reasonable to include a type checker in the CI to ensure that doesn't happen. However, the type checker should only be checking the already defined annotations. There is no need to type check an unannotated private variable which is never used outside of a single method (for example).

If mypy cannot serve that purpose, then it is the wrong tool for the job.

@waylan
Copy link
Member

waylan commented Nov 1, 2023

I am going to close this in favor of #1401. I suspect that mypy is not the right fit for this project. However, if it can be configured to fit, then this can be reopened. Or, if a different tool is proposed, that should be in a new PR.

@waylan waylan added the rejected The pull request is rejected for the stated reasons. label Nov 1, 2023
@oprypin
Copy link
Contributor Author

oprypin commented Nov 1, 2023

I'll want to sync this branch to latest if #1401 is merged, and then leave this for posterity. Not sure if I'll be able to do that if it's closed.

@oprypin
Copy link
Contributor Author

oprypin commented Nov 1, 2023

Yeah mypy is pretty bad!
Sadly other checkers would ask for even more code changes :D

@waylan
Copy link
Member

waylan commented Nov 1, 2023

Okay, I'll leave it open for now.

@oprypin
Copy link
Contributor Author

oprypin commented Nov 2, 2023

By disabling a large amount of mypy checks, I reduced this to a much smaller change #1402 that is based on #1401.
It will ensure the type annotations don't fall into disrepair.

@facelessuser
Copy link
Collaborator

I think if you are using types, using mypy is very important to keep things from getting broken. I also agree that there is nothing wrong with limiting the checks and/or doing per line ignores for particularly tricky things that can be resolved later if there is interest.

@waylan
Copy link
Member

waylan commented Nov 2, 2023

I have taken a step back and taken a fresh look at the overall matter of annotations and type checking. I think perhaps I dove in with a few misunderstandings and miscomprehensions. Sorry about that; especially if I have led anyone down a path of doing work I would never approve. That was not my intention.

So to be clear. upon reflection, the following are my general expectations.

  1. This library should always be a dynamically typed library. I am opposed to this library being forcibly strongly typed. That probably means we will never have a py.typed file (or whatever it is called) and that should not be a goal to work towards.

  2. The only purpose of annotations is to serve as documentation. The code dictates the annotations, not the other way around. In other words, we do not edit code to fit the annotations, we edit the annotations to fit the code. Of course, occasionally, the specific format of code may prevent defining annotations. In that case, an edit to the format (but not functionality) of the code is acceptable to allow defining annotations (the recent edits to some NamedTuple objects is one such example of this).

    Of course, type checking may occasionally reveal an otherwise unknown bug in the code. And it would only be reasonable for those bugs to be fixed. However, the fix should always be framed as a fix to the now known bug, not a change to the code to match the type annotations. In other words, I would expect a test case which fails before the fix is applied and succeeds after regardless of any type annotations.

  3. Of course, the type annotations should remain in sync with the code. I would expect some tooling to check that on the CI server. Presumably, we would add another linter like tool. The tool should never error or warn on missing annotations. The only task of the tool is to verify that the type annotations which exist agree with the code. If any fail, then a PR would not be accepted until an appropriate fix is made. Of course, an appropriate fix could mean the annotation needs to be updated, or it could mean the code needs to be updated. Is this an intentional change to the type of an object or did the type get changed inadvertently?

There are a few things we will need to do to meet the above goals.

Item 1 above requires no action expect to perhaps remove some proposed changes.

Item 2 is mostly complete. Thanks for the hard work of everyone involved.

Item 3 seems to be the problem. Originally I thought that mypy could meet our needs. After all, the page entitled Using mypy with an existing codebase in their documentation talks about how the tool can be used to progressively work through your code. However, on closer inspection, I see that the idea is to work through the code to gradually make it strongly typed. In fact, the suggestion is to do so before any annotations are added. That seems to suggest that perhaps mypy can only check annotations if the code is strongly typed. That is not a useful tool for our purpose. Unfortunately, I do not have any answers for how we can address this item. It occurs to me that it might not actually be possible to accomplish what I am proposing here.

@oprypin
Copy link
Contributor Author

oprypin commented Nov 2, 2023

I'll want to sync this branch to latest if #1401 is merged, and then leave this for posterity. Not sure if I'll be able to do that if it's closed.

I did the merge, can be closed now :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rejected The pull request is rejected for the stated reasons.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full type annotations
3 participants