Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧪 Publish MyPy type coverage to Codecov #3162

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

webknjaz
Copy link
Member

@webknjaz webknjaz commented Dec 20, 2024

This patch also includes running MyPy against both Python 3.9 and 3.13 (3.13 has MyPy violations, so let's mark it out of the scope for now).

Quirks: https://discuss.python.org/t/is-there-any-tool-that-can-report-type-coverage-of-a-project/34962/4

@webknjaz webknjaz added project meta skip newsfragment Newsfragment is not required labels Dec 20, 2024
@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch from b07705a to b2ce8b2 Compare December 20, 2024 04:09
@webknjaz webknjaz mentioned this pull request Dec 20, 2024
Copy link

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.81890%. Comparing base (a670d60) to head (5847a66).
Report is 29 commits behind head on main.

Additional details and impacted files
@@                 Coverage Diff                  @@
##                 main       #3162         +/-   ##
====================================================
- Coverage   100.00000%   98.81890%   -1.18111%     
====================================================
  Files             124         141         +17     
  Lines           18427       23114       +4687     
  Branches         1215        2273       +1058     
====================================================
+ Hits            18427       22841       +4414     
- Misses              0         155        +155     
- Partials            0         118        +118     

see 122 files with indirect coverage changes

@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch 2 times, most recently from dbdb9c2 to 8c6df05 Compare December 20, 2024 04:19
@jakkdl
Copy link
Member

jakkdl commented Dec 20, 2024

mypy 3.13 is passing on main, no? https://github.com/python-trio/trio/actions/runs/12425306606/job/34691878825
So why would it fail here?

@webknjaz
Copy link
Member Author

webknjaz commented Dec 20, 2024

mypy 3.13 is passing on main, no? python-trio/trio/actions/runs/12425306606/job/34691878825
So why would it fail here?

@jakkdl this confusion is coming from not using an explicit --python-version. main actually runs checks against 3.9: https://github.com/python-trio/trio/blob/b43383d/pyproject.toml#L173C18-L173C23.

In other projects, I also put the lowest supported version in the config: https://github.com/aio-libs/yarl/blob/c1f6ef6/.mypy.ini#L2C18-L2C21. And I include as many settings as I can there, everything that is common.

But then, I also run MyPy against every other version in the range: https://github.com/aio-libs/yarl/blob/c1f6ef6/.pre-commit-config.yaml#L109-L187.

Ideally, it should run against every Python. Period. But it's expensive, so I either do every other, or min+max.


For history, this is how it's failing:

  src/trio/_path.py:246:22:246:58: error: Incompatible types in assignment (expression has type "Callable[[Path, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], Awaitable[bool]]", base class "PurePath" defined the type as "Callable[[PurePath, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], bool]")  [assignment]

(https://github.com/python-trio/trio/actions/runs/12425469238/job/34692341320?pr=3162#step:5:1083)

@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch from f08ddfa to 1b4daf2 Compare December 20, 2024 13:55
@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch from 1b4daf2 to e7fd906 Compare December 20, 2024 17:49
@webknjaz
Copy link
Member Author

@jakkdl this should be ready now. Don't worry about Codecov's misleading coverage drops. Their PR diff views can't handle flag filtering well and sometimes display nonsense (requiring one to got to the HEAD commit page to view actual data).

@jakkdl
Copy link
Member

jakkdl commented Dec 20, 2024

[written before your latest comment]

Huh, this is really fascinating - I've never heard of type coverage. Though I'm not sure if the upsides of doing it this way outweigh the bugs/quirks when codecov tries to combine the reports.
Adding >100 # pragma: no cover markers for code that is in fact covered is quite a lot, and in case they later on do become uncovered then we won't get a warning.

I think our approach with pyright --verifytypes, where we commit a json with exceptions to the repo has worked quite well, and CI checks for changes so it's easy to notice if somebody tries to add an object without types/docstrings.

We could perhaps do something similar with the output from mypy, but even then I'm unsure if MyPy type coverage actually adds much of value to this repo - given that we both run --verifytypes and have a very strict mypy config. Before we'd completed transitioning to fully typed it probably would've been more useful, but at this point I'd be fairly confident in saying every single remaining Any has been checked multiple times if there's any reasonable way it could be replaced.

mypy 3.13 is passing on main, no? python-trio/trio/actions/runs/12425306606/job/34691878825
So why would it fail here?

@jakkdl this confusion is coming from not using an explicit --python-version. main actually runs checks against 3.9: https://github.com/python-trio/trio/blob/b43383d/pyproject.toml#L173C18-L173C23.

oh yeah derp. Yeah current setup prioritizes only having to specify the version in a single place.

Checking multiple mypy versions would be nice, but yeah it quickly becomes very slow when we also check three platforms.

  src/trio/_path.py:246:22:246:58: error: Incompatible types in assignment (expression has type "Callable[[Path, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], Awaitable[bool]]", base class "PurePath" defined the type as "Callable[[PurePath, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], bool]")  [assignment]

Looks like Path vs PurePath. That error is a pretty good argument for checking more than just 3.9 in CI, given that the method in question is guarded by if sys.version_info >= (3, 13):

@webknjaz
Copy link
Member Author

@jakkdl for me, the benefit is getting visual representation of the type preciseness. MyPy can also output txt and HTML reports. The latter are useful locally, while the former could go into job summaries.

Here's an example of posting all those text outputs into job summaries: https://github.com/ansible/awx-plugins/actions/runs/12415268940#summary-34661329642.

Also, regarding the separation of pytest vs. MyPy coverage, it does compute distinct checks correctly. So running it allows visually seeing the percentages in PR statuses and also, setting some as required, but not the others.

In general, it's important to have the tests at 100% (the runtime/pytest check) so that we know that there's no dead code in tests. The next priority is 100% of the actual library code (also pytest/runtime) — for this one, it makes sense to use "no cover" pragmas. As for MyPy, I've seen cases where it's impossible to reach 100% precision with the current state of typing. But still, it's useful to see some coverage representation.

@jakkdl
Copy link
Member

jakkdl commented Dec 20, 2024

hmm, I'm not seeing a codecov check now. Is this a silent fail from having after_n_builds set too high?

@webknjaz
Copy link
Member Author

I think Codecov might've been drunk at the time. The checks are in place now.

@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch from e7fd906 to 90aa073 Compare December 21, 2024 04:18
@A5rocks
Copy link
Contributor

A5rocks commented Dec 21, 2024

"type preciseness" is just tracking Anys, correct? I'm pretty sure we handle those explicitly with type ignore comments, having enabled plenty of Any-related errors.

I guess it can't hurt though. I don't think having codecov fail based on mypy status makes sense -- someone must have already explicitly accepted the Anys (if that's what this is about).

@jakkdl
Copy link
Member

jakkdl commented Dec 21, 2024

I'm probably still voting against, but certainly not vetoing it or anything if others think it's fine.

@jakkdl
Copy link
Member

jakkdl commented Dec 21, 2024

2024-12-21T11:43:31,088942410+01:00
though having 7 (this+1 failing) codecov entries in the summary list looks way overkill/cluttery

@webknjaz
Copy link
Member Author

2024-12-21T11:43:31,088942410+01:00
though having 7 (this+1 failing) codecov entries in the summary list looks way overkill/cluttery

@jakkdl try enabling the newish feature preview for the PR checks. With that, it collapses most of the statuses, only surfacing the failing ones by default.

@webknjaz
Copy link
Member Author

@A5rocks

"type preciseness" is just tracking Anys, correct? I'm pretty sure we handle those explicitly with type ignore comments, having enabled plenty of Any-related errors.

I think that MyPy is not configured to be fully strict, though. This is why the reported type coverage is at 91.79%.

I guess it can't hurt though. I don't think having codecov fail based on mypy status makes sense -- someone must have already explicitly accepted the Anys (if that's what this is about).

Not exactly, I think there are some Any-related checks that aren't enabled by default. That's what I've discovered while researching this. Pretty sure it's these: https://github.com/ansible/awx-plugins/blob/53ebc59/.mypy.ini#L49-L52.

So yeah, it may be useful to have this check voting.

@jakkdl and @A5rocks check out this view: https://app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/tree/src/trio?flags%5B0%5D=MyPy. It reveals a ton of imprecise coverage.

@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch from 90aa073 to 09901bb Compare December 21, 2024 13:31
@jakkdl
Copy link
Member

jakkdl commented Dec 22, 2024

@jakkdl and @A5rocks check out this view: https://app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/tree/src/trio?flags%5B0%5D=MyPy. It reveals a ton of imprecise coverage.

I have no clue what the coverage fails in this run are trying to point out. e.g. https://app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/blob/src/trio/testing/_raises_group.py?flags%5B0%5D=MyPy apparently only has 73% coverage but going through the coverage fails I can't figure out why any of the uncovered lines is incorrect/problematic in any way. How am I supposed to interpret it?

If it's just that we haven't enabled all the strictness flags... then I guess we could just do that, instead?

@webknjaz
Copy link
Member Author

@jakkdl

but going through the coverage fails I can't figure out why any of the uncovered lines is incorrect/problematic in any way. How am I supposed to interpret it?

I've found that it's useful to have the HTML output integrated into the local contributor automation. It's a bit easier to understand.

The thing is that MyPy measures “preciseness” of the typing and if a typing expression evaluates to Any in some unobvious cases because you referenced a type alias that does that indirectly or the stub of the function you're calling has an Any somewhere in its definition, the preciseness for the line would go down.

So the confusion for many, I think, will be coming from having to learn this new thing. A partially covered line should trigger an investigation into what causes it, with tools beyond just what Codecov renders.

FWIW, I think that a part of what confuses Codecov is the way the tests are run. I have seen problems with it in the past, but not on this scale. This seems to be hitting more corner cases.

If it's just that we haven't enabled all the strictness flags... then I guess we could just do that, instead?

Yeah, kinda. It would still be useful to get some visuals, though. When new strictness flags are added to MyPy, and we don't add them (because one has to know to do this upfront), the invocation may report that everything's good, but the reports may point out that there are problems.

I suggest you to try running MyPy locally with --txt-report and --html-report=. See what it outputs and how you can interpret it. The text output produces several files with different data. The HTML one you can open in your browser and play around.

@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch from 09901bb to 185fb45 Compare December 23, 2024 01:42
@webknjaz
Copy link
Member Author

@jakkdl by the way, I noticed the tests aren't type-checked. It's usually good to check their types, too.

@webknjaz webknjaz force-pushed the maintenance/mypy-type-coverage-codecov branch from 314f7d7 to 2a8c841 Compare December 23, 2024 01:53
@webknjaz
Copy link
Member Author

@jakkdl I added the strictness settings found in https://github.com/ansible/awx-plugins/blob/53ebc59/.mypy.ini#L9-L58 to demonstrate what's missing.

I also realized that things like suppressing missing imports globally may contribute to this, making them Any. I've found that this can be solved by in-project stubs like https://github.com/ansible/awx-plugins/blob/53ebc59/.mypy.ini#L35. Or at least moving suppressions to separate per-import settings sections rather than being global.

See: https://github.com/python-trio/trio/actions/runs/12459794456/job/34777119411?pr=3162#step:5:1085.

@jakkdl
Copy link
Member

jakkdl commented Dec 23, 2024

@jakkdl by the way, I noticed the tests aren't type-checked. It's usually good to check their types, too.

ah good catch. We do type-check the vast majority of tests, the tests/ is a recent addition. (though #274)

@jakkdl
Copy link
Member

jakkdl commented Dec 23, 2024

@jakkdl I added the strictness settings found in https://github.com/ansible/awx-plugins/blob/53ebc59/.mypy.ini#L9-L58 to demonstrate what's missing.

I also realized that things like suppressing missing imports globally may contribute to this, making them Any. I've found that this can be solved by in-project stubs like https://github.com/ansible/awx-plugins/blob/53ebc59/.mypy.ini#L35. Or at least moving suppressions to separate per-import settings sections rather than being global.

See: https://github.com/python-trio/trio/actions/runs/12459794456/job/34777119411?pr=3162#step:5:1085.

let's open separate issues/PR(s) to address the missing strictness settings. But enabling a % target that's depending on checks that aren't enabled sounds like a bad idea, the workflow a dev would have to do when they get a coverage drop is to manually rerun mypy with extra strictness flags, specifically look for the code they touched amongst the myriad of other errors, and fix it. We could add tracking with no target in this PR, but that wouldn't do much.

Also, going back to raisesgroup that CI run gives 7 errors for _raises_group.py, but coverage gives partial hits for 72 lines, so the vast majority of cov fails will likely remain.

e.g. this line https://app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/blob/src/trio/testing/_raises_group.py?flags%5B0%5D=MyPy#L147 is super basic and doesn't depend on any imports or anything.

@jakkdl
Copy link
Member

jakkdl commented Dec 23, 2024

Why did codecov/project/lib drop? 😕

@webknjaz
Copy link
Member Author

Why did codecov/project/lib drop? 😕

Not sure, actually.

@webknjaz
Copy link
Member Author

webknjaz commented Dec 26, 2024

@jakkdl

e.g. this line app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/blob/src/trio/testing/_raises_group.py?flags%5B0%5D=MyPy#L147 is super basic and doesn't depend on any imports or anything.

Not explicitly. I'm pretty sure it's hitting this Any: https://github.com/python/typeshed/blob/d882268/stdlib/builtins.pyi#L1453C54-L1453C57. So MyPy knows that it might be that default str or, if the attribute exists, it'd be Any. Technically, that place could concatenate a few Nones and attempt passing that into str.join(), exploding at runtime.

@webknjaz
Copy link
Member Author

let's open separate issues/PR(s) to address the missing strictness settings. But enabling a % target that's depending on checks that aren't enabled sounds like a bad idea, the workflow a dev would have to do when they get a coverage drop is to manually rerun mypy with extra strictness flags, specifically look for the code they touched amongst the myriad of other errors, and fix it. We could add tracking with no target in this PR, but that wouldn't do much.

Yeah, agreed. Working on fixing those things would be more meaningful. However, the visuals could help with the clues later, after that's fixed.

@jakkdl
Copy link
Member

jakkdl commented Dec 27, 2024

@jakkdl

e.g. this line app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/blob/src/trio/testing/_raises_group.py?flags%5B0%5D=MyPy#L147 is super basic and doesn't depend on any imports or anything.

Not explicitly. I'm pretty sure it's hitting this Any: https://github.com/python/typeshed/blob/d882268/stdlib/builtins.pyi#L1453C54-L1453C57. So MyPy knows that it might be that default str or, if the attribute exists, it'd be Any. Technically, that place could concatenate a few Nones and attempt passing that into str.join(), exploding at runtime.

right - that makes sense, but I don't see much/any upside to push contributors to delve into the innards of typeshed; or litter tons of casts; if they want to debug coverage drops. The false positive rate here seems so high that the ~only reasonable thing to do as a dev is to ignore the cov% being spit out by mypy; and the graphs/visualization on codecov will be mostly noise (aside from jumps when we enable stricter checks).

For this example I don't think the code can be improved, if it blows up it's because has written a broken exception subgroup; and the only reasonable thing to do in that case is crashing.

Maybe if the false positives drop off a cliff after we enable all the checks (and doing that might in itself not be worth doing tbh), but I ~only see downsides (though mostly small/trivial ones) to adding this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project meta skip newsfragment Newsfragment is not required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants