-
-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🧪 Publish MyPy type coverage to Codecov #3162
base: main
Are you sure you want to change the base?
🧪 Publish MyPy type coverage to Codecov #3162
Conversation
b07705a
to
b2ce8b2
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3162 +/- ##
====================================================
- Coverage 100.00000% 98.81890% -1.18111%
====================================================
Files 124 141 +17
Lines 18427 23114 +4687
Branches 1215 2273 +1058
====================================================
+ Hits 18427 22841 +4414
- Misses 0 155 +155
- Partials 0 118 +118 |
dbdb9c2
to
8c6df05
Compare
mypy 3.13 is passing on main, no? https://github.com/python-trio/trio/actions/runs/12425306606/job/34691878825 |
@jakkdl this confusion is coming from not using an explicit In other projects, I also put the lowest supported version in the config: https://github.com/aio-libs/yarl/blob/c1f6ef6/.mypy.ini#L2C18-L2C21. And I include as many settings as I can there, everything that is common. But then, I also run MyPy against every other version in the range: https://github.com/aio-libs/yarl/blob/c1f6ef6/.pre-commit-config.yaml#L109-L187. Ideally, it should run against every Python. Period. But it's expensive, so I either do every other, or min+max. For history, this is how it's failing: src/trio/_path.py:246:22:246:58: error: Incompatible types in assignment (expression has type "Callable[[Path, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], Awaitable[bool]]", base class "PurePath" defined the type as "Callable[[PurePath, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], bool]") [assignment] (https://github.com/python-trio/trio/actions/runs/12425469238/job/34692341320?pr=3162#step:5:1083) |
f08ddfa
to
1b4daf2
Compare
1b4daf2
to
e7fd906
Compare
@jakkdl this should be ready now. Don't worry about Codecov's misleading coverage drops. Their PR diff views can't handle flag filtering well and sometimes display nonsense (requiring one to got to the HEAD commit page to view actual data). |
[written before your latest comment] Huh, this is really fascinating - I've never heard of type coverage. Though I'm not sure if the upsides of doing it this way outweigh the bugs/quirks when codecov tries to combine the reports. I think our approach with We could perhaps do something similar with the output from mypy, but even then I'm unsure if MyPy type coverage actually adds much of value to this repo - given that we both run
oh yeah derp. Yeah current setup prioritizes only having to specify the version in a single place. Checking multiple mypy versions would be nice, but yeah it quickly becomes very slow when we also check three platforms. src/trio/_path.py:246:22:246:58: error: Incompatible types in assignment (expression has type "Callable[[Path, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], Awaitable[bool]]", base class "PurePath" defined the type as "Callable[[PurePath, str | PathLike[str], DefaultNamedArg(bool | None, 'case_sensitive')], bool]") [assignment] Looks like |
@jakkdl for me, the benefit is getting visual representation of the type preciseness. MyPy can also output txt and HTML reports. The latter are useful locally, while the former could go into job summaries. Here's an example of posting all those text outputs into job summaries: https://github.com/ansible/awx-plugins/actions/runs/12415268940#summary-34661329642. Also, regarding the separation of pytest vs. MyPy coverage, it does compute distinct checks correctly. So running it allows visually seeing the percentages in PR statuses and also, setting some as required, but not the others. In general, it's important to have the tests at 100% (the runtime/pytest check) so that we know that there's no dead code in tests. The next priority is 100% of the actual library code (also pytest/runtime) — for this one, it makes sense to use "no cover" pragmas. As for MyPy, I've seen cases where it's impossible to reach 100% precision with the current state of typing. But still, it's useful to see some coverage representation. |
hmm, I'm not seeing a codecov check now. Is this a silent fail from having |
I think Codecov might've been drunk at the time. The checks are in place now. |
e7fd906
to
90aa073
Compare
"type preciseness" is just tracking Anys, correct? I'm pretty sure we handle those explicitly with type ignore comments, having enabled plenty of Any-related errors. I guess it can't hurt though. I don't think having codecov fail based on mypy status makes sense -- someone must have already explicitly accepted the Anys (if that's what this is about). |
I'm probably still voting against, but certainly not vetoing it or anything if others think it's fine. |
@jakkdl try enabling the newish feature preview for the PR checks. With that, it collapses most of the statuses, only surfacing the failing ones by default. |
I think that MyPy is not configured to be fully strict, though. This is why the reported type coverage is at 91.79%.
Not exactly, I think there are some Any-related checks that aren't enabled by default. That's what I've discovered while researching this. Pretty sure it's these: https://github.com/ansible/awx-plugins/blob/53ebc59/.mypy.ini#L49-L52. So yeah, it may be useful to have this check voting. @jakkdl and @A5rocks check out this view: https://app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/tree/src/trio?flags%5B0%5D=MyPy. It reveals a ton of imprecise coverage. |
90aa073
to
09901bb
Compare
I have no clue what the coverage fails in this run are trying to point out. e.g. https://app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/blob/src/trio/testing/_raises_group.py?flags%5B0%5D=MyPy apparently only has 73% coverage but going through the coverage fails I can't figure out why any of the uncovered lines is incorrect/problematic in any way. How am I supposed to interpret it? If it's just that we haven't enabled all the strictness flags... then I guess we could just do that, instead? |
I've found that it's useful to have the HTML output integrated into the local contributor automation. It's a bit easier to understand. The thing is that MyPy measures “preciseness” of the typing and if a typing expression evaluates to So the confusion for many, I think, will be coming from having to learn this new thing. A partially covered line should trigger an investigation into what causes it, with tools beyond just what Codecov renders. FWIW, I think that a part of what confuses Codecov is the way the tests are run. I have seen problems with it in the past, but not on this scale. This seems to be hitting more corner cases.
Yeah, kinda. It would still be useful to get some visuals, though. When new strictness flags are added to MyPy, and we don't add them (because one has to know to do this upfront), the invocation may report that everything's good, but the reports may point out that there are problems. I suggest you to try running MyPy locally with |
09901bb
to
185fb45
Compare
@jakkdl by the way, I noticed the tests aren't type-checked. It's usually good to check their types, too. |
314f7d7
to
2a8c841
Compare
@jakkdl I added the strictness settings found in https://github.com/ansible/awx-plugins/blob/53ebc59/.mypy.ini#L9-L58 to demonstrate what's missing. I also realized that things like suppressing missing imports globally may contribute to this, making them See: https://github.com/python-trio/trio/actions/runs/12459794456/job/34777119411?pr=3162#step:5:1085. |
let's open separate issues/PR(s) to address the missing strictness settings. But enabling a % target that's depending on checks that aren't enabled sounds like a bad idea, the workflow a dev would have to do when they get a coverage drop is to manually rerun mypy with extra strictness flags, specifically look for the code they touched amongst the myriad of other errors, and fix it. We could add tracking with no target in this PR, but that wouldn't do much. Also, going back to raisesgroup that CI run gives 7 errors for e.g. this line https://app.codecov.io/gh/python-trio/trio/commit/90aa0739d926abd4d6ac0a897557f949a1aee1e6/blob/src/trio/testing/_raises_group.py?flags%5B0%5D=MyPy#L147 is super basic and doesn't depend on any imports or anything. |
Why did |
Not sure, actually. |
Not explicitly. I'm pretty sure it's hitting this |
Yeah, agreed. Working on fixing those things would be more meaningful. However, the visuals could help with the clues later, after that's fixed. |
right - that makes sense, but I don't see much/any upside to push contributors to delve into the innards of typeshed; or litter tons of For this example I don't think the code can be improved, if it blows up it's because has written a broken exception subgroup; and the only reasonable thing to do in that case is crashing. Maybe if the false positives drop off a cliff after we enable all the checks (and doing that might in itself not be worth doing tbh), but I ~only see downsides (though mostly small/trivial ones) to adding this now. |
This patch also includes running MyPy against both Python 3.9
and 3.13(3.13 has MyPy violations, so let's mark it out of the scope for now).Quirks: https://discuss.python.org/t/is-there-any-tool-that-can-report-type-coverage-of-a-project/34962/4