Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Folder with module name detected as source leads to false I001 error #10519

Open
carschno opened this issue Mar 22, 2024 · 7 comments
Open

Folder with module name detected as source leads to false I001 error #10519

carschno opened this issue Mar 22, 2024 · 7 comments
Labels
isort Related to import sorting

Comments

@carschno
Copy link

carschno commented Mar 22, 2024

This issue affects the expected sorting of imports in Python and hence (incorrectly) triggers a I001 error.

When there is a folder that has the same name as a module, it is (possibly incorrectly) identified as SourceMatch. Ruff then categorizes the module as Known(FirstParty) and adapts the expected sorting accordingly.

This happens commonly, but not exclusively, when using the wandb library because it creates a wandb folder, as discovered in ChartBoost/ruff-action#20.

To reproduce the issue, create a Python file with these imports (I called it test.py):

import csv
import logging
import random
import sys
from typing import Any, Optional, TextIO

import torch
import torch.nn as nn
from torch import optim
from torcheval.metrics import (
    Metric,
    MulticlassAccuracy,
    MulticlassF1Score,
    MulticlassPrecision,
    MulticlassRecall,
)
from tqdm import tqdm

import wandb

In the initial scenario, there is a wandb directory:

% ls -d wandb/
wandb/

Running Ruff to check the import sorting:

% poetry run ruff check -v --select=I001 test.py                                            
[2024-03-22][08:57:28][ruff::resolve][DEBUG] Using configuration file (via parent) at: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/pyproject.toml
[2024-03-22][08:57:28][ruff::commands::check][DEBUG] Identified files to lint in: 2.013375ms
[2024-03-22][08:57:28][ruff::diagnostics][DEBUG] Checking: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/test.py
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch.nn' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'csv' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'random' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'sys' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'logging' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'wandb' as Known(FirstParty) (SourceMatch("/Users/carstenschnober/LAHTeR/workspace/document-segmentation"))
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torcheval.metrics' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'typing' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'tqdm' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff::commands::check][DEBUG] Checked 1 files in: 765.709µs
All checks passed!

The checks pass, wandb has been categorized as Known(FirstParty)

Now remove the wandb directory:

% mv wandb wandb.bak
% ls -d wandb/
ls: wandb/: No such file or directory

Running the same Ruff check triggers a I001 error on the same file, categorizing wandb as Known(ThirdParty); the module categorization is cached, so I remove the .ruff_cache directory first to reproduce the error:

% rm -r .ruff_cache                             
% poetry run ruff check -v --select=I001 test.py
[2024-03-22][09:06:42][ruff::resolve][DEBUG] Using configuration file (via parent) at: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/pyproject.toml
[2024-03-22][09:06:42][ruff::commands::check][DEBUG] Identified files to lint in: 1.959083ms
[2024-03-22][09:06:42][ruff::diagnostics][DEBUG] Checking: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/test.py
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch.nn' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'csv' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'random' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'sys' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'logging' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'wandb' as Known(FirstParty) (SourceMatch("/Users/carstenschnober/LAHTeR/workspace/document-segmentation"))
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torcheval.metrics' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'typing' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'tqdm' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff::commands::check][DEBUG] Checked 1 files in: 1.004834ms
test.py:1:1: I001 [*] Import block is un-sorted or un-formatted
Found 1 error.
[*] 1 fixable with the `--fix` option.

This is now the expected sorting that is generated when calling ruff --fix call above:

import csv
import logging
import random
import sys
from typing import Any, Optional, TextIO

import torch
import torch.nn as nn
import wandb
from torch import optim
from torcheval.metrics import (
    Metric,
    MulticlassAccuracy,
    MulticlassF1Score,
    MulticlassPrecision,
    MulticlassRecall,
)
from tqdm import tqdm

This configuration option fixes the issue properly (see ChartBoost/ruff-action#20 (comment)):

[tool.ruff.lint.isort]
known-third-party = ["wandb"]

However, it is difficult for users to identify the issue and fix the configuration accordingly. I think a better solution would be to have a more robust source directory detection.
A heuristics like checking for __init__.py or generally the presence of *.py as a condition file might be a solid starting point for Python.

@carschno carschno changed the title Folder with module name detected as source Folder with module name detected as source leads to false I001 error Mar 24, 2024
@charliermarsh
Copy link
Member

We could consider requiring an __init__.py but it would be a breaking change, I think.

@charliermarsh charliermarsh added the isort Related to import sorting label Mar 24, 2024
@carschno
Copy link
Author

We could consider requiring an __init__.py but it would be a breaking change, I think.

I see how a folder that is now correctly categorized as a source folder might not have an __init__.py in some cases, so this requirement could perhaps be something for the next major release indeed.

If a source folder is required to contain any *.py file, however, I cannot see how it would break current behaviour -- except for cases in which the sorting has now been incorrectly adapted to identified source folders.
So perhaps requiring any .py file could be an intermediate fix, whereas the next major release requires an __init__.py?

If somebody could point me to the relevant source code, I could try and work on a PR.

@bkad
Copy link

bkad commented Apr 4, 2024

Running the same Ruff check triggers a I001 error on the same file, categorizing wandb as Known(ThirdParty); the module categorization is cached, so I remove the .ruff_cache directory first to reproduce the error

Shouldn't the removal of the wandb directory be enough to invalidate the module categorization cache? The fact that you need to manually remove the cache to get the correct results also seems wrong.

@charliermarsh
Copy link
Member

Diagnostics are cached on a per-file basis, where the "file" is the file in which the diagnostic is present. Changing other files on the filesystem doesn't invalidate the cache.

Changing the settings will also invalidate the cache. So if you add wandb as a known-third-party module (which is the suggested change), it will also reflect that change on re-run without clearing the cache mnanually.

@bkad
Copy link

bkad commented Apr 4, 2024

Say you were using a third party package in your requirements, but decided later to vendor it to make changes, turning it into a first party package. Then unless caches were manually cleaned you would get incorrect results from the isort check. Even worse, they could be inconsistent with CI checks, which can cause a lot of confusion. We ran into an issue like this today, and had to turn off caching as a result.

@carschno
Copy link
Author

carschno commented Apr 4, 2024

Say you were using a third party package in your requirements, but decided later to vendor it to make changes, turning it into a first party package. Then unless caches were manually cleaned you would get incorrect results from the isort check. Even worse, they could be inconsistent with CI checks, which can cause a lot of confusion. We ran into an issue like this today, and had to turn off caching as a result.

Honestly, I think this should be a separate issue about caching. It is only related in the sense that caching must somehow also handle the detection of third party folders. But if you propose improvements in the caching logic, I think it would make sense to do this separately from this issue (detection of source folders).

@MichaReiser
Copy link
Member

We're looking into changing how we cache data as part of our multifile analysis work. It will allow us to invalidate caches based on dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
isort Related to import sorting
Projects
None yet
Development

No branches or pull requests

4 participants