Ignore line endings in exclude-file #1889

Merged
4 changes: 2 additions & 2 deletions codespell_lib/_codespell.py
@@ -628,7 +628,7 @@ def parse_ignore_words_option(ignore_words_option: List[str]) -> Set[str]:
 def build_exclude_hashes(filename: str, exclude_lines: Set[str]) -> None:
     with open(filename, encoding="utf-8") as f:
         for line in f:
-            exclude_lines.add(line)
+            exclude_lines.add(line.rstrip())
Jackenmen marked this conversation as resolved.


def build_ignore_words(filename: str, ignore_words: Set[str]) -> None:
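
A minimal sketch of the idea behind this change, assuming nothing about codespell beyond what the hunk shows: both the stored exclude entries and the candidate line are compared in their `rstrip()`-ed form, so an LF entry matches a CRLF line and vice versa. The helper names `build_exclude_set` and `is_excluded` are hypothetical and exist only for illustration.

```python
from typing import Iterable, Set


def build_exclude_set(raw_lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: store each exclude entry without trailing whitespace."""
    return {line.rstrip() for line in raw_lines}


def is_excluded(line: str, exclude_lines: Set[str]) -> bool:
    """Hypothetical helper: compare on the stripped form of the candidate line."""
    return line.rstrip() in exclude_lines


# One entry ends in LF, the other in CRLF.
exclude = build_exclude_set(["1 abandonned 1\n", "2 abandonned 2\r\n"])

# The candidate lines use the opposite line ending from their exclude entry.
candidates = ("1 abandonned 1\r\n", "2 abandonned 2\n", "3 abandonned 3\n")
matches = [is_excluded(text, exclude) for text in candidates]
```

With the pre-change behaviour (no `rstrip()` on either side), the first two candidates would not match, because the stored strings and the candidate strings differ in their trailing characters.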
@@ -896,7 +896,7 @@ def parse_file(
         return bad_count

     for i, line in enumerate(lines):
-        if line in exclude_lines:
+        if line.rstrip() in exclude_lines:
Collaborator

I've realised the issue. This will write back out. See the -w based tests (and extend them to prove it/stop us breaking it anyway).

An input of abandonned\nAbandonned\r\nABANDONNED \nAbAnDoNnEd will become abandonned\nAbandonned\nABANDONNED\nAbAnDoNnEd I think
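
One way to look at the write-back concern in isolation is the sketch below. It assumes only that the stripped string is used as a lookup key, as in the hunk above, and it does not reproduce codespell's `-w` code path. The function name `filter_unexcluded` is hypothetical. Because `str.rstrip()` returns a new string, the original lines keep their trailing spaces and line endings.

```python
from typing import List, Set


def filter_unexcluded(lines: List[str], exclude_lines: Set[str]) -> List[str]:
    """Hypothetical helper: return the lines that are not excluded, unmodified."""
    return [line for line in lines if line.rstrip() not in exclude_lines]


# The input from the comment above, split into lines with endings kept.
lines = ["abandonned\n", "Abandonned\r\n", "ABANDONNED \n", "AbAnDoNnEd"]
snapshot = list(lines)

remaining = filter_unexcluded(lines, {"Abandonned"})
```

Here `remaining` still contains `"ABANDONNED \n"` with its trailing space, and `lines` is unchanged. Whether the real write path preserves endings is a separate question that only the `-w` tests can answer.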

Contributor Author

> An input of abandonned\nAbandonned\r\nABANDONNED \nAbAnDoNnEd will become abandonned\nAbandonned\nABANDONNED\nAbAnDoNnEd I think

It becomes abandoned\nAbandoned\nABANDONED \nabandoned on both codespell 2.0.0 and on the version with modifications from this PR.
I'm not quite sure what and why I should be extending tests here as the changes from this PR do not make any changes that would cause any regression. I feel like it would be better suited for a separate PR?

If you don't agree, I can make the changes here, just let me know what test I should be modifying and/or what kind of inputs/outputs should I be checking there.

Contributor Author

@peternewman do you think I could get your decision here so that I know if this is fine or if I should work on something before this will be able to move forward?

Collaborator

> > An input of abandonned\nAbandonned\r\nABANDONNED \nAbAnDoNnEd will become abandonned\nAbandonned\nABANDONNED\nAbAnDoNnEd I think
>
> It becomes abandoned\nAbandoned\nABANDONED \nabandoned on both codespell 2.0.0 and on the version with modifications from this PR.

Odd, as we added the tests here in #2490 and they seem to behave okay, although admittedly they are a slightly different code path to the command line version. What OS were you testing on?

> I'm not quite sure what and why I should be extending tests here as the changes from this PR do not make any changes that would cause any regression. I feel like it would be better suited for a separate PR?

I believe the change you're making will introduce a bug, or perhaps simply further embed an existing bug, so I figure fixing the crucial underlying one first is probably a good idea.

> If you don't agree, I can make the changes here, just let me know what test I should be modifying and/or what kind of inputs/outputs should I be checking there.

As mentioned, I think we've now done that in #2490, so when that's in, you can just update/rebase and as long as your code continues to pass we'll be all good.

Contributor Author

@Jackenmen Jackenmen Sep 7, 2022

> What OS were you testing on?

Back then probably only on Windows 10.

> I believe the change you're making will introduce a bug, or perhaps simply further embed an existing bug, so I figure fixing the crucial underlying one first is probably a good idea.

That certainly makes sense, I am just not really sure what the issue is exactly. Hopefully, the tests added in #2490 will make it more clear whether there is any issue and if so, what that issue is, thanks for working on those tests. Once that PR gets merged, I'll rebase.
If any further work on the PR is needed, I might not get to it before Saturday, but other than that, I'll be happy to make any necessary changes 👍

Contributor Author

I rebased the PR on top of current master now that the tests got merged. Locally the tests ran fine, you'll have to approve the CI run though since I haven't had any PRs into this repository yet.


By the way, in case you don't know, the requirement for approval of workflows for first-time contributors who are NOT new to GitHub can be disabled to decrease the contribution barrier for first-time contributors, see:
https://twitter.com/JakubKuczys/status/1562660410108841986

continue

fixed_words = set()
41 changes: 34 additions & 7 deletions codespell_lib/tests/test_basic.py
@@ -325,10 +325,21 @@ def test_ignore_dictionary(
 ) -> None:
     """Test ignore dictionary functionality."""
     bad_name = tmp_path / "bad.txt"
-    bad_name.write_text("1 abandonned 1\n2 abandonned 2\nabondon\n")
-    assert cs.main(bad_name) == 3
+    bad_name.write_text(
+        "1 abandonned 1\n"
+        "2 abandonned 2\n"
+        "3 abandonned 3\r\n"
+        "4 abilty 4\n"
+        "5 abilty 5\n"
+        "6 abilty 6\r\n"
+        "7 ackward 7\n"
+        "8 ackward 8\n"
+        "9 ackward 9\r\n"
+        "abondon\n"
+    )
+    assert cs.main(bad_name) == 10
     fname = tmp_path / "ignore.txt"
-    fname.write_text("abandonned\n")
+    fname.write_text("abandonned\nabilty\r\nackward")
     assert cs.main("-I", fname, bad_name) == 1


@@ -363,11 +374,27 @@ def test_exclude_file(
 ) -> None:
     """Test exclude file functionality."""
     bad_name = tmp_path / "bad.txt"
-    bad_name.write_bytes(b"1 abandonned 1\n2 abandonned 2\n")
-    assert cs.main(bad_name) == 2
+    # check all possible combinations of lines to ignore and ignores
+    combinations = "".join(
+        f"{n} abandonned {n}\n"
+        f"{n} abandonned {n}\r\n"
+        f"{n} abandonned {n} \n"
+        f"{n} abandonned {n} \r\n"
+        for n in range(1, 5)
+    )
+    bad_name.write_bytes(
+        (combinations + "5 abandonned 5\n6 abandonned 6").encode("utf-8")
+    )
+    assert cs.main(bad_name) == 18
     fname = tmp_path / "tmp.txt"
-    fname.write_bytes(b"1 abandonned 1\n")
-    assert cs.main(bad_name) == 2
+    fname.write_bytes(
+        b"1 abandonned 1\n"
+        b"2 abandonned 2\r\n"
+        b"3 abandonned 3 \n"
+        b"4 abandonned 4 \r\n"
+        b"6 abandonned 6\n"
+    )
+    assert cs.main(bad_name) == 18
     assert cs.main("-x", fname, bad_name) == 1
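
The expected counts of 18 and 1 in this test can be checked by hand with the sketch below. It does not call codespell. It assumes that every line of the input contains exactly one misspelling, so the number of reported errors equals the number of lines that are not excluded, and that exclusion compares the `rstrip()`-ed forms as in the `_codespell.py` hunks.

```python
# Rebuild the same input as the test: four ending variants for n = 1..4,
# then two extra lines, the last one without a trailing newline.
combinations = "".join(
    f"{n} abandonned {n}\n"
    f"{n} abandonned {n}\r\n"
    f"{n} abandonned {n} \n"
    f"{n} abandonned {n} \r\n"
    for n in range(1, 5)
)
bad_text = combinations + "5 abandonned 5\n6 abandonned 6"

exclude_text = (
    "1 abandonned 1\n"
    "2 abandonned 2\r\n"
    "3 abandonned 3 \n"
    "4 abandonned 4 \r\n"
    "6 abandonned 6\n"
)

# Keep the line endings so the variants stay distinct until they are compared.
bad_lines = bad_text.splitlines(keepends=True)
exclude = {line.rstrip() for line in exclude_text.splitlines(keepends=True)}

total = len(bad_lines)
remaining = [line for line in bad_lines if line.rstrip() not in exclude]
```

The input has 4 × 4 + 2 = 18 lines. After stripping, every variant for n = 1..4 and the final `6 abandonned 6` line match an exclude entry, which leaves only the `5 abandonned 5` line.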

