🚧 Ignore escaped characters #1422

akaszynski · 2020-02-06T18:29:13Z

This takes a stab at ignoring escaped characters. It's a basic (and probably inefficient) implementation, so if there's a better way to get this done, please let me know.

Also, the escape dictionary should be modifiable by the user.

larsoner

On a real code base this PR takes my run from 1.097 sec to 1.508 sec. If I change your function to:

    for key, rep in find_dict.items():
        text = text.replace(key, rep)
    return text

it only increases to 1.161 sec. So maybe regex is less than ideal here.

Can you check on PyVista and see if you observe the same?

akaszynski · 2020-02-06T19:03:02Z

Same problem here. I've noticed that regex is almost always slower than the built-in replace.
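For anyone wanting to reproduce the comparison above, here is a minimal sketch. The escape table and the names `ESCAPES`, `sub_regex`, and `sub_replace` are assumptions made for illustration; the PR's actual function is not shown in this thread, and timings will vary by machine and input.

```python
import re
import timeit

# Hypothetical escape table; the PR's real dictionary is not shown here.
ESCAPES = {"\\n": " ", "\\t": " ", "\\'": "'"}

# Single alternation pattern built from the (regex-escaped) keys.
_PATTERN = re.compile("|".join(re.escape(key) for key in ESCAPES))


def sub_regex(text):
    """Substitute every escape in one regex pass."""
    return _PATTERN.sub(lambda match: ESCAPES[match.group(0)], text)


def sub_replace(text):
    """Substitute escapes with one str.replace call per key."""
    for key, rep in ESCAPES.items():
        text = text.replace(key, rep)
    return text


if __name__ == "__main__":
    # Synthetic input: source-like lines containing literal backslash escapes.
    sample = 'msg = "first\\nsecond\\tthird"\n' * 2000
    assert sub_regex(sample) == sub_replace(sample)
    for fn in (sub_regex, sub_replace):
        elapsed = timeit.timeit(lambda: fn(sample), number=10)
        print(f"{fn.__name__}: {elapsed:.4f} s")
```

Both functions give the same result on this input; only the relative timings are of interest.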

akaszynski · 2020-02-07T23:38:57Z

We're now using replace. As for making the substitution dictionary customizable, would you prefer to have a file containing the key/value pairs, or a string that they read in?

larsoner · 2020-02-08T00:12:53Z

> As for making the substitution dictionary customizable, would you prefer to have a file containing the key/value pairs, or a string that they read in?

I find putting characters like \n on the command line annoying (it usually takes me multiple tries to get it right), so I would lean toward a file.

akaszynski · 2020-02-08T00:20:04Z

Agreed, I was encountering that issue as well when testing it out. I'm thinking of simple key/value pairs in a CSV file, with one pair per line:

"\n", " "
"\'", "'"

larsoner · 2020-02-08T00:23:03Z

Might make more sense to match the dictionary format

thing->replacement

akaszynski · 2020-02-08T00:31:47Z

As we’re going to have to use spaces, are quotes permitted in your dictionary format?

"\n"->" "
"\'"->"'"

larsoner · 2020-02-10T15:53:02Z

I would just make it that anything to the left of -> is the substring to look for, and anything to the right is the replacement. So:

\n-> 
\'->'
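The format described above (everything left of the first `->` is the substring to find, everything to the right is the replacement) could be read along these lines. This is only a sketch: `parse_sub_pairs` and `apply_sub_pairs` are hypothetical names, not the functions added in this PR, and skipping lines without `->` is an assumption.

```python
def parse_sub_pairs(lines):
    """Build a {substring: replacement} dict from 'thing->replacement' lines."""
    pairs = {}
    for line in lines:
        # Strip only the line terminator; spaces in the replacement matter.
        line = line.rstrip("\r\n")
        if "->" not in line:
            continue  # assumption: blank or malformed lines are ignored
        key, _, rep = line.partition("->")  # split on the first '->' only
        if key:
            pairs[key] = rep
    return pairs


def apply_sub_pairs(text, pairs):
    """Apply each substitution in turn using plain str.replace."""
    for key, rep in pairs.items():
        text = text.replace(key, rep)
    return text


# Two entries: backslash-n becomes a space, backslash-quote becomes a quote.
example_pairs = parse_sub_pairs(["\\n-> \n", "\\'->'\n"])
```

Because the split happens on the first `->`, no quoting is needed and a replacement may itself contain spaces.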

akaszynski · 2020-02-14T23:00:53Z

Should be good to review now.

larsoner · 2020-02-15T19:17:31Z

Looks good. One last thing: we probably want to provide and use a default file, just like we do for the dictionary. Maybe it should just contain \n-> to start.

akaszynski · 2020-02-15T19:44:34Z

> Looks good. One last thing: we probably want to provide and use a default file, just like we do for the dictionary. Maybe it should just contain \n-> to start.

Perhaps \'->' as well.

larsoner · 2020-02-17T15:51:52Z

Fine with me

peternewman · 2020-03-03T11:17:35Z

codespell_lib/_codespell.py

@@ -232,6 +232,10 @@ def parse_options(args):
                        help='Comma separated list of words to be ignored '
                             'by codespell. Words are case sensitive based on '
                             'how they are written in the dictionary file')
+    parser.add_argument('-P', '--sub-pairs', type=str, metavar='FILE',
+                        help='Custom substitution text file that contains '


I'm a bit unclear from the help text what this is for. Is it to "fix up" the dictionary so it can deal with matching escape sequences, to actually do sed-type runs on my codebase, or something else?

Is this linked to #233?

larsoner · 2020-04-05T00:21:39Z

Also, in the related PR #174 it was noted that the write-changes flag was problematic when dealing with escapes. If we come back to this, we should make sure that write-changes works even with these substitutions.
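To illustrate the concern, the sketch below shows how writing back a substituted line loses the escapes, and one possible way around it. Both helpers are hypothetical and not taken from this PR or from #174; the equal-length masking idea is an assumption of this sketch rather than something proposed in the thread.

```python
def fix_on_substituted(line, pairs, typo, fix):
    """Naive approach: substitute, fix the typo, and return the result.

    The escapes are gone from the returned text, so writing it back
    to the file would corrupt the original string literal.
    """
    for key, rep in pairs.items():
        line = line.replace(key, rep)
    return line.replace(typo, fix)


def mask_escapes(line, keys):
    """Blank out each escape with spaces of equal length so offsets are kept."""
    for key in keys:
        line = line.replace(key, " " * len(key))
    return line


def fix_preserving_escapes(line, keys, typo, fix):
    """Locate the typo in the masked text, but edit the original line."""
    masked = mask_escapes(line, keys)
    start = masked.find(typo)
    if start == -1:
        return line
    return line[:start] + fix + line[start + len(typo):]
```

With the line `print("\tteh end\n")`, the naive helper returns `print(" the end ")`, while the masking variant returns `print("\tthe end\n")`.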

bl-ue · 2021-05-27T21:00:45Z

Why the close? I see this is quite old 🤔

akaszynski · 2021-05-27T22:58:52Z

> Why the close? I see this is quite old 🤔

Just cleaning up old PRs. I'd like to work on this, but this project hasn't been updated in quite some time (last release 6 months ago).

peternewman · 2021-05-28T06:12:11Z

Hi @akaszynski,

Despite the lack of releases, coding is still happening (although mostly in the dictionary). Also see #1923:
v2.0.0...master

If you were expecting this to have been merged: you don't seem to have responded to the review comments that @larsoner and I left.

akaszynski · 2021-05-28T17:08:46Z

> If you were expecting this to have been merged: you don't seem to have responded to the review comments that @larsoner and I left.

True, there's still work to be done. I'll work on this.

Today, codespell looks at "\tRead", which is "tab" followed by "Read", as "tRead" and flags this as a spelling mistake. See codespell-project/codespell#1422, which was closed without merging. Once that is resolved upstream, we can undo this.

Signed-off-by: Robin Getz <rgetz@mathworks.com>
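The false positive described in that commit message can be reproduced with a small sketch. The word pattern below is an assumption chosen for illustration and may differ from codespell's actual pattern; `WORD_RE` and `words` are hypothetical names.

```python
import re

# Assumed word pattern for illustration only; codespell's real one may differ.
WORD_RE = re.compile(r"[\w\-']+")

# A source line containing a literal backslash followed by 't'.
source_line = r'printf("\tRead failed");'


def words(text):
    """Return the word-like tokens found in a line of source text."""
    return WORD_RE.findall(text)


# Without substitution the 't' of the escape sticks to the next word.
before = words(source_line)

# Replacing the two-character escape with a space separates them again.
after = words(source_line.replace(r"\t", " "))

print(before)  # ['printf', 'tRead', 'failed']
print(after)   # ['printf', 'Read', 'failed']
```

The backslash is not a word character, so only it is dropped and `tRead` is what reaches the spelling check.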

initial stab at ignoring escaped characters

b4daf4a

akaszynski mentioned this pull request Feb 6, 2020

[MNT] Fix codespell config pyvista/pyvista#591

Merged

larsoner reviewed Feb 6, 2020

now using replace instead of re.sub

103fa18

added substitution pair reader

865899c

c72578 reviewed Feb 14, 2020

codespell_lib/_codespell.py (outdated)

c72578 reviewed Feb 14, 2020

codespell_lib/tests/test_basic.py (outdated)

fixed spelling errors

1ae24a1

peternewman reviewed Mar 3, 2020

rousskov mentioned this pull request Mar 23, 2020

Bug 5021: Add a script to fix spelling errors with codespell squid-cache/squid#565

Closed

akaszynski closed this May 27, 2021

akaszynski reopened this May 28, 2021

lassoan mentioned this pull request Jun 10, 2021

Fix misspellings and trim trailing whitespace Slicer/Slicer#5686

Merged

bl-ue added the enhancement label Jun 11, 2021

bl-ue marked this pull request as draft June 13, 2021 13:44

akaszynski closed this Jun 16, 2022

juju4 mentioned this pull request Jan 8, 2023

Escaped characters in strings not ignored in 2.0 version #1774

Closed
