Raise error in usecols when column doesn't exist but length matches #16460

bpraggastis · 2017-05-23T18:09:11Z

closes ERR: usecols fails to raise error if column doesn't exist but is the same length as headers #14671
tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
Bug fix: in parsers.py, added a check so that read_csv raises an error if the usecols option contains strings that are not in the names option. Added 'test_raise_on_usecols_names_mismatch' to tests/io/parser/usecols.py.
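To make the failure case concrete, here is a minimal pure-Python sketch of the kind of check described above. The function name check_usecols_in_names is hypothetical and this is not the code added to parsers.py; only the error message is taken from the diff quoted later in this thread.

```python
def check_usecols_in_names(usecols, names):
    """Raise ValueError if any string in usecols is absent from names.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    missing = [col for col in usecols
               if isinstance(col, str) and col not in names]
    if missing:
        raise ValueError('Usecols do not match names')
    return usecols


header = ['a', 'b', 'c', 'd']

# Same length as the header, but 'f' is not a column, so this must raise.
try:
    check_usecols_in_names(['a', 'b', 'c', 'f'], header)
    raised = False
except ValueError:
    raised = True

print(raised)
```

The point of the example is that comparing lengths alone cannot catch this case, since both lists have four entries; the check has to compare the actual names.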

TomAugspurger · 2017-05-23T19:10:54Z

There will be some warnings from our style checker:

pandas/tests/io/parser/usecols.py:481:9: E266 too many leading '#' for block comment
pandas/tests/io/parser/usecols.py:483:45: E262 inline comment should start with '# '
pandas/tests/io/parser/usecols.py:484:32: E261 at least two spaces before inline comment
pandas/tests/io/parser/usecols.py:484:33: E262 inline comment should start with '# '
pandas/tests/io/parser/usecols.py:486:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:486:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:486:31: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:38: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:50: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:62: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:74: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:491:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:491:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:491:31: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:495:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:495:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:496:80: E501 line too long (91 > 79 characters)
pandas/tests/io/parser/usecols.py:502:38: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:502:50: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:502:62: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:502:74: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:506:80: E501 line too long (84 > 79 characters)
pandas/tests/io/parser/usecols.py:511:80: E501 line too long (84 > 79 characters)
pandas/tests/io/parser/usecols.py:516:9: E303 too many blank lines (2)
pandas/tests/io/parser/usecols.py:516:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:516:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:516:31: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:517:80: E501 line too long (91 > 79 characters)
pandas/tests/io/parser/usecols.py:518:80: E501 line too long (81 > 79 characters)
pandas/tests/io/parser/usecols.py:519:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:519:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:520:80: E501 line too long (91 > 79 characters)

You can get the warnings with flake8 pandas/tests/io/parser/usecols.py (you may have to pip install flake8).
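As a rough guide to what the codes above ask for, the snippet below shows compliant forms of the patterns flagged in the test file. The values are borrowed from the diff fragments quoted in this thread, and a plain dict stands in for the DataFrame so that the snippet runs without pandas.

```python
# E266: a block comment starts with a single '#' followed by a space.
# see gh-14671
data = 'a,b,c,d\n1,2,3,4\n5,6,7,8'

# E231: put a space after every comma.
expected = {'A': [1, 5], 'B': [2, 6], 'C': [3, 7], 'D': [4, 8]}

# E261/E262: two spaces before an inline comment, which starts with '# '.
msg = 'Usecols do not match names'  # from parsers.py

# E501: keep lines under 80 characters by wrapping inside the brackets.
wrapped = {
    'A': [1, 5],
    'B': [2, 6],
    'C': [3, 7],
    'D': [4, 8],
}
```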

Could you also add a release note to doc/source/whatsnew/v0.20.2.txt under the bug fixes section?

jreback · 2017-05-24T12:33:38Z

pandas/io/parsers.py

@@ -1620,6 +1620,12 @@ def __init__(self, src, **kwds):

        if self.usecols:
            usecols = _evaluate_usecols(self.usecols, self.orig_names)
+
+            #gh-14671


@gfyoung is there a reason some of these checks for usecols are outside _evaluate_usecols?

(e.g. the one after this one could be inside if self.names were passed)

_evaluate_usecols is purely for evaluation. I don't think it was meant to have validation like this inside its implementation as well.

OK, that's fine. Maybe think about making this an actual _validate_usecols (if it reduces code and is clearer).
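The split discussed here, evaluation in one helper and validation in another, could look roughly like the sketch below. The two functions are hypothetical stand-ins and not the private pandas helpers; in particular, resolving a callable usecols into column indices is an assumption about what "evaluation" covers.

```python
def evaluate_usecols(usecols, names):
    """Resolve a callable usecols into the set of matching column indices.

    Assumed behavior for illustration; anything else passes through as-is.
    """
    if callable(usecols):
        return {i for i, name in enumerate(names) if usecols(name)}
    return usecols


def validate_usecols(usecols, names):
    """Raise if usecols is all strings and some of them are not in names."""
    if all(isinstance(col, str) for col in usecols):
        missing = sorted(set(usecols) - set(names))
        if missing:
            raise ValueError('Usecols do not match names')
    return usecols


names = ['a', 'b', 'c', 'd']

# Evaluation turns the callable into indices; validation then has no
# string entries to check and returns the indices unchanged.
selected = evaluate_usecols(lambda name: name in ('a', 'c'), names)
validate_usecols(selected, names)
print(sorted(selected))
```

Keeping the two steps apart means the evaluation helper stays free of error handling, which matches the reasoning given in the reply above.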

jreback · 2017-05-24T12:33:55Z

pandas/tests/io/parser/usecols.py

+        expected = DataFrame({'A': [1,5], 'B': [2,6], 'C': [3,7], 'D': [4,8]})
+        tm.assert_frame_equal(df, expected)
+
+        # usecols = ['A','C']


commented out?

Yes, those failures are related to #16469. Should put a TODO there I think.

@bpraggastis I think add a TODO here with the issue above.

jreback · 2017-05-24T12:34:41Z

can you add a whatsnew entry for 0.20.2 (IO section)

gfyoung · 2017-05-24T16:24:48Z

pandas/tests/io/parser/usecols.py

+    def test_raise_on_usecols_names_mismatch(self):
+        ## see gh-14671
+        data = 'a,b,c,d\n1,2,3,4\n5,6,7,8'
+        msg = 'Usecols do not match names'  ## from parsers.py CParserWrapper()


Condition the message on self.engine i.e.:

msg = <first-message> if self.engine == 'c' else <second-message>

That way you don't need that massive regex (and can remove the re import)

do this one as well
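A small sketch of the suggestion, assuming the test class exposes an engine attribute as in the comment above. The thread quotes only the C-engine message, so PYTHON_MSG below is a labeled placeholder rather than the real text.

```python
C_MSG = 'Usecols do not match names'  # quoted in the diff above
PYTHON_MSG = '<python-engine message>'  # placeholder, not the real text


def expected_msg(engine):
    """Return the single message the given engine is expected to raise."""
    return C_MSG if engine == 'c' else PYTHON_MSG


for engine in ('c', 'python'):
    print(engine, '->', expected_msg(engine))
```

Selecting one literal message per engine avoids a combined regular expression that matches either message, which is why the re import can go.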

TomAugspurger · 2017-05-30T21:34:50Z

@bpraggastis do you have time to update this today or tomorrow? We're releasing 0.20.2 by the end of the week.

jreback

If you can address @gfyoung's changes, we can get this in.

jreback

pls add a whatsnew note as well (0.20.2) in IO for bug fixes.

gfyoung · 2017-05-31T16:52:13Z

@TomAugspurger @jreback : If @bpraggastis does not respond by tomorrow, just ping me. I can pull this branch down and finish it off.

jreback · 2017-06-02T09:55:11Z

@gfyoung if you are interested

bpraggastis · 2017-06-02T17:40:56Z

@TomAugspurger @jreback Just now had a chance to check back in. Have you fixed all of the things you wanted changed or should I review your comments and respond?

jreback · 2017-06-02T22:33:26Z

@bpraggastis couple of comments, pls do those
add a whatsnew entry for 0.20.2
needs to pass CI

codecov · 2017-06-03T20:35:24Z

Codecov Report

Merging #16460 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16460      +/-   ##
==========================================
+ Coverage   90.42%   90.42%   +<.01%     
==========================================
  Files         161      161              
  Lines       51024    51026       +2     
==========================================
+ Hits        46139    46141       +2     
  Misses       4885     4885

Flag	Coverage Δ
#multiple	`88.26% <100%> (ø)`	⬆️
#single	`40.17% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.32% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92372c7...972d72b. Read the comment docs.

codecov · 2017-06-03T20:35:28Z

Codecov Report

Merging #16460 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16460      +/-   ##
==========================================
+ Coverage   90.92%   90.92%   +<.01%     
==========================================
  Files         161      161              
  Lines       49240    49242       +2     
==========================================
+ Hits        44772    44775       +3     
+ Misses       4468     4467       -1

Flag	Coverage Δ
#multiple	`88.68% <100%> (ø)`	⬆️
#single	`40.23% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.43% <100%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`95.33% <0%> (+0.09%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e620bc...3418bde. Read the comment docs.

TomAugspurger · 2017-06-04T01:16:47Z

I pushed an update addressing the comments.

Fixed

…ches (pandas-dev#16460)
* pandas-devgh-14671 Check if usecols with type string contains a subset of names, if not throws an error
* tests added for pandas-devgh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out
* Review comments
(cherry picked from commit 50a62c1)

bpraggastis · 2017-06-04T15:55:34Z

@TomAugspurger Thank you for closing this.

TomAugspurger · 2017-06-04T16:00:15Z

Thank you for the contribution @bpraggastis!

bpraggastis · 2017-06-04T17:51:23Z

@TomAugspurger Thank You for the assistance! I sent you an email with a couple of questions. Looking forward to hearing from you.

Version 0.20.2
* tag 'v0.20.2': (68 commits)
  RLS: v0.20.2
  DOC: Update release.rst
  DOC: Whatsnew fixups (pandas-dev#16596)
  ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460)
  BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444)
  PERF: vectorize _interp_limit (pandas-dev#16592)
  DOC: whatsnew 0.20.2 edits (pandas-dev#16587)
  API: Make is_strictly_monotonic_* private (pandas-dev#16576)
  BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565)
  Strictly monotonic (pandas-dev#16555)
  ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026)
  fix linting
  BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244)
  BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317)
  return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486)
  BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549)
  BUG: Fixed tput output on windows (pandas-dev#16496)
  Strictly monotonic (pandas-dev#16555)
  BUG: fixed wrong order of ordered labels in pd.cut()
  BUG: Fixed to_html ignoring index_names parameter
  ...

TomAugspurger changed the title from "Gh 14671" to "Raise error in usecols when column doesn't exist but length matches" May 23, 2017

TomAugspurger added the Error Reporting (Incorrect or improved errors from pandas) and IO CSV (read_csv, to_csv) labels May 23, 2017

TomAugspurger added this to the 0.20.2 milestone May 23, 2017

jreback reviewed May 24, 2017

gfyoung reviewed May 24, 2017

jreback requested changes May 30, 2017

jreback previously requested changes May 30, 2017

brendapraggastis and others added 3 commits June 3, 2017 20:15

pandas-devgh-14671 Check if usecols with type string contains a subset of names, if not throws an error (812f928)

tests added for pandas-devgh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out (1968a70)

Review comments (3418bde)

TomAugspurger force-pushed the gh-14671 branch from f20b87b to 3418bde Compare June 4, 2017 01:16

TomAugspurger added the Needs Backport label Jun 4, 2017

TomAugspurger merged commit 50a62c1 into pandas-dev:master Jun 4, 2017

TomAugspurger removed the Needs Backport label Jun 4, 2017
