Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise error in usecols when column doesn't exist but length matches #16460

Merged
merged 3 commits into from
Jun 4, 2017

Conversation

bpraggastis
Copy link
Contributor

@TomAugspurger TomAugspurger changed the title Gh 14671 Raise error in usecols when column doesn't exist but length matches May 23, 2017
@TomAugspurger TomAugspurger added Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv labels May 23, 2017
@TomAugspurger TomAugspurger added this to the 0.20.2 milestone May 23, 2017
@TomAugspurger
Copy link
Contributor

There will be some warnings from our style checker:

pandas/tests/io/parser/usecols.py:481:9: E266 too many leading '#' for block comment
pandas/tests/io/parser/usecols.py:483:45: E262 inline comment should start with '# '
pandas/tests/io/parser/usecols.py:484:32: E261 at least two spaces before inline comment
pandas/tests/io/parser/usecols.py:484:33: E262 inline comment should start with '# '
pandas/tests/io/parser/usecols.py:486:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:486:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:486:31: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:38: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:50: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:62: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:488:74: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:491:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:491:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:491:31: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:495:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:495:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:496:80: E501 line too long (91 > 79 characters)
pandas/tests/io/parser/usecols.py:502:38: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:502:50: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:502:62: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:502:74: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:506:80: E501 line too long (84 > 79 characters)
pandas/tests/io/parser/usecols.py:511:80: E501 line too long (84 > 79 characters)
pandas/tests/io/parser/usecols.py:516:9: E303 too many blank lines (2)
pandas/tests/io/parser/usecols.py:516:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:516:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:516:31: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:517:80: E501 line too long (91 > 79 characters)
pandas/tests/io/parser/usecols.py:518:80: E501 line too long (81 > 79 characters)
pandas/tests/io/parser/usecols.py:519:23: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:519:27: E231 missing whitespace after ','
pandas/tests/io/parser/usecols.py:520:80: E501 line too long (91 > 79 characters)

you can get the warnings with flake8 pandas/tests/io/parser/usecols.py (may have to pip install flake8)

Could you also add a release note to doc/source/whatsnew/v0.20.2.txt under the bug fixes seciton?

@@ -1620,6 +1620,12 @@ def __init__(self, src, **kwds):

if self.usecols:
usecols = _evaluate_usecols(self.usecols, self.orig_names)

#gh-14671
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung is there a reason some of these checks for usecols are outside _evaluate_usecols?

(e.g. the one after this one could be inside if self.names were passed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_evaluate_usecols is purely for evaluation. I don't think it was meant to have validation like this inside its implementation as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k that's fine, maybe think about making this an actual _validate_usecols (if it reduces code and is more clear).

expected = DataFrame({'A': [1,5], 'B': [2,6], 'C': [3,7], 'D': [4,8]})
tm.assert_frame_equal(df, expected)

# usecols = ['A','C']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those failures are related to #16469. Should put a TODO there I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bpraggastis I think add a TODO here with the issue above.

@jreback
Copy link
Contributor

jreback commented May 24, 2017

can you add a whatsnew entry for 0.20.2 (IO section)

def test_raise_on_usecols_names_mismatch(self):
## see gh-14671
data = 'a,b,c,d\n1,2,3,4\n5,6,7,8'
msg = 'Usecols do not match names' ## from parsers.py CParserWrapper()
Copy link
Member

@gfyoung gfyoung May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condition the message on self.engine i.e.:

msg = <first-message> if self.engine == 'c' else <second-message>

That way you don't need that massive regex (and can remove the re import)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do this one as well

@TomAugspurger
Copy link
Contributor

@bpraggastis do you have time to update this today or tomorrow? We're releasing 0.20.2 by the end of the week.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can address @gfyoung changes we can get this in .

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note as well (0.20.2) in IO for bug fixes.

@gfyoung
Copy link
Member

gfyoung commented May 31, 2017

@TomAugspurger @jreback : If @bpraggastis does not respond by tomorrow, just ping me. I can pull this branch down and finish it off.

@jreback
Copy link
Contributor

jreback commented Jun 2, 2017

@gfyoung if you are interested

@bpraggastis
Copy link
Contributor Author

@TomAugspurger @jreback Just now had a chance to check back in. Have you fixed all of the things you wanted changed or should I review your comments and respond?

@jreback
Copy link
Contributor

jreback commented Jun 2, 2017

@bpraggastis couple of comments, pls do those
add a whatsnew entry for 0.20.2
needs to pass CI

@codecov
Copy link

codecov bot commented Jun 3, 2017

Codecov Report

Merging #16460 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16460      +/-   ##
==========================================
+ Coverage   90.42%   90.42%   +<.01%     
==========================================
  Files         161      161              
  Lines       51024    51026       +2     
==========================================
+ Hits        46139    46141       +2     
  Misses       4885     4885
Flag Coverage Δ
#multiple 88.26% <100%> (ø) ⬆️
#single 40.17% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.32% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92372c7...972d72b. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 3, 2017

Codecov Report

Merging #16460 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16460      +/-   ##
==========================================
+ Coverage   90.92%   90.92%   +<.01%     
==========================================
  Files         161      161              
  Lines       49240    49242       +2     
==========================================
+ Hits        44772    44775       +3     
+ Misses       4468     4467       -1
Flag Coverage Δ
#multiple 88.68% <100%> (ø) ⬆️
#single 40.23% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.43% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.33% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e620bc...3418bde. Read the comment docs.

brendapraggastis and others added 3 commits June 3, 2017 20:15
@TomAugspurger
Copy link
Contributor

I pushed an update addressing the comments.

@TomAugspurger TomAugspurger merged commit 50a62c1 into pandas-dev:master Jun 4, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 4, 2017
…ches (pandas-dev#16460)

* pandas-devgh-14671 Check if usecols with type string contains a subset of names, if not throws an error

* tests added for pandas-devgh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out

* Review comments

(cherry picked from commit 50a62c1)
@bpraggastis
Copy link
Contributor Author

@TomAugspurger Thank you for closing this.

@TomAugspurger
Copy link
Contributor

Thank you for the contribution @bpraggastis!

TomAugspurger pushed a commit that referenced this pull request Jun 4, 2017
…ches (#16460)

* gh-14671 Check if usecols with type string contains a subset of names, if not throws an error

* tests added for gh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out

* Review comments

(cherry picked from commit 50a62c1)
@bpraggastis
Copy link
Contributor Author

@TomAugspurger Thank You for the assistance! I sent you an email with a couple of questions. Looking forward to hearing from you.

Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
…ches (pandas-dev#16460)

* pandas-devgh-14671 Check if usecols with type string contains a subset of names, if not throws an error

* tests added for pandas-devgh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out

* Review comments
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
…ches (pandas-dev#16460)

* pandas-devgh-14671 Check if usecols with type string contains a subset of names, if not throws an error

* tests added for pandas-devgh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out

* Review comments
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Jul 12, 2017
Version 0.20.2

* tag 'v0.20.2': (68 commits)
  RLS: v0.20.2
  DOC: Update release.rst
  DOC: Whatsnew fixups (pandas-dev#16596)
  ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460)
  BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444)
  PERF: vectorize _interp_limit (pandas-dev#16592)
  DOC: whatsnew 0.20.2 edits (pandas-dev#16587)
  API: Make is_strictly_monotonic_* private (pandas-dev#16576)
  BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565)
  Strictly monotonic (pandas-dev#16555)
  ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026)
  fix linting
  BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244)
  BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317)
  return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486)
  BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549)
  BUG: Fixed tput output on windows (pandas-dev#16496)
  Strictly monotonic (pandas-dev#16555)
  BUG: fixed wrong order of ordered labels in pd.cut()
  BUG: Fixed to_html ignoring index_names parameter
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERR: usecols fails to raise error if column doesn't exist but is the same length as headers
5 participants