Raise error in usecols when column doesn't exist but length matches #16460
There will be some warnings from our style checker. You can get the warnings with: git diff upstream/master --name-only -- '*.py' | flake8 --diff
Could you also add a release note to
pandas/io/parsers.py
Outdated
@@ -1620,6 +1620,12 @@ def __init__(self, src, **kwds):

        if self.usecols:
            usecols = _evaluate_usecols(self.usecols, self.orig_names)

            #gh-14671
@gfyoung is there a reason some of these checks for usecols are outside _evaluate_usecols? (e.g. the one after this one could be inside if self.names were passed)
_evaluate_usecols is purely for evaluation. I don't think it was meant to have validation like this inside its implementation as well.
k that's fine, maybe think about making this an actual _validate_usecols (if it reduces code and is more clear).
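For illustration only, a separate validation helper along these lines might look like the sketch below. The name validate_usecols_names and the exact error text are hypothetical and are not the code in this PR; the sketch only shows the check described in the commit message (string entries in usecols must be a subset of the column names), kept apart from evaluation.

```python
def validate_usecols_names(usecols, names):
    """Return usecols unchanged, or raise if a string entry is not in names.

    Hypothetical helper for illustration; not the code from this PR.
    """
    # Only string entries are checked; integer positions are left alone.
    missing = [col for col in usecols
               if isinstance(col, str) and col not in names]
    if missing:
        raise ValueError(
            "Usecols do not match names: {missing}".format(missing=missing))
    return usecols


names = ['a', 'b', 'c', 'd']
validate_usecols_names(['a', 'c'], names)  # a valid subset passes through

try:
    # Same length as names, but 'f' does not exist (the case in the PR title).
    validate_usecols_names(['a', 'b', 'c', 'f'], names)
except ValueError as err:
    print(err)
```

Keeping the check in its own function would let both parser engines call it, which is the kind of reduction in duplicated code the comment above is asking about.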
pandas/tests/io/parser/usecols.py
Outdated
expected = DataFrame({'A': [1,5], 'B': [2,6], 'C': [3,7], 'D': [4,8]})
tm.assert_frame_equal(df, expected)

# usecols = ['A','C']
commented out?
Yes, those failures are related to #16469. Should put a TODO there I think.
@bpraggastis I think add a TODO here with the issue above.
can you add a whatsnew entry for 0.20.2 (IO section)
pandas/tests/io/parser/usecols.py
Outdated
def test_raise_on_usecols_names_mismatch(self):
    ## see gh-14671
    data = 'a,b,c,d\n1,2,3,4\n5,6,7,8'
    msg = 'Usecols do not match names' ## from parsers.py CParserWrapper()
Condition the message on self.engine, i.e.:
msg = <first-message> if self.engine == 'c' else <second-message>
That way you don't need that massive regex (and can remove the re import)
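A rough, self-contained sketch of that pattern follows. The function fake_read_csv is a hypothetical stand-in for the parser call under test, and PYTHON_MSG is a placeholder because the second message is not shown in this thread; only the C-engine message comes from the test above.

```python
# C_MSG is the message quoted in the test above. PYTHON_MSG is a placeholder:
# the real string for the python engine is not shown in this thread.
C_MSG = "Usecols do not match names"
PYTHON_MSG = "placeholder message raised by the python engine"


def fake_read_csv(engine):
    """Hypothetical stand-in for the parser call that fails under test."""
    raise ValueError(C_MSG if engine == "c" else PYTHON_MSG)


def raises_expected_message(engine):
    # Pick the expected message from the engine instead of matching both
    # possibilities with a single regular expression.
    msg = C_MSG if engine == "c" else PYTHON_MSG
    try:
        fake_read_csv(engine)
    except ValueError as err:
        return msg in str(err)
    return False


for engine in ("c", "python"):
    print(engine, raises_expected_message(engine))
```

Because each engine is compared against one plain string, the test needs no alternation pattern and no re import.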
do this one as well
@bpraggastis do you have time to update this today or tomorrow? We're releasing 0.20.2 by the end of the week.
if you can address @gfyoung's changes we can get this in.
pls add a whatsnew note as well (0.20.2) in IO for bug fixes.
@TomAugspurger @jreback: If @bpraggastis does not respond by tomorrow, just ping me. I can pull this branch down and finish it off.
@gfyoung if you are interested
@TomAugspurger @jreback Just now had a chance to check back in. Have you fixed all of the things you wanted changed or should I review your comments and respond?
@bpraggastis couple of comments, pls do those
Codecov Report
@@ Coverage Diff @@
## master #16460 +/- ##
==========================================
+ Coverage 90.42% 90.42% +<.01%
==========================================
Files 161 161
Lines 51024 51026 +2
==========================================
+ Hits 46139 46141 +2
Misses 4885 4885
Codecov Report
@@ Coverage Diff @@
## master #16460 +/- ##
==========================================
+ Coverage 90.92% 90.92% +<.01%
==========================================
Files 161 161
Lines 49240 49242 +2
==========================================
+ Hits 44772 44775 +3
+ Misses 4468 4467 -1
…t of names, if not throws an error
… use of usecols and names unclear so these tests are commented out
I pushed an update addressing the comments.
…ches (pandas-dev#16460) * pandas-devgh-14671 Check if usecols with type string contains a subset of names, if not throws an error * tests added for pandas-devgh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out * Review comments (cherry picked from commit 50a62c1)
@TomAugspurger Thank you for closing this.
Thank you for the contribution @bpraggastis!
@TomAugspurger Thank You for the assistance! I sent you an email with a couple of questions. Looking forward to hearing from you.
Version 0.20.2 * tag 'v0.20.2': (68 commits) RLS: v0.20.2 DOC: Update release.rst DOC: Whatsnew fixups (pandas-dev#16596) ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460) BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444) PERF: vectorize _interp_limit (pandas-dev#16592) DOC: whatsnew 0.20.2 edits (pandas-dev#16587) API: Make is_strictly_monotonic_* private (pandas-dev#16576) BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565) Strictly monotonic (pandas-dev#16555) ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026) fix linting BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244) BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317) return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486) BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549) BUG: Fixed tput output on windows (pandas-dev#16496) Strictly monotonic (pandas-dev#16555) BUG: fixed wrong order of ordered labels in pd.cut() BUG: Fixed to_html ignoring index_names parameter ...
git diff upstream/master --name-only -- '*.py' | flake8 --diff