-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: pd.DataFrame.from_records() raises a KeyError if passed a string index and an empty iterable #47319
BUG: pd.DataFrame.from_records() raises a KeyError if passed a string index and an empty iterable #47319
Conversation
Hello @Kelvingandhi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-06-24 20:23:11 UTC |
cc @phofl (from issue) |
pandas/core/frame.py
Outdated
if isinstance(index, str) or not hasattr(index, "__iter__"): | ||
if index not in columns: | ||
result_index = [index] | ||
index = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This swallows KeyErrors when the iterable is populated but the column is not present.
data = [
{"col_1": 3, "col_2": "a"},
{"col_1": 2, "col_2": "b"},
{"col_1": 1, "col_2": "c"},
{"col_1": 0, "col_2": "d"},
]
result = pd.DataFrame.from_records(data, index="col_14")
print(result)
This should still raise, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that maybe is another bug, see #47285 (comment) where we get NaN
in the index for a missing key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On main, this raises a KeyError right now, which is fine I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. if we do have data (not the empty list case) we should probably raise on missing keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I think so too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. if we do have data (not the empty list case) we should probably raise on missing keys
Yeah agree with you.
Furthermore, when we have EMPTY list and we pass some list index then currently on Main, it returns that list index as output with empty list. We want same behavior for string index as well for EMPTY list instead of raising KeyError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This swallows KeyErrors when the iterable is populated but the column is not present.
data = [ {"col_1": 3, "col_2": "a"}, {"col_1": 2, "col_2": "b"}, {"col_1": 1, "col_2": "c"}, {"col_1": 0, "col_2": "d"}, ] result = pd.DataFrame.from_records(data, index="col_14") print(result)
This should still raise, I think.
As @simonjayhawkins mentioned, and I also tested it, it generates ValueError: Length of values (4) does not match length of index (1) when we've iterable populated. And if it should raise KeyError then that is another issue.
Here, in my PR, I tried to address actual issue when we have empty list ([]) and we pass as string index, It gives KeyError. But Ideally in output we want Empty list along with passed string as index.
We want Exact same behaviour when we have EMPTY list and we pass some list index then currently on Main, it returns that list index as output with empty list.
Code change I put here, it will swallow KeyError only when we have EMPTY list passed with string index. Otherwise it won't get executed.
But please correct me if I misunderstood any of your concerns in discussion. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My code example should still raise the same error. We do not want to change this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to fix the bug without breaking anything else.
Please also add my example as a test with the KeyError as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohk. I apologize, earlier I misunderstood it. I fixed this issue and now it is behaving as expected (exactly what currently in main this raises a KeyError). I have added your example as a test as well. Thank you.
Hello @simonjayhawkins , Just wanted to follow up on this PR. I'm not sure, is it waiting for another approval or is it being checked or need more discussion. If you could provide your feedback, that would be appreciated. |
Did you address our comments? |
Please check my response above. |
…ror, with code formatting
@github-actions pre-commit |
@@ -491,3 +491,19 @@ def test_info_int_columns(): | |||
""" | |||
) | |||
assert result == expected | |||
|
|||
|
|||
def test_info_from_rec(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add the test we discussed where the behavior changed initially?
Also pre-commit is failing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After our discussion, I fixed that issue and also added that test in test_info.py (please check lines 502-509) plus in that test case, I made correction as well suggested by @simonjayhawkins (#47319 (comment)).
Furthermore, I already have initial test case included as well (lines 498-500).
I'm unsure what else is required.
On pre-commit failure, I'm beginner here and not sure what causing it to fail (when last time it passed for earlier commits). Any help to get this resolved would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, missed that. Could you split that into two tests? Since technically they are testing different things.
You can run pre-commit locally and check the log. This should show the errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I will do that. Thank you for the response.
All checks are successful except these 2 which are keep getting canceled after 90 min timeout. Not sure what action is needed now.
|
@phofl @simonjayhawkins Have you gotten a chance to look at this 2 checks getting cancelled issue. #47319 (comment) |
|
||
if len(arrays) == 0: | ||
if index is not None: | ||
if isinstance(index, str) or not hasattr(index, "__iter__"): | ||
if index not in columns: | ||
# error: Incompatible types in assignment | ||
# type "List[Any]"; expected "Optional[Index]" | ||
result_index = [index] # type: ignore[assignment] | ||
index = None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use only one if
here to be consistent with other code.
if len(arrays) == 0: | |
if index is not None: | |
if isinstance(index, str) or not hasattr(index, "__iter__"): | |
if index not in columns: | |
# error: Incompatible types in assignment | |
# type "List[Any]"; expected "Optional[Index]" | |
result_index = [index] # type: ignore[assignment] | |
index = None | |
if ( | |
len(arrays) == 0 | |
and index is not None | |
and (isinstance(index, str) or not hasattr(index, "__iter__")) | |
and index not in columns | |
): | |
# error: Incompatible types in assignment | |
# type "List[Any]"; expected "Optional[Index]" | |
result_index = [index] # type: ignore[assignment] | |
index = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check that if condition and earlier I had both conditions with one if but it was failing for other case where we pass this pd.DataFrame.from_records([]) (it was failing for if index is not None condition)
Basically we can’t add these conditions in one if to filter out specific edge case.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
pd.DataFrame.from_records()
raises a KeyError if passed a stringindex
and an empty iterable #47285(Replace xxxx with the Github issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.