Ensure correct boolean dtype in misc table index #431

Merged: hagenw merged 8 commits into main from fix-misc-table-bool-dtype on Jun 11, 2024

Conversation

hagenw
Member

@hagenw hagenw commented May 30, 2024

Closes #430

This pull request adds tests that cover #430 and fixes the issue by ensuring that a bool dtype is always converted to the "boolean" dtype.
The fix incorporates the required conversion into audformat.core.utils._maybe_convert_int_dtype() and renames the function to audformat.core.utils._maybe_convert_pandas_dtype().

The second example from #430 now produces the desired result:

>>> import audformat
>>> import pandas as pd
>>> index = pd.Index([True, False], name="bool")
>>> index.dtype
dtype('bool')
>>> table = audformat.MiscTable(index)
>>> table.index
Index([True, False], dtype='boolean', name='bool')

Besides the actual changes, I also added or expanded docstrings.
Note: I also added tests for columns of a normal table, even though the issue was not occurring there before.
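
For readers unfamiliar with the conversion, the following is a minimal sketch of the idea, not the actual implementation in audformat.core.utils. The function name `maybe_convert_dtype` is hypothetical, and the int to "Int64" mapping is an assumption based on the old function name `_maybe_convert_int_dtype()`. It assumes a pandas version in which a bool index has dtype('bool'), as in the example above.

```python
import pandas as pd


def maybe_convert_dtype(index: pd.Index) -> pd.Index:
    """Hypothetical helper: map bool/int dtypes to nullable pandas dtypes."""
    if pd.api.types.is_bool_dtype(index.dtype):
        # bool -> nullable "boolean" (the case fixed for #430)
        return index.astype("boolean")
    if pd.api.types.is_integer_dtype(index.dtype):
        # int -> nullable "Int64" (assumed behavior of the former int-only helper)
        return index.astype("Int64")
    # All other dtypes are returned unchanged
    return index


bool_index = maybe_convert_dtype(pd.Index([True, False], name="bool"))
int_index = maybe_convert_dtype(pd.Index([1, 2], name="int"))
str_index = maybe_convert_dtype(pd.Index(["a", "b"], name="str"))
```

Applying the helper a second time leaves the index unchanged, since "boolean" and "Int64" are themselves recognized as bool and integer dtypes.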

codecov bot commented May 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.0%. Comparing base (1876c84) to head (86ade67).

Additional details and impacted files
Files Coverage Δ
audformat/core/common.py 100.0% <100.0%> (ø)
audformat/core/table.py 100.0% <100.0%> (ø)
audformat/core/utils.py 100.0% <100.0%> (ø)

@hagenw hagenw requested a review from ChristianGeng May 30, 2024 13:14
Member

@ChristianGeng ChristianGeng left a comment

Review

The MR improves typing consistency by using nullable dtypes not only for int, but now also for bool.

I have run the tests, and they all pass for me. I will approve this, but I have a more general question, which came up recently in a different form regarding schemes in audb.Dependencies:

The code currently often uses if clauses, e.g. in common.to_pandas_dtype, or class attributes, as in define.DataType, to handle typing information. What I see more often in other environments (e.g. sqlalchemy) is the use of dict-like structures to look up typing info.

While this post shows that the disassembly would favor lookup for performance reasons, that does not worry me much. However, a dict-like implementation appears more idiomatic and more readable to me. In addition, lookup approaches allow for easier testing.
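
To make the comparison concrete, here is a small sketch of both styles. The labels and target dtypes are illustrative only and are not taken from define.DataType or common.to_pandas_dtype; the function names are made up for this example.

```python
from typing import Optional

# Illustrative mapping only; not audformat's actual definitions.
PANDAS_DTYPES = {
    "bool": "boolean",
    "int": "Int64",
    "float": "float",
}


def to_pandas_dtype_if(dtype: str) -> Optional[str]:
    """If-clause style: one branch per supported label."""
    if dtype == "bool":
        return "boolean"
    elif dtype == "int":
        return "Int64"
    elif dtype == "float":
        return "float"
    return None


def to_pandas_dtype_lookup(dtype: str) -> Optional[str]:
    """Lookup style: the mapping is plain data, None for unknown labels."""
    return PANDAS_DTYPES.get(dtype)
```

With the lookup variant, a test can iterate over the items of the mapping instead of repeating one assertion per branch.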

A minor aspect for me is that there are several implementations of test_to_pandas_dtype. They all seem to use the underlying audformat.core.common.to_pandas_dtype, but this has confused me a little.

However, these are things that should not take too much space here. I will therefore approve without further ado. I can confirm that all the tests pass.

@hagenw
Member Author

hagenw commented Jun 4, 2024

> The code currently often uses if clauses, e.g. in common.to_pandas_dtype, or class attributes, as in define.DataType, to handle typing information. What I see more often in other environments (e.g. sqlalchemy) is the use of dict-like structures to look up typing info.

I created audeering/audb#420 as a suggestion for how to improve the storage of definitions in audb using a dictionary. Maybe you can have a look at it. Once we have solved it there, we could update audformat in the same way, but I would do that independently of this pull request.

@hagenw
Member Author

hagenw commented Jun 4, 2024

> A minor aspect for me is that there are several implementations of test_to_pandas_dtype. They all seem to use the underlying audformat.core.common.to_pandas_dtype, but this has confused me a little.

Are you confused because they only test audformat.core.common.to_pandas_dtype() and not audformat.core.utils._maybe_convert_pandas_dtype(), or are you referring to something else here?

@hagenw hagenw merged commit c813739 into main Jun 11, 2024
10 checks passed
@hagenw hagenw deleted the fix-misc-table-bool-dtype branch June 11, 2024 07:06
Successfully merging this pull request may close these issues.

Bool index in misc table might have wrong dtype
2 participants