fix: handle null values in data #636
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #636 +/- ##
==========================================
+ Coverage 90.19% 90.40% +0.21%
==========================================
Files 39 39
Lines 3467 3503 +36
==========================================
+ Hits 3127 3167 +40
+ Misses 340 336 -4
@alespour thanks for the PR 👍
Please also add the `use_extension_dtypes: bool = False` parameter to the async query API:
`async def query_data_frame_stream(self, query: str, org=None, data_frame_index: List[str] = None,`
LGTM 🚀
Closes #621
Proposed Changes
Handles data with missing values when querying to data frames. The query functions `query_data_frame...` have a new optional parameter, `use_extension_dtypes`.
- `True`: missing values are represented as `pandas.NA`, and the dtype of columns containing `<NA>` is the corresponding nullable extension dtype from the `pandas` package (i.e. `Int64`, `Float64`, `boolean`, etc.). Missing values can be checked using the `pandas.isna()` function.
- `False` (default): missing values are represented as `None`, and the dtype of columns with missing values is either `'object'` or, when the values are numeric, `'float64'`. This is the standard conversion behavior of data frames.

Example output (with data from #621):
use_extension_dtypes=True
use_extension_dtypes=False
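
The difference between the two representations can be illustrated with plain pandas, independent of the client. This is a minimal sketch, not the PR's implementation and not the output from #621: the sample values are made up, and it only assumes that pandas is installed.

```python
import pandas as pd

# Hypothetical sample data: an integer column with one missing value.
values = [1, None, 3]

# Standard conversion: the missing value forces the column to float64,
# and the gap is stored as NaN.
default_series = pd.Series(values)

# Nullable extension dtype: the integers stay integers and the gap
# is stored as pandas.NA (displayed as <NA>).
nullable_series = pd.Series(values, dtype="Int64")

print(default_series.dtype)
print(nullable_series.dtype)

# pandas.isna() detects the missing value in both representations.
print(pd.isna(default_series.iloc[1]), pd.isna(nullable_series.iloc[1]))
```

With the extension dtype, the column keeps its integer type, which is the practical benefit of opting in when a column mixes numbers and missing values.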
Note: the conversion of numeric values to extension dtypes works properly with `pandas>=2.0`. In a Python 3.7 environment, where the latest available pandas is 1.3.5, the dtype of columns with NA values is therefore `'object'`, i.e. the same as without extension dtypes. For Python 3.8+, `pandas` 2.x is available.

Checklist
- `pytest tests` completes successfully