fix: handle null values in data #636

alespour · 2024-01-30T14:01:58Z

Closes #621

Proposed Changes

Handles data with missing values when querying into data frames. The query functions query_data_frame... have a new optional parameter, use_extension_dtypes.

def query_data_frame(self, query: str, org=None, data_frame_index: List[str] = None, params: dict = None,
                     use_extension_dtypes: bool = False):
    ...

def query_data_frame_stream(self, query: str, org=None, data_frame_index: List[str] = None, params: dict = None,
                            use_extension_dtypes: bool = False):
    ...

When set to True, missing values are represented as pandas.NA, and columns containing <NA> get the corresponding nullable extension dtype from the pandas package (e.g. Int64, Float64, boolean). Missing values can be checked with the pandas.isna() function.
When False (default), missing values are represented as None or NaN, and the dtype of a column with missing values is either 'object' or, when the values are numeric, 'float64'. This is the standard conversion behavior of data frames, see
- https://note.nkmk.me/en/python-pandas-nan-none-na/
- https://pandas.pydata.org/docs/user_guide/missing_data.html
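
The difference between the two representations can be reproduced with pandas alone. The following is a minimal sketch, not the code from this PR: the values list is made-up data that mirrors the test_long column from #621, and pandas 2.x is assumed.

```python
import pandas as pd

# Hypothetical data: one missing value and one integer, like test_long.
values = [None, 1]

# Default conversion: the missing value becomes NaN and the column is float64.
default_series = pd.Series(values)

# Nullable extension dtype: the missing value becomes pandas.NA and the
# integer stays an integer.
nullable_series = pd.Series(values, dtype="Int64")

print(default_series.dtype)   # float64
print(nullable_series.dtype)  # Int64

# pandas.isna() detects the missing value in both representations.
print(pd.isna(default_series).tolist())
print(pd.isna(nullable_series).tolist())
```

With the nullable dtype the second value remains the integer 1 instead of being widened to 1.0, which matches the difference visible in the example output.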

Example output (with data from #621):

use_extension_dtypes=True

<bound method NDFrame.head of     result  table                    _start                     _stop                            _time _measurement  test_double  test_long
0  _result      0 2023-12-15 13:19:54+00:00 2023-12-15 13:19:57+00:00 2023-12-15 13:19:55.372000+00:00         test          4.0       <NA>
1  _result      0 2023-12-15 13:19:54+00:00 2023-12-15 13:19:57+00:00        2023-12-15 13:19:56+00:00         test         <NA>          1>

use_extension_dtypes=False

<bound method NDFrame.head of     result  table                    _start                     _stop                            _time _measurement  test_double  test_long
0  _result      0 2023-12-15 13:19:54+00:00 2023-12-15 13:19:57+00:00 2023-12-15 13:19:55.372000+00:00         test          4.0        NaN
1  _result      0 2023-12-15 13:19:54+00:00 2023-12-15 13:19:57+00:00        2023-12-15 13:19:56+00:00         test          NaN        1.0>

Note: the conversion of numeric values to extension dtypes works properly only with pandas>=2.0. In a Python 3.7 environment, where the latest available pandas is 1.3.5, the dtype of columns with NA values is therefore 'object', i.e. the same as without extension dtypes. For Python 3.8+, pandas 2.x is available.
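
Callers that must support both environments could gate the flag on the installed pandas version. This is only a sketch under that assumption: supports_numeric_extension_dtypes is a hypothetical helper, not part of influxdb_client, and it looks only at the major version number.

```python
def supports_numeric_extension_dtypes(pandas_version: str) -> bool:
    """Return True when the given pandas version string is 2.0 or newer."""
    # Only the major component matters for the pandas>=2.0 requirement.
    major = int(pandas_version.split(".")[0])
    return major >= 2


# The two versions mentioned in the note above.
print(supports_numeric_extension_dtypes("1.3.5"))  # False
print(supports_numeric_extension_dtypes("2.0.0"))  # True
```

In practice the argument would be pandas.__version__, and the result could be passed as use_extension_dtypes.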

Checklist

CHANGELOG.md updated
Rebased/mergeable
A test has been added if appropriate
pytest tests completes successfully
Commit messages are conventional

codecov-commenter · 2024-01-30T18:18:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (27777d1) 90.19% compared to head (17ab3b1) 90.40%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #636      +/-   ##
==========================================
+ Coverage   90.19%   90.40%   +0.21%     
==========================================
  Files          39       39              
  Lines        3467     3503      +36     
==========================================
+ Hits         3127     3167      +40     
+ Misses        340      336       -4


bednar

@alespour thanks for PR 👍

Please also add the use_extension_dtypes: bool = False parameter to the async query API:

influxdb-client-python/influxdb_client/client/query_api_async.py

Line 160 in 27777d1

async def query_data_frame_stream(self, query: str, org=None, data_frame_index: List[str] = None,

bednar

LGTM 🚀

alespour added 5 commits January 30, 2024 15:00

fix: handle null values in Flux data

0c99bfb

test: add tests for null value handling and extension dtypes

01ce750

test: fix failures with empty warnings cases

8030d84

test: comment out dtype some extra assertion until solved

5f7a1a9

test: skip extension dtypes test on pythn 3.7

074c013

alespour added 4 commits January 30, 2024 19:26

fix: single place of dtypes conversion

be6c74d

fix: bump pandas dependency version

32ef47e

docs: update CHANGELOG

2abd288

chore(build): trigger CI/CD pipeline

074ddca

alespour marked this pull request as ready for review January 30, 2024 19:34

bednar requested changes Jan 31, 2024

fix: add use_extension_dtypes also to async query API methods

17ab3b1

alespour requested a review from bednar January 31, 2024 12:11

bednar approved these changes Jan 31, 2024

bednar merged commit 7a5f655 into master Jan 31, 2024
14 checks passed

bednar deleted the fix/issue-621 branch January 31, 2024 13:46

bednar added this to the 1.41.0 milestone Jan 31, 2024
