Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL source parsing NULL value error #111

Closed
ztsweet opened this issue Aug 16, 2021 · 13 comments
Closed

MySQL source parsing NULL value error #111

ztsweet opened this issue Aug 16, 2021 · 13 comments
Milestone

Comments

@ztsweet
Copy link

ztsweet commented Aug 16, 2021

import polars as pl
import pandas as pd
from sqlalchemy import create_engine
import pyarrow
# 
print(pl.__version__)
# 0.8.20
print(pd.__version__)
# 1.3.0
pip list | grep connector*
# connectorx                    0.2.0
pyarrow.__version__
# '4.0.1'

# pandas first
sql = '''select ORDER_ID from tables     '''
engine = create_engine('mysql+pymysql://root:***@*.*.*.*:*')
df = pd.read_sql_query(sql, engine)
df.dtypes
# ORDER_ID                      int64

# polars second
conn = "mysql://root:***@*.*.*.*:*"
pdf = pl.read_sql(sql, conn)
`
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
<timed exec> in <module>

~/miniconda3/envs/test/lib/python3.8/site-packages/polars/io.py in read_sql(sql, connection_uri, partition_on, partition_range, partition_num)
    556     """
    557     if _WITH_CX:
--> 558         tbl = cx.read_sql(
    559             conn=connection_uri,
    560             query=sql,

~/miniconda3/envs/test/lib/python3.8/site-packages/connectorx/__init__.py in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num)
    126             raise ValueError("You need to install pyarrow first")
    127 
--> 128         result = _read_sql(
    129             conn,
    130             "arrow",

PanicException: Could not retrieve i64 from Value

`
@wangxiaoying
Copy link
Contributor

wangxiaoying commented Aug 16, 2021

Hi @ztsweet, in order to reproduce the error, can you let us know the database source and dataframe destination that you are using? And also what data types are included in your query result?

@ztsweet
Copy link
Author

ztsweet commented Aug 16, 2021

@wangxiaoying I have edited the question again

@wangxiaoying
Copy link
Contributor

@ztsweet Thanks for the details. Can you also check what is the data type of ORDER_ID in MySQL?

@ztsweet
Copy link
Author

ztsweet commented Aug 16, 2021

@wangxiaoying in mysql the type is ``ORDER_ID bigint(19) DEFAULT NULL

@wangxiaoying
Copy link
Contributor

wangxiaoying commented Aug 16, 2021

@ztsweet It seems like we have a bug when parsing NULL values in MySQL source. Will fix this issue it in our next release (0.2.1).

@wangxiaoying wangxiaoying changed the title PanicException: Could not retrieve i64 from Value MySQL source parsing NULL value error Aug 16, 2021
@wangxiaoying wangxiaoying added this to the 0.2.1 milestone Aug 16, 2021
@ztsweet
Copy link
Author

ztsweet commented Aug 16, 2021

@wangxiaoying thanks

@wangxiaoying
Copy link
Contributor

Hi @ztsweet, we just released the connectorx-0.2.1a1 which has fixed the NULL value bug in mysql. You can try it by using pip install -U connectorx==0.2.1a1.

If all tests go well, we plan to release the 0.2.1 version in early October. Please feel free to have a try and let us know if you found any issue.

@hamzamohdzubair
Copy link

hamzamohdzubair commented Dec 10, 2021

Are you using Python or Rust?

Python.

What version of polars are you using?

polars==0.10.27
connectorx==0.2.2

What operating system are you using polars on?

CentOS Linux release 7.9.2009 (Core)

Describe your bug.

read_sql returns following exceptions if any column contains missing values, i.e. None

Note: No error occurs if all rows have values

Errors

import polars as pl
pl.read_sql("select column1 from Table", <url_to_database>)

Replicated the issue using cx.read_sql. Same errors

If column1 TYPE = MEDIUMTEXT

RuntimeError: Invalid argument error: column types must match schema types, expected Time64(Nanosecond) but found LargeBinary at column index 0
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-67-2867bd04669a> in <module>
----> 1 df_pl = pl.read_sql("""
      2 select Replaced_Query_Bill
      3 from Event_Master limit 10""", cart)#.to_pandas()
      4 # Follows_Waterfall, Condition, Event_Limit, Slug, Conversation, Campaign_Key
      5 # Arms, Is_MAB_Driven, Is_Good_Time, Is_On_Demand

~/ipynbs/.venv/lib/python3.9/site-packages/polars/io.py in read_sql(sql, connection_uri, partition_on, partition_range, partition_num)
    802     """
    803     if _WITH_CX:
--> 804         tbl = cx.read_sql(
    805             conn=connection_uri,
    806             query=sql,

~/ipynbs/.venv/lib/python3.9/site-packages/connectorx/__init__.py in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    133             raise ValueError("You need to install pyarrow first")
    134 
--> 135         result = _read_sql(
    136             conn,
    137             "arrow",

RuntimeError: Invalid argument error: column types must match schema types, expected Time64(Nanosecond) but found LargeBinary at column index 0

If column1 TYPE = DATETIME

PanicException: Could not retrieve chrono::naive::datetime::NaiveDateTime from Value
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
<ipython-input-101-17df85ddf35d> in <module>
----> 1 cart.read_sql('select * from Event_Master limit 10')

<ipython-input-97-7fb8f6c351ab> in read_sql(self, query)
     25     def read_sql(self, query):
     26         try:
---> 27             result = pl.read_sql(query, self.__url)
     28         except Exception as e:
     29             print('Polar failed trying pandas')

~/ipynbs/.venv/lib/python3.9/site-packages/polars/io.py in read_sql(sql, connection_uri, partition_on, partition_range, partition_num)
    802     """
    803     if _WITH_CX:
--> 804         tbl = cx.read_sql(
    805             conn=connection_uri,
    806             query=sql,

~/ipynbs/.venv/lib/python3.9/site-packages/connectorx/__init__.py in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    133             raise ValueError("You need to install pyarrow first")
    134 
--> 135         result = _read_sql(
    136             conn,
    137             "arrow",

PanicException: Could not retrieve chrono::naive::datetime::NaiveDateTime from Value

What is the expected behavior?

Read empty values in dataframe just like pandas

@wangxiaoying
Copy link
Contributor

Hi @hamzamohdzubair thanks for reporting the issue. It looks like the null data does not match with the type we fetched. May I ask which database you are using? Also what is the type for column1 defined in your database?

@hamzamohdzubair
Copy link

hamzamohdzubair commented Dec 10, 2021

@wangxiaoying updated my previous comment
Database = mysql Ver 15.1 Distrib 10.5.9-MariaDB
The first error comes when TYPE = MEDIUMTEXT
second error comes when TYPE = DATETIME

@hamzamohdzubair
Copy link

@wangxiaoying Also, why am i unable to catch the PanicException in a try except clause. I am trying to do this:
read a dataframe using polars, if it fails read it using pandas, but i am unable to catch the PanicException

@wangxiaoying
Copy link
Contributor

wangxiaoying commented Dec 11, 2021

@hamzamohdzubair , thanks for the info. It is weird since we have the mysql test case when DATETIME is null. I also test on MEDIUMTEXT and it also passes. We use the official docker image of mysql in our environment. We haven't test on mariadb yet, I will try to set up and test on it to see whether I can reproduce the error.

As for the PanicException, I think the panic happens in the underlying mysql driver. I will also take a look at it if I can reproduce this error.

@wangxiaoying
Copy link
Contributor

wangxiaoying commented Dec 12, 2021

Hi @hamzamohdzubair, thanks for waiting.

For the first error on MEDIUMTEXT, turns out it is a bug in the type mapping in arrow. It has been fixed here: 46bf99a. We will release an alpha version to this bug fix. Will let you know once it's done.

For the second error onDATETIME, I cannot reproduce the error using mysql / mariadb docker image when output pandas / arrow. Is it possible if you could provide an example SQL script for constructing the table (create table and insert values to the table) that can cause the error?

UPDATE: 0.2.3-alpha.3 is released! Please feel free to have a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants