Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polars read_database does not really respect iter_batches = True when using sqlalchemy/oracledb #15470

Closed
2 tasks done
njesp opened this issue Apr 4, 2024 · 5 comments · Fixed by #15504
Closed
2 tasks done
Assignees
Labels
bug Something isn't working python Related to Python Polars

Comments

@njesp
Copy link

njesp commented Apr 4, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import oracledb
import polars as pl


sql = """
    select cast (rownum as number) id,
        cast (rownum as float) id2,
        cast ('textxxxxxx' as varchar2 (10 char)) v
    from (select rownum r
            from (select     rownum r
                        from dual
                    connect by rownum <= 1000) a,
                (select     rownum r
                        from dual
                    connect by rownum <= 1000) b,
                (select     rownum r
                        from dual
                    connect by rownum <= 1000) c
            where rownum <= 1000000000)
"""
oracledb.init_oracle_client()
con = oracledb.connect("@xxx")
d = pl.read_database(sql, con, iter_batches=True, batch_size=100)
cnt = 0
for _d in d:
    cnt += _d.height
    print(cnt)
con.close()

Log output

No response

Issue description

The example code in principle runs, but it only returns the iterator after fetching all 1 billion rows. This results in practice in memory error. The batch processing then does not solve the memory consumption problem.

oracledb==2.0.1

Expected behavior

The same pandas code runs without using any memory, fetching as described in chunksize.


import oracledb
import pandas as pd


sql = """
    select cast (rownum as number) id,
        cast (rownum as float) id2,
        cast ('textxxxxxx' as varchar2 (10 char)) v
    from (select rownum r
            from (select     rownum r
                        from dual
                    connect by rownum <= 1000) a,
                (select     rownum r
                        from dual
                    connect by rownum <= 1000) b,
                (select     rownum r
                        from dual
                    connect by rownum <= 1000) c
            where rownum <= 1000000000)
"""
oracledb.init_oracle_client()
con = oracledb.connect("@xxx")
d = pd.read_sql(sql, con, chunksize=100)
cnt = 0
for _d in d:
    cnt += len(_d)
    print(cnt)
con.close()

Installed versions

--------Version info---------
Polars:               0.20.18
Index type:           UInt32
Platform:             Windows-10-10.0.19044-SP0
Python:               3.11.8 | packaged by Anaconda, Inc. | (main, Feb 26 2024, 21:34:05) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              14.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.28
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@njesp njesp added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Apr 4, 2024
@alexander-beedie alexander-beedie self-assigned this Apr 6, 2024
@alexander-beedie alexander-beedie removed the needs triage Awaiting prioritization by a maintainer label Apr 6, 2024
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Apr 6, 2024

Ouch; I've identified the issue - will make a PR to address it shortly 👌
Thanks for the test-case; very helpful.

@njesp
Copy link
Author

njesp commented Apr 7, 2024

That was fast. Thank you 👊

@alexander-beedie
Copy link
Collaborator

That was fast. Thank you 👊

No problem - fix is in the new 0.20.19 release; let me know if you still see any issues 😅

@njesp
Copy link
Author

njesp commented Apr 9, 2024

I cannot get it working. I get this:

Traceback (most recent call last):
  File "/home/njn/polars/poc.py", line 24, in <module>
    for _d in d:
  File "/home/njn/.conda/envs/polars_poc/lib/python3.11/site-packages/polars/io/database/_executor.py", line 260, in <genexpr>
    frames = (
             ^
  File "/home/njn/.conda/envs/polars_poc/lib/python3.11/site-packages/polars/io/database/_executor.py", line 175, in _fetchmany_rows
    rows = result.fetchmany(batch_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/njn/.conda/envs/polars_poc/lib/python3.11/site-packages/oracledb/cursor.py", line 793, in fetchmany
    self._verify_fetch()
  File "/home/njn/.conda/envs/polars_poc/lib/python3.11/site-packages/oracledb/cursor.py", line 136, in _verify_fetch
    self._verify_open()
  File "/home/njn/.conda/envs/polars_poc/lib/python3.11/site-packages/oracledb/cursor.py", line 146, in _verify_open
    errors._raise_err(errors.ERR_CURSOR_NOT_OPEN)
  File "/home/njn/.conda/envs/polars_poc/lib/python3.11/site-packages/oracledb/errors.py", line 181, in _raise_err
    raise error.exc_type(error) from cause
oracledb.exceptions.InterfaceError: DPY-1006: cursor is not open

It is the same on Windows 10 and Ubuntu Linux 2022.04.

My Python Anaconda environment is defined thus

name: polars_poc
dependencies:
  - python=3.11.8
  - pip=23.3.1
  - pip:
    - oracledb==2.1.1
    - polars==0.20.19

/Niels

@njesp
Copy link
Author

njesp commented Apr 12, 2024

@alexander-beedie did you see the above comment? It does not work for me.

Regards Niels

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python Related to Python Polars
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants