BUG: read_parquet generates a memory allocation error #729

Open
BlackArbsCEO opened this issue Oct 2, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@BlackArbsCEO

Describe the bug


When I try to read a large Parquet file using pd.read_parquet('my_large_file.pqt'), it generates the stack trace below. I know the data fits in memory because pandas can read it, albeit slowly. The files are between 1.5 GB and 4.5 GB in size.

2023-10-02 14:15:39,732 xorbits._mars.deploy.oscar.local 25232 WARNING  Web service started at http://127.0.0.1:59977
  0%|          |   0.00/100 [00:00<?, ?it/s]
2023-10-02 14:15:39,929 xorbits._mars.services.scheduling.worker.execution 25232 ERROR    Failed to run subtask eDnnloDuO0VyUkOf3tneQUCY on band numa-0
Traceback (most recent call last):
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 494, in internal_run_subtask
    subtask_info.result = await self._retry_run_subtask(
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 618, in _retry_run_subtask
    return await _retry_run(subtask, subtask_info, _run_subtask_once)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 192, in _retry_run
    raise ex
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 154, in _retry_run
    return await target_async_func(*args)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 527, in _run_subtask_once
    await quota_ref.request_batch_quota(batch_quota_req)
  File "xoscar\\core.pyx", line 284, in __pyx_actor_method_wrapper
  File "xoscar\\core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\quota.py", line 119, in request_batch_quota
    raise ValueError(
ValueError: Cannot allocate quota size 19629902034.0 larger than total capacity 13668492902.
2023-10-02 14:15:39,932 xorbits._mars.services.scheduling.worker.execution 25232 ERROR    Failed to run subtask GeGwp96LjmNNdpVR6oS8x325 on band numa-0
Traceback (most recent call last):
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 494, in internal_run_subtask
    subtask_info.result = await self._retry_run_subtask(
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 618, in _retry_run_subtask
    return await _retry_run(subtask, subtask_info, _run_subtask_once)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 192, in _retry_run
    raise ex
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 154, in _retry_run
    return await target_async_func(*args)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 527, in _run_subtask_once
    await quota_ref.request_batch_quota(batch_quota_req)
  File "xoscar\\core.pyx", line 284, in __pyx_actor_method_wrapper
  File "xoscar\\core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\quota.py", line 119, in request_batch_quota
    raise ValueError(
ValueError: Cannot allocate quota size 26955130596.0 larger than total capacity 13668492902.
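
For scale, the two quota requests in the log can be compared against the reported capacity. This is a minimal sketch, assuming the numbers in the messages are byte counts (the log does not state the unit). The helper name `parse_quota_error` is hypothetical and is not part of Xorbits.

```python
import re

# Error lines copied verbatim from the log above.
LOG_LINES = [
    "ValueError: Cannot allocate quota size 19629902034.0 larger than total capacity 13668492902.",
    "ValueError: Cannot allocate quota size 26955130596.0 larger than total capacity 13668492902.",
]

# Captures the requested quota and the total capacity from one message.
PATTERN = re.compile(r"quota size (\d+(?:\.\d+)?) larger than total capacity (\d+)")

GIB = 1024 ** 3  # assumption: the logged values are bytes


def parse_quota_error(line):
    """Return (requested, capacity) as floats, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


for line in LOG_LINES:
    parsed = parse_quota_error(line)
    if parsed is None:
        continue
    requested, capacity = parsed
    print(
        f"requested {requested / GIB:.2f} GiB, "
        f"capacity {capacity / GIB:.2f} GiB, "
        f"ratio {requested / capacity:.2f}x"
    )
```

Under that assumption, each failing subtask asks for more than the worker's whole quota (about 1.4 and 2.0 times the capacity), which would explain why the request is rejected outright instead of being queued.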

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Your Python version
    python 3.10
  2. The version of Xorbits you use
    xorbits 0.6.3 pypi_0 pypi
  3. Versions of crucial packages, such as numpy, scipy and pandas
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

@XprobeBot XprobeBot added the bug Something isn't working label Oct 2, 2023
@XprobeBot XprobeBot added this to the v0.7.0 milestone Oct 2, 2023
@aresnow1
Contributor

aresnow1 commented Oct 7, 2023

Thanks for reporting this. Could you provide the schema of your Parquet file?

@XprobeBot XprobeBot modified the milestones: v0.7.0, v0.7.1 Oct 23, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.1, v0.7.2 Nov 21, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.2, v0.7.3 Jan 5, 2024
@XprobeBot XprobeBot modified the milestones: v0.7.3, v0.7.4 Aug 22, 2024
@luweizheng luweizheng removed this from the v0.7.4 milestone Dec 16, 2024