Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Dataset from Parquet #2247

Closed

Conversation

albertvillanova
Copy link
Member

Implement instantiation of Dataset from Parquet file.

@albertvillanova albertvillanova changed the title Dataset from Parquet Implement Dataset from Parquet Apr 22, 2021
@albertvillanova albertvillanova added the enhancement New feature or request label Apr 22, 2021
@albertvillanova albertvillanova added this to the 1.7 milestone Apr 22, 2021
Returns:
:class:`Dataset`
"""
from .arrow_reader import ParquetReader
Copy link
Member

@lhoestq lhoestq Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for diving into this !

The ParquetReader unfortunately always bring data in memory. The memory-map option is here only for reading performance. Under the hood the data from parquet are compressed so you can't just memory map the file since there's no on-the-fly decoding afaik.

I think the best option would be to implement a ArrowBasedBuilder as we did for csv/json/etc.

@albertvillanova albertvillanova modified the milestones: 1.7, 1.8 May 31, 2021
@albertvillanova albertvillanova modified the milestones: 1.8, 1.9 Jun 8, 2021
@lhoestq
Copy link
Member

lhoestq commented Jun 22, 2021

Hi @albertvillanova , I'll implement the parquet builder as an ArrowBasedBuilder if you don't mind

@lhoestq
Copy link
Member

lhoestq commented Jul 26, 2021

closing in favor of #2537 that is already merged

@lhoestq lhoestq closed this Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants