Investigate Parquet/Avro usage in Enso #8772

Open
JaroslavTulach opened this issue Jan 16, 2024 · 15 comments
Assignees
Labels
-compiler -libs Libraries: New libraries to be implemented l-apache-arrow InMemory Table move to Apache Arrow x-on-hold

Comments

@JaroslavTulach
Member

Build on the Arrow work and investigate how to read a Parquet file into a Table and how to generate one from a Table.

@JaroslavTulach JaroslavTulach converted this from a draft issue Jan 16, 2024
@JaroslavTulach JaroslavTulach added -compiler -libs Libraries: New libraries to be implemented l-apache-arrow InMemory Table move to Apache Arrow labels Jan 16, 2024
@jdunkerley jdunkerley changed the title Investigate Parque usages in Enso Investigate Parquet/Avro usage in Enso Feb 6, 2024
@hubertp hubertp moved this from 📤 Backlog to 🔧 Implementation in Issues Board Feb 21, 2024
@enso-bot

enso-bot bot commented Feb 22, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-02-21):

Progress: Resurrecting work on Arrow. Filed a follow-up ticket for things missed in the first iteration. Improving packaging of Arrow. Still addressing some failures in semantic versioning (#8692, needed to support more of the spec). It should be finished by 2024-02-29.

Next Day: Next day I will be working on the #8772 task. Continue Arrow work

@enso-bot

enso-bot bot commented Feb 23, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-02-22):

Progress: Addressing follow-up issues blocking the ticket. PR is up. Looking into Parquet. Meeting on local project manager work. It should be finished by 2024-02-29.

Next Day: Next day I will be working on the #8772 task. Continue Arrow work

@enso-bot

enso-bot bot commented Feb 26, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-02-23):

Progress: Adding more tests to the PR demonstrating usage (gave up on extending the grammar). Investigating Parquet and Arrow integration. It should be finished by 2024-02-29.

Next Day: Next day I will be working on the #8772 task. Continue Arrow/Parquet investigation

@enso-bot

enso-bot bot commented Feb 27, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-02-26):

Progress: Looking into examples of reading/writing Parquet to/from Arrow. It should be finished by 2024-02-29.

Next Day: Next day I will be working on the #8772 task. Continue the investigation

@enso-bot

enso-bot bot commented Feb 28, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-02-27):

Progress: Investigating padding of data/bitmap buffers that was missing in the first version. It should be finished by 2024-02-29.

Next Day: Next day I will be working on the #8772 task. Continue the investigation

@enso-bot

enso-bot bot commented Feb 29, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-02-28):

Progress: Lots of meetings, adding padding as required by the Arrow specification. Will need to improve the Arrow API to allow for a mutable builder. It should be finished by 2024-02-29.

Next Day: Next day I will be working on the #8772 task. Continue the investigation
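
For readers unfamiliar with the padding mentioned above: the Arrow columnar format recommends that buffers be padded to a length that is a multiple of 8 or 64 bytes, and validity bitmaps use least-significant-bit numbering. The following is only an illustrative Python sketch of that idea, not the Enso implementation; the function names and the choice of 64-byte padding are assumptions made for the example.

```python
import struct

# Assumption for this sketch: pad to 64 bytes (the spec also allows multiples of 8).
ALIGNMENT = 64


def padded_size(num_bytes, alignment=ALIGNMENT):
    """Round num_bytes up to the next multiple of alignment."""
    return (num_bytes + alignment - 1) // alignment * alignment


def build_int32_buffers(values):
    """Build (validity_bitmap, data_buffer) for 32-bit ints; None marks a null.

    Hypothetical helper: both buffers are zero-filled and padded to ALIGNMENT.
    """
    n = len(values)
    bitmap = bytearray(padded_size((n + 7) // 8))  # one bit per value
    data = bytearray(padded_size(4 * n))           # four bytes per value
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)          # LSB bit numbering
            struct.pack_into("<i", data, 4 * i, v)  # little-endian int32
    return bytes(bitmap), bytes(data)
```

Calling `build_int32_buffers([1, None, 3])` would give two 64-byte buffers, with the slot for the null left zeroed in the data buffer and its bit left unset in the bitmap.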

@enso-bot

enso-bot bot commented Mar 1, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-02-29):

Progress: Mostly OOO, added the padding logic. It should be finished by 2024-02-29.

Next Day: Next day I will be working on the #8772 task. Continue the investigation

@enso-bot

enso-bot bot commented Mar 4, 2024

Hubert Plociniczak reports a new 🔴 DELAY for the provided date (2024-03-01):

Summary: There is an 8-day delay in the implementation of the Investigate Parquet/Avro usage in Enso (#8772) task.
It will cause a 0-day delay for the delivery of this weekly plan.

Delay Cause: Addressing missing implementation details before I can add Parquet/Arrow integration.

@enso-bot

enso-bot bot commented Mar 4, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-01):

Progress: Added Arrow builder to demonstrate how to build immutable Vectors from mutable Arrow Arrays. Investigating memory allocation restrictions from the specification. It should be finished by 2024-03-08.

Next Day: Next day I will be working on the #8772 task. Continue the investigation

@enso-bot

enso-bot bot commented Mar 5, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-04):

Progress: Made sure the allocation of memory follows Arrow-specific requirements. Looking at how to fit the local implementation into Arrow's serde capabilities. Lots of meetings. It should be finished by 2024-03-08.

Next Day: Next day I will be working on the #8772 task. Continue the investigation

@enso-bot

enso-bot bot commented Mar 6, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-05):

Progress: Addressing PR review, working towards Parquet serde. Looking into widgets/visualization slowdown. It should be finished by 2024-03-08.

Next Day: Next day I will be working on the #8772 task. Continue the investigation

@enso-bot

enso-bot bot commented Mar 7, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-06):

Progress: More work on PR review, started looking into performance problems of #9278. Investigated problems with the macOS CI build, which in the end turned out to be a configuration problem. It should be finished by 2024-03-08.

Next Day: Next day I will be working on the #8772 task. Address comments, tackle #9278.

@enso-bot

enso-bot bot commented Mar 8, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-07):

Progress: Addressing review comments - pushing the builder pattern directly into the Arrow implementation. Discussing problems with imports and FQN (#9329). It should be finished by 2024-03-08.

Next Day: Next day I will be working on the #9278 task. Address any remaining problems with PR and pick up next ticket.

@enso-bot

enso-bot bot commented Mar 11, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-08):

Progress: Addressed remaining problems. PR merged. Started investigating #9278. Various PR reviews. It should be finished by 2024-03-08.

Next Day: Next day I will be working on the #9278 task. Investigate next ticket.

@hubertp hubertp moved this from 🔧 Implementation to ⚙️ Design in Issues Board Mar 19, 2024
@JaroslavTulach
Member Author

JaroslavTulach commented Feb 11, 2025

Using pyarrow to Load Parquet Files

GraalPy supports pyarrow. It is not completely smooth: it requires a few tricks (export CC=clang) and some patience (compilation takes time), but it is doable. Here are the steps to create an Enso project with pyarrow support using graalpy-community-24.1.2:

$ /enso-engine/bin/enso --new Enso_Parquet
$ mkdir Enso_Parquet/polyglot
$ export CC=clang
$ /graalpy-community-23.1.0-linux-amd64/bin/graalpy -m venv Enso_Parquet/polyglot/python
$ ./Enso_Parquet/polyglot/python/bin/graalpy -m pip install pyarrow

With that, one can modify Enso_Parquet/src/Main.enso to load a Parquet file:

from Standard.Base import all
from Standard.Table import all
import Standard.Visualization

foreign python read_parquet file = """
    import pyarrow.parquet as pq

    return pq.read_table(file)

main =
    file1 = read_parquet enso_project.root.parent/"AthletesCSV"/"merged.parquet"

and we need to get the file loaded!
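
To try the steps above without an existing Parquet file, a small sample can be generated with pyarrow itself. This is a minimal sketch, assuming pyarrow is installed in the same virtual environment; the function name, column names, and values are hypothetical and unrelated to the contents of merged.parquet.

```python
def write_sample_parquet(path):
    """Write a tiny two-column table to `path`, then read it back."""
    # Imports live inside the function, mirroring the foreign python block above.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Hypothetical sample data, for illustration only.
    table = pa.table({"name": ["Ann", "Bob"], "age": [31, 27]})
    pq.write_table(table, path)

    # Reading it back uses the same call as read_parquet in Main.enso.
    return pq.read_table(path)
```

The returned object is a pyarrow Table, the same type that `pq.read_table(file)` yields in the Enso snippet.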

Projects
Status: ⚙️ Design
Development

No branches or pull requests

2 participants