Support additional output types #31

Open
JohnPaton opened this issue Mar 5, 2022 · 5 comments
Labels
enhancement New feature or request

@JohnPaton
Owner

Right now we only support CSV, which is what the portal provides. We could convert to other file formats (Parquet, Avro) on the fly for easier processing later.
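A minimal sketch of what an on-the-fly CSV to Parquet conversion could look like, assuming pyarrow as the Parquet engine (the thread does not pick one). csv_to_parquet is a hypothetical helper, not part of airbase. It streams the CSV in record batches, so a large file never has to fit in memory at once; pyarrow is imported lazily so the CSV-only path keeps working without it.

```python
from pathlib import Path


def csv_to_parquet(csv_path, parquet_path):
    """Stream one CSV file into a Parquet file, batch by batch (hypothetical helper).

    Assumes pyarrow is installed; it is imported here rather than at module
    level so that the rest of the package does not depend on it.
    """
    import pyarrow as pa
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    # open_csv infers the schema from the first block and then yields RecordBatches
    reader = pv.open_csv(str(csv_path))
    with pq.ParquetWriter(str(parquet_path), reader.schema) as writer:
        for batch in reader:
            # Wrap each batch in a Table so it can be appended as a row group
            writer.write_table(pa.Table.from_batches([batch], schema=reader.schema))
    return Path(parquet_path)
```

Because column types are inferred from the first block, a real implementation would probably pin them explicitly (for example via pyarrow.csv.ConvertOptions(column_types=...)) so that every file in a dataset ends up with the same schema.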

@JohnPaton JohnPaton added the enhancement New feature or request label Mar 5, 2022
@JohnPaton JohnPaton added this to the v1.0 milestone Mar 5, 2022
@avaldebe
Collaborator

avaldebe commented Sep 7, 2022

Hi @JohnPaton

At work we plan to extend the CLI to handle Parquet files. Are you interested in a PR?

It would be a post-processing CLI script, something like airbase-to-parquet data-path parquet-path, with a command-line option to define how the dataset is partitioned.
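To make the proposed interface concrete, here is a sketch of the argument parsing using the standard library's argparse. Only the command name and the two positional paths come from the comment above; the --partition-by option name and the main() behaviour are assumptions for illustration.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical interface mirroring `airbase-to-parquet data-path parquet-path`
    parser = argparse.ArgumentParser(
        prog="airbase-to-parquet",
        description="Convert downloaded AirBase CSV files into a Parquet dataset.",
    )
    parser.add_argument("data_path", type=Path, help="directory with the downloaded CSV files")
    parser.add_argument("parquet_path", type=Path, help="output directory for the Parquet dataset")
    parser.add_argument(
        "--partition-by",
        nargs="+",
        default=[],
        metavar="COLUMN",
        help="column(s) used to partition the output dataset (hypothetical option)",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # A real implementation would convert each file; this sketch only reports the plan
    csv_files = sorted(args.data_path.glob("*.csv"))
    print(
        f"would convert {len(csv_files)} file(s) into {args.parquet_path}, "
        f"partitioned by {args.partition_by or 'nothing'}"
    )
    return 0
```

For example, airbase-to-parquet data/ out/ --partition-by Countrycode AirPollutant would parse into two paths plus a list of partition columns, which the conversion step could then use.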

We'll base our work on #38, as Poetry allows console scripts that depend on an extra. That way pip install airbase[parquet] will install both the required dependencies and the new CLI.
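The script and its extra would be declared in pyproject.toml, but installers may not enforce an entry point's extra at runtime, so the command could still end up on PATH without pyarrow. A small guard like the one below (a hypothetical helper, with pyarrow assumed to be the dependency behind the parquet extra) turns that case into an actionable error message.

```python
import importlib.util


class MissingExtraError(ImportError):
    """Raised when an optional dependency group (an "extra") is not installed."""


def require_extra(module: str, extra: str, package: str = "airbase") -> None:
    # Fail early with an install hint instead of a bare ModuleNotFoundError later on
    if importlib.util.find_spec(module) is None:
        raise MissingExtraError(
            f"{module!r} is required for this command; "
            f"install it with: pip install {package}[{extra}]"
        )
```

The new CLI's main() would call require_extra("pyarrow", "parquet") before doing any work.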

We could further integrate with the existing CLI and download utilities, but we can discuss the details in the eventual PR.

@JohnPaton
Owner Author

Hey, I think more output formats would be great, and Parquet is an obvious choice, though I guess we'll need to make some smart choices about partitioning.
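One common partitioning convention is the Hive-style column=value directory layout, which tools such as pyarrow.dataset (partitioning="hive") and Spark can read back as columns. The sketch below only builds such a path; which columns to partition on (country and year here) is an assumption, and partitioning on a column with many distinct values tends to produce lots of small files.

```python
from pathlib import Path
from typing import Mapping, Union


def hive_partition_dir(root: Path, keys: Mapping[str, Union[str, int]]) -> Path:
    """Build root/col1=val1/col2=val2/... in the order the keys are given (hypothetical helper)."""
    path = Path(root)
    for column, value in keys.items():
        text = str(value)
        # Reject values that would break the directory layout or the key=value parsing
        if not text or "/" in text or "=" in text:
            raise ValueError(f"unsafe partition value for {column!r}: {text!r}")
        path = path / f"{column}={text}"
    return path
```

Usage: hive_partition_dir(Path("out"), {"Countrycode": "NL", "year": 2021}) gives out/Countrycode=NL/year=2021.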

@JohnPaton
Owner Author

Maybe we could start with a separate module for post-processing and add it to the CLI in a follow-up PR?

@avaldebe
Collaborator

Sure. This can be done with a plugin architecture that lets a separate package add post-processing formats by declaring entry points. I have done something like this on two projects before. I will prepare a draft PR to illustrate the approach.

@JohnPaton
Owner Author

Alright, I have no experience in this direction so I'm happy to see what you come up with!
