add "parquet source" mapped to "GraphSource" to support parquet sink support #490

Merged: 2 commits merged into master on Apr 15, 2024

Conversation

sierra-moxon
Member

  • also: remove the hardcoded kgx version number and look it up via importlib instead
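For readers unfamiliar with the pattern, here is a minimal sketch of reading a package version at runtime rather than hardcoding it. It assumes the standard-library `importlib.metadata` module is what is meant by "importlib"; the helper name `get_package_version` and its fallback value are hypothetical and not taken from the kgx codebase.

```python
from importlib.metadata import PackageNotFoundError, version


def get_package_version(package_name: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or `default` if it is not installed.

    Hypothetical helper for illustration only; not kgx's actual code.
    """
    try:
        # Reads the version from the installed distribution's metadata.
        return version(package_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return default


if __name__ == "__main__":
    # Prints the installed kgx version, or "unknown" if kgx is not installed.
    print(get_package_version("kgx"))
```

Looking the version up from the installed metadata means the reported number cannot drift from what is actually installed.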

@sierra-moxon requested a review from caufieldjh on April 15, 2024, 19:24
@caufieldjh
Collaborator

Thanks @sierra-moxon !

@caufieldjh
Collaborator

Still throwing an error from prepare_output_args() in cli_utils.py - I think I have a fix

@caufieldjh
Collaborator

False alarm - I was running the incorrect kgx version again.

But I'll take this chance to add material on parquet sink to the docs.

@justaddcoffee

I see parquet as an available output option now as expected:

$ poetry run kgx transform --help 
Usage: kgx transform [OPTIONS] [INPUTS]...

  Transform a Knowledge Graph from one serialization form to another.

Options:
  -i, --input-format TEXT         The input format. Can be one of ('tsv',
                                  'csv', 'graph', 'json', 'jsonl', 'obojson',
                                  'obo-json', 'trapi-json', 'neo4j', 'nt',
                                  'owl', 'sssom', 'parquet')
  -c, --input-compression TEXT    The input compression type
  -o, --output PATH               Output
  -f, --output-format TEXT        The output format. Can be one of ('tsv',
                                  'csv', 'graph', 'json', 'jsonl', 'obojson',
                                  'obo-json', 'trapi-json', 'neo4j', 'nt',
                                  'owl', 'sssom', 'parquet')
[snip]

Should this be working now or no?

$ poetry run kgx transform -f parquet -o tempout tests/resources/rdf/test1.nt 
[KGX][__init__.py][   transform_wrapper] ERROR: kgx.transform error: Type None not yet supported

@caufieldjh
Collaborator

It appears to be working for me:

~/kgx$ poetry run kgx transform -f parquet -o tempout tests/resources/rdf/test1.nt -i nt
[KGX][rdf_source.py][               parse] INFO: Done parsing tests/resources/rdf/test1.nt

The difference is that you still have to specify the input format (-i nt) as well; it isn't inferred from the input file.
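If you are scripting this, a small wrapper can catch the missing input format before kgx is invoked. This is only a sketch: the function name `build_transform_command` is hypothetical, and it uses just the `-i`, `-f`, and `-o` flags shown in the help output above. It builds the argument list and does not run anything.

```python
import shlex
from typing import List, Optional, Sequence


def build_transform_command(
    inputs: Sequence[str],
    input_format: Optional[str],
    output_format: str,
    output: str,
) -> List[str]:
    """Build the argv for `kgx transform`, requiring an explicit input format.

    Hypothetical helper for illustration; it only assembles the command.
    """
    if not input_format:
        # Mirrors the thread: omitting -i led to "Type None not yet supported".
        raise ValueError("input_format is required, e.g. 'nt'")
    if not inputs:
        raise ValueError("at least one input file is required")
    return [
        "poetry", "run", "kgx", "transform",
        "-i", input_format,
        "-f", output_format,
        "-o", output,
        *inputs,
    ]


if __name__ == "__main__":
    cmd = build_transform_command(
        ["tests/resources/rdf/test1.nt"], "nt", "parquet", "tempout"
    )
    # Print a shell-safe rendering of the command for copy and paste.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` if you want to execute it from Python.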

@justaddcoffee

Excellent! Thanks @caufieldjh, that works for me too:

$ poetry run kgx transform -f parquet -o tempout tests/resources/rdf/test1.nt -i nt
[KGX][rdf_source.py][               parse] INFO: Done parsing tests/resources/rdf/test1.nt

@sierra-moxon merged commit 45203eb into master on Apr 15, 2024
4 checks passed
3 participants