WIP: add Avro schema #1

Open: wants to merge 5 commits into base: latest


ctb commented Jan 1, 2023

This PR adds Apache Avro as an output format.

The schema was copied over from a Python Avro reader/writer plugin, here.

The tricky bit was the schema, which is defined here. The good news? It compiles! And, in Python, it seems to work! The bad news: more work needs to be done on the Rust side. When I run:

cargo run ../sourmash/podar-ref/1.fa.sig -o xxx

I get:

Error: Value does not match schema
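
One way to narrow down an error like this is to walk the schema and the value side by side and report the first path where they disagree. The sketch below is a hypothetical debugging aid written against the Python standard library only; it is not part of any Avro library or of sourmash, it covers only a subset of Avro types (primitives, records, arrays, unions), and the schema and field names (`Sketch`, `ksize`, `mins`, `abundances`) are made up for illustration rather than taken from the schema in this PR.

```python
# Hypothetical debugging aid: NOT an API of apache-avro, fastavro, or sourmash.
# It checks a plain Python value against a JSON-style Avro schema (subset only).

def _is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)

def _is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)

_PRIMITIVES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: _is_int(v) and -2**31 <= v < 2**31,    # 32-bit signed
    "long": lambda v: _is_int(v) and -2**63 <= v < 2**63,   # 64-bit signed
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, bytes),
}

def find_mismatch(schema, value, path="$"):
    """Return a message naming the first mismatch, or None if the value fits."""
    if isinstance(schema, list):  # union: the value must match at least one branch
        if any(find_mismatch(branch, value, path) is None for branch in schema):
            return None
        return f"{path}: no union branch matches {type(value).__name__}"

    kind = schema if isinstance(schema, str) else schema["type"]

    if kind == "record":
        if not isinstance(value, dict):
            return f"{path}: expected record, got {type(value).__name__}"
        for field in schema["fields"]:
            name = field["name"]
            if name not in value:
                if "default" in field:
                    continue
                return f"{path}.{name}: missing field"
            problem = find_mismatch(field["type"], value[name], f"{path}.{name}")
            if problem:
                return problem
        return None

    if kind == "array":
        if not isinstance(value, list):
            return f"{path}: expected array, got {type(value).__name__}"
        for i, item in enumerate(value):
            problem = find_mismatch(schema["items"], item, f"{path}[{i}]")
            if problem:
                return problem
        return None

    check = _PRIMITIVES.get(kind)
    if check is None:
        return f"{path}: unsupported type {kind!r} in this sketch"
    if check(value):
        return None
    return f"{path}: expected {kind}, got {type(value).__name__} {value!r}"

# Made-up schema and values, for illustration only.
SKETCH_SCHEMA = {
    "type": "record",
    "name": "Sketch",
    "fields": [
        {"name": "ksize", "type": "int"},
        {"name": "mins", "type": {"type": "array", "items": "long"}},
        {"name": "abundances", "type": ["null", {"type": "array", "items": "long"}]},
    ],
}
good = {"ksize": 31, "mins": [1, 2, 3], "abundances": None}
bad = {"ksize": "31", "mins": [1, 2, 3], "abundances": None}

print(find_mismatch(SKETCH_SCHEMA, good))
print(find_mismatch(SKETCH_SCHEMA, bad))
```

This is only a way to localize a mismatch by hand; it says nothing about what actually causes the Rust-side failure.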

ctb commented Jan 3, 2023

ctb added a commit to sourmash-bio/sourmash that referenced this pull request Jan 7, 2023
…ts new signature saving & loading mechanisms (#2428)

Implement support for `load_from` and `save_to` plugins via
`importlib.metadata` entry points.
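
As a rough illustration of entry-point discovery, the sketch below collects every object registered under a given entry-point group using only `importlib.metadata`. The helper name `discover_plugins` and the group name used in the example call are assumptions for illustration; the group names that sourmash actually registers are described in its plugin documentation, not here.

```python
from importlib.metadata import entry_points

def discover_plugins(group):
    """Return {entry point name: loaded object} for one entry-point group.

    Hypothetical helper, not a sourmash function.
    """
    try:
        # Python 3.10+ supports selecting a group by keyword.
        found = entry_points(group=group)
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments and returns a
        # dict-like mapping of group name to entry points.
        found = entry_points().get(group, [])
    return {ep.name: ep.load() for ep in found}

# The group name below is an assumption made for this example.
plugins = discover_plugins("example_app.load_from")
print(sorted(plugins))
```

A package would advertise a plugin by declaring an entry point in that group in its packaging metadata; nothing is discovered until such a package is installed.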

This supports a few of the plugins suggested in
#1353

I am nominating this as an experimental feature that is not under
semantic versioning/not public yet.

Documentation page [here, in
dev_plugins.html](https://sourmash--2428.org.readthedocs.build/en/2428/dev_plugins.html).

A template repo for new plugins is at
https://github.com/sourmash-bio/sourmash_plugin_template.

## Implementation/this PR

This PR refactors the `_load_database` loading and
`SaveSignaturesToLocation` saving code to build a prioritized list of
functions to try in order, and then adds hooks in via the new
`sourmash.plugins` module that insert additional loading/saving
functions into that list.
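
A minimal sketch of the "prioritized list of functions to try in order" idea follows. It is not the code from this PR: the function names, the `(priority, name, function)` tuple layout, and the convention that a lower number is tried first are all assumptions made for the example.

```python
def load_with_fallbacks(location, loaders):
    """Try each loader in priority order; return (name, result) from the first
    one that succeeds. `loaders` is a list of (priority, name, function)."""
    errors = []
    # Assumption for this sketch: a lower priority number is tried first.
    for priority, name, func in sorted(loaders, key=lambda entry: entry[0]):
        try:
            return name, func(location)
        except Exception as exc:  # a loader signals "not mine" by raising
            errors.append(f"{name}: {exc}")
    raise ValueError(f"no loader could handle {location!r}: " + "; ".join(errors))

def load_json(location):
    # Stand-in for a built-in loader.
    if not location.endswith(".json"):
        raise ValueError("not a .json file")
    return {"format": "json", "location": location}

def load_avro(location):
    # Stand-in for a loader contributed by a plugin.
    if not location.endswith(".avrosig"):
        raise ValueError("not an .avrosig file")
    return {"format": "avro", "location": location}

builtin_loaders = [(10, "json", load_json)]
plugin_loaders = [(5, "avro_plugin", load_avro)]  # hooks inserted by plugins
loaders = builtin_loaders + plugin_loaders

print(load_with_fallbacks("sigs.avrosig", loaders))
print(load_with_fallbacks("sigs.json", loaders))
```

Keeping the list as plain data means a plugin only has to append an entry; the core loop does not need to know which formats exist.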

This PR also moves the current saving/loading functions out of
`sourmash.sourmash_args` into the `sourmash.save_load` submodule, and
simplifies the code a bit.

## Example plugins:

- read JSON sigs and manifests from URLs:
https://github.com/sourmash-bio/sourmash_plugin_load_urls
- read and write signatures in Apache Avro:
https://github.com/sourmash-bio/sourmash_plugin_avro - use extension
`.avrosig` to write.

Specific TODOs:
- [x] provide a minimal "getting started" template repo
- [x] add tests for multiple plugins & priorities
- [ ] maybe try writing CSV export/import as a plugin?
#1098

For later:
- think about other kinds of plugins - new CLI entry points, picklist
classes, tax loading, tax structure, ??.
- work on getting Avro support into Rust over in
luizirber/2021-02-11-sourmash-binary-format#1