20240624 developer call notes #1236
tomwhite
started this conversation in
Meeting Notes
20240624
Pre-notes
PRs
Issues
Discussions
Notes
Attendees
Discussion
Sgkit next
JK: made a vcztools prototype using Tom’s sgkit code
TW: next thing could be a CLI, with Cubed and JAX underneath
JK: the hard part is implementing the bcftools filtering language, which is quite arcane (regex)
JH: our lab once implemented a filter language with PEG.js that was not so bad: https://github.com/hammerlab/cycledash/blob/master/grammars/querylanguage.pegjs
JH: get AI to code gen? Feed it https://vcftools.github.io/man_latest.html.
BJ: new Claude is very good
JK: would be nice to implement a subset of plink too (for GWAS, e.g. BOLT-LMM) to attract users, since they can’t run plink on UKB today.
JH: VCF manipulation may be more amenable to getting new users
JK: long term goal is to get methods authors to write new methods against our format
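To make the filter-language point concrete, here is a minimal sketch of a recursive-descent evaluator for a small, bcftools-inspired expression subset. It is not the bcftools grammar and not vcztools code: the names `tokenize` and `compile_filter` are hypothetical, only numeric comparisons joined by `&&`, `||` and parentheses are handled, and records are assumed to be plain dicts of field values.

```python
import operator
import re

# Tokens: numbers, field names, comparison/logical operators, parentheses.
TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|>=|<=|==|!=|&&|\|\||[<>()])")

COMPARE = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


def tokenize(text):
    """Split an expression into tokens, rejecting anything unrecognised."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def compile_filter(text):
    """Compile an expression into a predicate over a record dict (hypothetical API)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        # Lowest precedence: a || b
        left = parse_and()
        while peek() == "||":
            take()
            right = parse_and()
            left = (lambda a, b: lambda rec: a(rec) or b(rec))(left, right)
        return left

    def parse_and():
        # Binds tighter than ||: a && b
        left = parse_comparison()
        while peek() == "&&":
            take()
            right = parse_comparison()
            left = (lambda a, b: lambda rec: a(rec) and b(rec))(left, right)
        return left

    def parse_comparison():
        token = take()
        if token == "(":
            inner = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if not (token[0].isalpha() or token[0] == "_"):
            raise ValueError(f"expected a field name, got {token!r}")
        op = take()
        if op not in COMPARE:
            raise ValueError(f"expected a comparison operator, got {op!r}")
        value = float(take())
        compare = COMPARE[op]
        return lambda rec: compare(rec[token], value)

    predicate = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return predicate


keep = compile_filter("QUAL >= 30 && (DP < 100 || AF > 0.5)")
print(keep({"QUAL": 50, "DP": 20, "AF": 0.1}))
```

The real language also covers string and regex matching, INFO/FORMAT prefixes, functions and per-sample logic, which is where most of the difficulty noted above lies; a grammar-based approach like the PEG.js one JH linked would replace the hand-written parser here.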
TW: NumPy 2, Zarr 3
JK: NumPy 2 looks straightforward
TW: cyvcf2 already supports NumPy 2, Numba does not (it’s compatible, but doesn’t reproduce behaviour yet)
TW: Let’s have people use bio2zarr for VCF reading
TW: Zarr 3 has broken the API more than expected; have raised several upstream issues while working on Cubed
JK: will give it a week then try bio2zarr again
JH: Who are the main Zarr 3 implementers?
TW: Joe Hamman, Davis Bennett, Norman Rzepka at Scalable Minds
TW: Xarray has pinned to Zarr < 3
TW: has done work to pull the Hypothesis VCF code out of sgkit; will move it to a new project under sgkit-dev
TW: https://scikit.bio has had a reboot