Build info/version info inside ADAM-generated files #188
Comments
Also, I may have missed some previous discussions on how we do this, but I recently converted hg19 to a Parquet file of ADAMNucleotideContigFragments. It seems there's no way to recover the reference version information, or am I missing something? The Avro record's contig fields don't store this. Can we shove it into the Parquet metadata somewhere?
Calling out @massie here (when you get back from vacation, Matt); he's had some thoughts on embedding information into the Parquet metadata.
Ping @massie
Once we upgrade to Parquet 1.6.0, we'll be able to read and write arbitrary metadata much more easily. We can then drop the version info (introduced in #138) into the metadata to help with debugging. The upgrade to 1.6.0 is going well, but three tests are failing because of issues with predicates (UnboundRecordFilter).
Is this worth another look? Our Parquet dependency is now at version 1.8.x.
Perhaps we can write this alongside our other metadata?
We should resolve this as part of #1257.
This will be resolved as part of #1257. Closing as a duplicate.
We should build off of Sebastian's work in #138 to output ADAM version info inside files generated by ADAM, so that we can version files containing ADAMRecords, ADAMNucleotideFragments, ADAMVariants, etc.
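Parquet file footers carry key/value metadata as plain string pairs, which is one place the ADAM version and the reference assembly discussed in this thread could live. Below is a minimal sketch, assuming the writer can hand such a map to Parquet at write time. It models the footer map as a Python dict instead of calling any Parquet library. The key names `adam.version` and `adam.reference.assembly` and the version string `x.y.z` are hypothetical; ADAM does not define them anywhere in this thread. The `parquet.avro.schema` entry stands in for the Avro schema that parquet-avro already records, and its value is elided.

```python
from typing import Dict, Mapping, Optional

# Hypothetical key names chosen for illustration; not defined by ADAM or Parquet.
ADAM_VERSION_KEY = "adam.version"
REFERENCE_KEY = "adam.reference.assembly"


def build_footer_metadata(
    existing: Mapping[str, str],
    adam_version: str,
    reference_assembly: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of `existing` with ADAM provenance entries added.

    Parquet footer metadata is string-to-string, so every value is a str.
    Entries already present (e.g. the Avro schema) are preserved.
    """
    merged = dict(existing)
    merged[ADAM_VERSION_KEY] = adam_version
    if reference_assembly is not None:
        merged[REFERENCE_KEY] = reference_assembly
    return merged


def read_reference_assembly(footer: Mapping[str, str]) -> Optional[str]:
    """Recover the reference assembly name, or None if it was never written."""
    return footer.get(REFERENCE_KEY)


if __name__ == "__main__":
    footer = build_footer_metadata(
        {"parquet.avro.schema": "{...}"},  # existing entry, value elided
        adam_version="x.y.z",  # placeholder, not a real ADAM release
        reference_assembly="hg19",
    )
    print(read_reference_assembly(footer))  # prints hg19
```

Storing this once per file in the footer, rather than in every record, avoids repeating the same strings in each contig fragment. Readers that don't know these keys can simply ignore them.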