Add yellow-fever to manifest #1083
The changes look good to me, but I was extremely confused about why yellow-fever was not showing up at https://nextstrain-s-add-yellow-zarguy.herokuapp.com/pathogens.
Turns out the current yellow-fever dataset on S3 does not have a version id:
$ aws s3api list-object-versions --bucket nextstrain-data --prefix yellow-fever_meta.json
{
    "Versions": [
        {
            "ETag": "\"1390b7046efbd1f0150bbf2656206c0c\"",
            "Size": 758,
            "StorageClass": "STANDARD",
            "Key": "yellow-fever_meta.json",
            "VersionId": "null",
            "IsLatest": true,
            "LastModified": "2017-05-03T21:30:48.000Z"
        }
    ],
    "RequestCharged": null
}
So it gets filtered out during the resource index generation. Is there a way to add a version id to an S3 object or do we have to download and re-upload it?
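As a rough illustration of the check above, here is a minimal Python sketch that scans a `list-object-versions` response for objects whose `VersionId` is the string `"null"`, as in the output quoted above. The function name `unversioned_keys` is made up for this example, and the `listing` dict is a trimmed copy of that output rather than a live API call.

```python
# Trimmed copy of the `aws s3api list-object-versions` output quoted above.
listing = {
    "Versions": [
        {
            "Key": "yellow-fever_meta.json",
            "VersionId": "null",
            "IsLatest": True,
            "LastModified": "2017-05-03T21:30:48.000Z",
        }
    ],
}


def unversioned_keys(listing):
    """Return keys whose VersionId is the string "null".

    Hypothetical helper for illustration; not part of any Nextstrain code.
    """
    return [
        version["Key"]
        for version in listing.get("Versions", [])
        if version.get("VersionId") == "null"
    ]


print(unversioned_keys(listing))
```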
(From memory) that's because it predates us using versioning in the bucket. I'd suggest uploading a current YF build (that one's from 2017) to the bucket.
Ah that makes sense! If we want to retain the version history on nextstrain.org/pathogens that links back to the 2017 build, it would still need the version id though.
As this Stack Overflow post points out, the object does have a version id: it's the literal string |
Also bumps resource index revision number in testing and production configs, and updates an old comment that was not accurate now that yellow-fever is in the manifest.
26b9565
to
86a6850
Only Amazon S3 generates version IDs, and they cannot be edited; the same goes for the last-modified date. So we can re-upload, and it'll get a (new) version ID, but then it'll look like a 2024 dataset to the resource indexer. If you really want to show this as from 2017, you could extend the indexer to allow a "special cases" mapping of versionId to date.
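One way the "special cases" mapping suggested above might look, as a minimal Python sketch. Everything here is hypothetical: the `DATE_OVERRIDES` table, the `effective_date` helper, and the placeholder version ID are invented for illustration and do not reflect the real indexer. Only the 2017-05-03 date comes from the `LastModified` value shown earlier.

```python
from datetime import datetime, timezone

# Hypothetical override table: (key, version_id) -> date to display.
# "EXAMPLE_VERSION_ID" is a placeholder for whatever ID S3 assigns on re-upload.
DATE_OVERRIDES = {
    ("yellow-fever_meta.json", "EXAMPLE_VERSION_ID"): datetime(
        2017, 5, 3, tzinfo=timezone.utc
    ),
}


def effective_date(key, version_id, last_modified):
    """Return the overridden date if one is registered, else last_modified."""
    return DATE_OVERRIDES.get((key, version_id), last_modified)


# A re-uploaded object would carry a 2024 last-modified date.
reuploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(effective_date("yellow-fever_meta.json", "EXAMPLE_VERSION_ID", reuploaded))
```

The trade-off of this design is that each special case has to be maintained by hand in the table.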
Ah yes, the distinction between I forgot the resource indexer doesn't check the actual S3 objects. It pulls metadata from the S3 inventory, which reports the
The empty string matches the falsy check and gets filtered out of the resource index.
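The filtering described above can be sketched roughly as follows: an inventory row whose version ID is an empty string fails a truthiness check and is dropped. This is an illustrative Python sketch, not the indexer's actual code. The column layout, the header row, the `index_rows` name, and the second sample row are all assumptions made for the example.

```python
import csv
import io

# Assumed inventory layout, with a header row added for readability.
# The second row is a made-up versioned object for contrast.
INVENTORY_CSV = """bucket,key,version_id,is_latest,last_modified
nextstrain-data,yellow-fever_meta.json,,true,2017-05-03T21:30:48.000Z
nextstrain-data,example_meta.json,EXAMPLE_VERSION_ID,true,2024-01-01T00:00:00.000Z
"""


def index_rows(inventory_text):
    """Keep only rows with a truthy version_id; empty strings are dropped."""
    rows = csv.DictReader(io.StringIO(inventory_text))
    return [row for row in rows if row["version_id"]]


kept = index_rows(INVENTORY_CSV)
print([row["key"] for row in kept])
```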
@genehack how important is it to show the dataset is from 2017 in the snapshots chart?
I mean, it would certainly be nice to do so. I wonder if the user metadata thing mentioned in that SO post would be a way to get a version ID without changing the last modified date… I may experiment with that over the weekend…
So, I have mucked around with this in a test bucket, and done some more web searching, and TL;DR I don't see any way to get a version ID on that S3 object that doesn't also change the last-modified date. I will chat with @trvrb to confirm this, but I suspect if the two options are "display the old dataset but with a modern date" and "don't display the old dataset at all", the latter is better. I wonder if the resource indexer could be updated so that it didn't turn the
The CSV is generated by AWS S3 inventory, so we don't have control over the empty string version id. I found some docs on converting the empty strings to null strings, but it seems like a manual operation, not a config setting.
womp womp
If we want to preserve that original yellow-fever version, we can make our indexer not filter out objects with |
…which will also surface all the other things that are currently dropped due to being old/unversioned. That feels like the pull tab on a can of worms?
After reflection and much discussion, the consensus (which I agree with) was that surfacing that old data was not going to be worth the trouble. Going ahead and merging this; will enable |
Also bumps resource index revision number in testing and production configs.
Note: needs to merge semi-concurrently with the work for nextstrain/yellow-fever#25.