-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Dataverse to UI. Fixes #900 #969
Add Dataverse to UI. Fixes #900 #969
Conversation
Could you explain a bit why we need to involve an additional service instead of something like the following: class DataverseProvider(RepoProvider):
"""Provide contents of a Dataverse record
Users must provide a spec consisting of the Dataverse DOI.
"""
name = Unicode("Dataverse")
@gen.coroutine
def get_resolved_ref(self):
client = AsyncHTTPClient()
req = HTTPRequest("https://doi.org/{}".format(self.spec),
user_agent="BinderHub")
r = yield client.fetch(req)
self.record_id = urllib.parse.parse_qs(urllib.parse.urlparse(r.effective_url).query)
return self.record_id
def get_repo_url(self):
# While called repo URL, the return value of this function is passed
# as argument to repo2docker, hence we return the spec as is.
return self.spec
def get_build_slug(self):
return "dataverse-{}".format(self.record_id) I tried to get a response as JSON but that doesn't seem to work :-( |
In order to avoid implementing this and this again.
From
|
If you're not interested in verifying that DOI belongs to Dataverse, why bother with resolving it at all? You can check if it's doi via regex and and generate some checksum of it for |
I think there is no need to know if a DOI points to a dataverse instance or not, repo2docker will tell us that once we give it the DOI. What BinderHub needs is an ID that identifies the content and that we can use as a cache key. For Git this means we resolve a "branch name" to a SHA1 which is used as cache key. For DOIs from Zenodo we resolve the DOI to the Zenodo ID because the same DOI can point to different Zenodo IDs over time (that is how they implemented versioning). That is IMHO the main reason for resolving it and we can show an error earlier to the user if the DOI doesn't exist (someone made a typo). I'm pretty sure you could enter a Dataverse DOI and select Zenodo from the dropdown on mybinder.org right now and it would work (maybe, kinda). |
Makes sense. That's a cleaner solution, I wasn't sure that's an acceptable choice.
That applies to dataverse too. However, that information (dataset version) is not a part of the url unfortunately. By default the DOI is pointing to the latest version. Getting that info would require additional API call. I'm not sure how to proceed with it...
Yeah, that's why I wondered whether |
I think until we support "many" (for some definition of many) DOI targets (or what ever we call Zenodo, figshare, etc) it feels better to make it explicit that "type a FooBar DOI here, not any other kind. If you do you shouldn't expect this to work" in terms of messaging to the user.
I was hoping that with |
d22cb3b
to
8424d1c
Compare
That's the best I could come up with:
I'm not 100% it's always going to be the case. Maybe @pdurbin can chime in. |
Hi! Sorry, I thought I was subscribed to this pull request but I didn't get an email until I was mentioned today. Zenodo and Dataverse treat DOI differently. (I know less about Figshare, sorry.)
I hope this helps! Please keep the questions coming! |
@betatim it looks like @Xarthisius removed "WIP" from the title of this pull request. Are there any more questions I can answer? Thanks! |
@betatim my curiosity got the better of me and I finally just tried this. I plugged in the DOI from my dataset at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TJCLKP and it looks like files are getting downloaded from Dataverse (Harvard Dataverse in this case) into Binder. But then it hangs? I'll attach some screenshots. (time passes) By the way, awesome shout out to Binder at https://www.nature.com/articles/d41586-019-03366-x All of this is on my mind because we're talking a lot these days about this "Capsulization and Packaging of Replication Objects in Dataverse" issue at IQSS/dataverse#6085 and you and others are very welcome to join in the conversation! |
I just thought to myself, "Who would be excited to see Binder working with Dataverse DOIs." The answer, of course, is @atrisovic and we picked a DOI at random from her recent "Building reproducible workflows for earth sciences - Responding to reproducibility challenges from physics to social sciences" talk: https://vimeo.com/366217811 (slides at https://events.ecmwf.int/event/116/contributions/937/attachments/239/432/Repwork-WS-Trisovic.pdf by the way. We punched in "10.7910/DVN/VZSO5S" from this dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VZSO5S And it worked! When I got back to my desk it said "image already built" like in the screenshots below but I was able to play with the code in Binder. I feel like we're so close to having the "Binderverse" we dreamed of in IQSS/dataverse#4714 🎉 🎉 🎉 @betatim @choldgraf et al., is there anything I can do to help move this pull request forward? 😄 |
Hey all - just looking through the review and it seems reasonable to me. I'm also a fan of the "make the DOI target an explicit thing instead of just totally generic DOIs" approach - that way we can choose a subset of DOIs that we know will play nicely with BinderHubs. Are there any lingering "to-do" items on this one? Or are we "just" waiting for a merge? |
I believe I addressed all Tim's comments. I guess it needs a 2nd pair of eyes? |
c3cfc67
to
1d59c7e
Compare
@@ -294,6 +295,44 @@ def get_build_slug(self): | |||
return "figshare-{}".format(self.record_id) | |||
|
|||
|
|||
class DataverseProvider(RepoProvider): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks pretty reasonable to me. One thing I'd ask is if you could write out some short documentation for the spec for this provider. Could you:
- Add more contextual information to the docstrings in this class
- Add an entry for the Dataverse URL structure to the BinderHub docs here: https://binderhub.readthedocs.io/en/latest/api.html#providers ?
- (optionally) not necessary for this PR but, add one for the other repoproviders that are support but not listed there, if you want extra kudos :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that's fixed by Phil's commit.
Hey all - just a few quick comments from me, mostly regarding documentation. I think it looks good though! One quick thought: I saw that this also implements the "resolve the URL with |
OK, we just got #1017 merged, so can you follow the instructions here: In particular the bit about documenting this repoprovider, and then we can merge? |
@choldgraf hi! I just opened Xarthisius#1 to merge some docs into this pull request by @Xarthisius Can you please take a look? Thanks! This is the the main commit to look at: Xarthisius@df83cc4 Here's how it looks on my machine: developer pagereference page |
@choldgraf @betatim the way this pull request is going it looks like the URL in "Copy the URL below" is going to look something like this: https://mybinder.org/v2/dataverse/10.7910/DVN/TJCLKP/ Here it is in context: This pull request (DOI in path)Instead of having the DOI in the path like this, would it be possible to have the DOI in a query parameter? I mean something like this: https://mybinder.org/v2/dataverse?datasetPid=doi:10.7910/DVN/RLLL1V Here's how that would look: Proposed (DOI in query parameter)The main reason I'm asking this question is that on that call last week (thanks again for having us!) I mentioned that I could give you a JSON file to document how to add and "Explore" button to Dataverse for Binder. The problem is that Dataverse's "external tool" framework (the thing I was calling a glorified URL builder) does not support dynamically changing the path in the external tool. Rather, the base URL of the tool is set and query parameters are used to send dynamic information (like the DOI). That is to say, this is the JSON I had in mind to add a Binder button to Dataverse (as we discussed
And that JSON will send the Dataverse user who clicks "Explore" and then "Binder" to a URL like this (the same as above with a query parameter): https://mybinder.org/v2/dataverse?datasetPid=doi:10.7910/DVN/RLLL1V The problem, of course, is that I don't believe the Binder will know what to do with a URL with a query parameter like that. I noticed in #712 some discussion about query parameters but it seemed to be more about sending additional information beyond the base spec (the provider and the DOI) rather than permitting the base spec (the provider and DOI) to handle query parameters. In short, if this pull request gets merged, I'd love to be able to help you get a JSON file ready to put in the Binder docs of how to add a Binder button to Dataverse but (again) the Dataverse "external tool" framework always wants to send the DOI of the dataset as a query parameter. The Dataverse external tool framework currently does not have the ability to put the DOI of the dataset in the path of the URL. I hope this makes sense! Please let me know if anything is unclear! |
The UI addition both look good to me! For your second issue about URL parameters etc, I think that'd best be discussed in a new issue, unless you'd like to block this PR on it. What do you think? |
@choldgraf by UI addition I assume you mean the doc addition. The screenshots of the two pages in the docs I updated. Thanks! I almost created a new issue about the query parameter thing. (And maybe some day I'll open another separate issue about supporting both DOIs and Handles. And yet another issue about accepting a version number for a dataset (1.1, 2.0, etc.) I'm totally fine with merging this as is and will excitedly tell the Dataverse community about it. I'm a little bummed that an Dataverse external tool won't "just work" right now but I think we can get there eventually. If it's easiest to merge this and cherry pick my doc commit at Xarthisius@df83cc4 that's fine with me. Whatever works for you. Thanks! 😄 |
I'd vote against discussing the adding of the ability for some repo providers to use query parameters in this PR. |
@betatim sounds fine. Is there anything else you need from me or @Xarthisius ? |
I believe that @bitnik noticed there are two other methods that should be created for the repository provider, see his comment here: Specifically, So to me there are 3 things needed to move forward:
|
1d59c7e
to
b562b61
Compare
@choldgraf thanks. It looks like @Xarthisius addressed these:
Is there anything you need for "Confirm that the repo2docker contentprovider is ready for Dataverse"? |
As per #969 (comment) repo2docker handles Dataverse DOIs on main Binderhub instance even without this PR. |
Yeah. @atrisovic also gave a more recent demo using a Dataverse DOI with the Zenodo content provider a couple weeks ago. Here's the dataset: https://doi.org/10.7910/DVN/EOYZKH or https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EOYZKH Here's how we used a Dataverse DOI with the Zenodo content provider: https://mybinder.org/v2/zenodo/10.7910/DVN/EOYZKH/ One can find notes from this demo in the "JupyterHub and BinderHub Team Meeting" doc at https://hackmd.io/u2ghJJUCRWK-zRidCFid_Q?view @choldgraf please advise! 😄 |
The changes look good to me, and thanks for adding tests as well. I'll plan on merging this tomorrow morning to give time for anybody else to comment or raise a flag, otherwise LGTM |
ok, merged! let's see how this looks in the deployed sites once the henchbot makes its PR |
@choldgraf it works!!!! 🎉 🎉 🎉 Thanks for merging! Here are some screenshots: Entering a Dataverse DOI in BinderThe files from the Dataverse dataset behind the DOI are in Jupyter LabClicking "Visit repo" returns me to the dataset in DataverseThank you so much! I can't wait to let the Dataverse community know! |
Ok, I just gave a shout out to @Xarthisius (❤️ 🎉 ❤️ 🎉 ) in a post I made called "Binder supports Dataverse DOIs for Jupyter Notebooks" on the Dataverse mailing list and encouraged everyone to install Binder: https://groups.google.com/d/msg/dataverse-community/K7vKWSPLaQY/8go3fvV3AwAJ |
Thanks a lot for persevering with getting this over the line! Some feedback on how I experienced this PR. The enthusiasm is appreciated, understandable and shared! However a lot of comments in this thread are very long. In my opinion they don't contribute to moving things forward proportional to the amount of screenspace they take up. For me it is even the other way around, more comments with more content it makes it harder to move things forward. This is because I take a quick peek at issues with activity and if I can deal with in in 5min I try to do it then and there, otherwise I schedule it for a later moment when I have time, patience and generally in the right frame of mind for dealing with potentially tricky discussions. This means the iteration time increases a lot. I don't feel like I could just skip comments because people might refer to them or because they might contain relevant insights. So for me a short comment like "yay, it just works if you select Zenodo, try it mybinder.org/v2/zenodo/1234..." is great because it lets us know someone tried it and that it works. Everything else we can probably figure out as people working on PRs are usually "deep into" the technical stuff. If there is a question we can always go deeper on request. When things don't work as expected then more details are needed. To sum it up, for PRs and issues I try to keep that famous quote of "I apologise for sending such a long letter, I didn't have time to write a shorter one" in mind :) and I sometimes get carried away with excitement as well and post too much. |
@betatim short and sweet from now on! 🍦🍦🍦 Thanks for everything!!! 🎉 ❤️ |
Note: This PR depends on an unreleased version of repo2docker
This adds Dataverse DOI as a valid target for BinderHub UI. Tests are expected to fail right now as I developed it against unreleased version of WT, where there's a slight change to DOI format returned by the lookup endpoint ("doi:10..." vs "10..."). However, it doesn't affect functionality and this PR in any way. Everything should be working as is.