Get full text links for closed articles via pubmed? #36
hey @dwinter - I think getting full text links is a good use case for sure. Anyway, yeah, I think this could be another interface we could have in addition to the search interface.
Hi @sckott, Cool, the elink branch is almost ready to merge back with master, so it will definitely all be in place for the stable release. Let me know if I can help integrate it into fulltext as and when you get to it.
Great, sounds good. Gotta think about this a little bit. That is, should this feature/use case just be a subset within the search interface here, or a separate set of functions? Maybe it makes most sense as a downstream step after searching, along the lines of the sketch below.
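A minimal sketch of that envisioned pipeline (illustrative only; `ft_search()` and `ft_links()` are the function names that come up later in this thread, so treat this as the shape of the idea rather than working code at the time of this comment):

```r
library(fulltext)

# search first, then get full text links for the results downstream
res <- ft_search(query = "ecology", from = "plos")
ft_links(res)
```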
@dwinter working on this now. What's the best way to get data for many DOIs? Seems like we can do a pubmed search on the DOIs to get PMIDs first?
Yeah -- if you start from DOIs you first have to search to get the PubMed IDs, and this is the best way. FWIW, when I've been trying to think of ways to automatically generate the search syntax, the best I've come up with is:

```r
dois <- c("10.1371/journal.pone.0086169", "10.1016/j.ympev.2010.07.013")
paste(paste0(dois, "[doi]"), collapse = " OR ")
```

At some point, with very many DOIs, the REST URL will get too long. I've never found any documentation for how long is too long. Happy to help on any of this.
Right, that's the same way I combine the DOIs. Right, the URI-too-long code is 414. I could see, with a text mining use case, how one may pass in far too many DOIs for the URI length restriction. Would be great to have a way around this :)
So, a bit of trial and error suggests "too long" is around 7000 chars. With these DOIs that's about 80:

```r
termify <- function(dois) paste(paste0(dois, "[doi]"), collapse = " OR ")
entrez_search(db = "pubmed", term = termify(rep(dois, 90)))  # too long -- fails
entrez_search(db = "pubmed", term = termify(rep(dois, 80)))  # works
```

So maybe the solution is to check that there aren't more than ~70 or so DOIs, and "chunk" the requests if there are more (a sketch follows below). There might also be a problem with searching for multiple IDs -- there is no guarantee that the returned object is going to have the records in the same order as they appear in the search term. The summary records have both the DOI and the PMID, so it is possible to reconstruct that relationship. But there might also be an easier way to bulk convert DOIs to PMIDs?
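A minimal sketch of that chunking, assuming `rentrez::entrez_search()` and the `termify()` helper above; `pmids_from_dois()` is a hypothetical helper name, and the 70-DOI chunk size is just the rough limit from the trial and error above, not a documented bound:

```r
library(rentrez)

# hypothetical helper: search the DOIs in chunks small enough to keep
# the request URL under the ~7000 character limit seen above
pmids_from_dois <- function(dois, chunk_size = 70) {
  chunks <- split(dois, ceiling(seq_along(dois) / chunk_size))
  unlist(lapply(chunks, function(x) {
    entrez_search(db = "pubmed", term = termify(x), retmax = length(x))$ids
  }), use.names = FALSE)
}
```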
Thanks for looking into this! Chunking sounds like the way forward.
I did notice that - the returned data isn't the same length as the input, so I can't reliably attach the input DOIs to the output - but I didn't know the DOI was also in there; I'll use that to reconstruct (sketch below).
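A sketch of that reconstruction from the summary records, assuming `rentrez::entrez_summary()`; the `articleids` element is where PubMed esummary records carry the DOI, and `pmids` stands in for the IDs returned by the search:

```r
library(rentrez)

# fetch summary records for the returned PMIDs, then pull the DOI out
# of each record's "articleids" field to rebuild the PMID <-> DOI mapping
recs <- entrez_summary(db = "pubmed", id = pmids)
doi_for_pmid <- vapply(recs, function(r) {
  ids <- r$articleids
  ids$value[ids$idtype == "doi"][1]
}, character(1))
```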
#36: a few data sources still have no plugin ready yet, so are not available
@dwinter initial attempt committed, haven't made any changes per our discussion above yet
Now getting other plugins to work for a suite of publishers.
In addition, got more publisher plugins for ft_links working. arXiv and bioRxiv are still not working yet. Also fixed some code in plugins_search.
Essentially implemented; will open new issues as needed for this fxn.
Nice work! I will check out the details and play around with it this weekend if you want a tester :)
yeah, plz do
Hi scott.

My drive to complete rentrez turned up something that might be helpful for fulltext. It turns out the elink endpoint from NCBI can find links to outside providers for a given paper via its PMID. This example requires the work I've been doing in the elink feature branch (progress being tracked on ropensci/rentrez#39):
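A sketch of such a call, assuming the `entrez_link()` interface from that branch (the `cmd = "llinks"` usage and `linkout_urls()` helper match rentrez's eventual LinkOut support; the PMID is an arbitrary example):

```r
library(rentrez)

# ask NCBI's elink endpoint for external provider (LinkOut) URLs
# for one paper, identified by its PMID
paper_links <- entrez_link(dbfrom = "pubmed", id = 25500142, cmd = "llinks")
linkout_urls(paper_links)
```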
The links can include things like ResearchGate.

I don't know if just finding full text links is within the scope of fulltext, but thought I'd give you a heads up about this now in case it's helpful 😄