This repository has been archived by the owner on Sep 9, 2022. It is now read-only.

Get full text links for closed articles via pubmed? #36

Closed
dwinter opened this issue Mar 16, 2015 · 13 comments
@dwinter
Contributor

dwinter commented Mar 16, 2015

Hi Scott,

My drive to complete rentrez turned up something that might be helpful for fulltext. It turns out the elink endpoint from NCBI can find links to outside providers for a given paper via its PMID.

This example requires the work I've been doing in the elink feature branch (progress being tracked on ropensci/rentrez#39):

rec <- entrez_link(db="pubmed", dbfrom="pubmed", cmd="llinks", id=19822631)
rec
elink object with contents
 $linkouts links to external websites
rec$linkouts
$ID_19822631
$ID_19822631[[1]]
Linkout from HighWire 
 $Url: http://ctj.sagepub.com/cgi ...

$ID_19822631[[2]]
Linkout from EBSCO 
 $Url: http://openurl.ebscohost.c ...

$ID_19822631[[3]]
Linkout from ProQuest 
 $Url: http://gateway.proquest.co ...

$ID_19822631[[4]]
Linkout from COS Scholar Universe 
 $Url: http://www.scholaruniverse ...

$ID_19822631[[5]]
Linkout from Genetic Alliance 
 $Url: http://www.diseaseinfosear ...

$ID_19822631[[6]]
Linkout from MedlinePlus Health Information 
 $Url: http://www.nlm.nih.gov/med ...

The links can include things like ResearchGate.

I don't know if just finding full text links is within the scope of fulltext, but I thought I'd give you a heads up about this now in case it's helpful 😄

@sckott
Contributor

sckott commented Mar 16, 2015

hey @dwinter -

I think getting full text links is a good use case for sure. The rcrossref pkg has a function that does this, or at least attempts to - some publishers don't share the appropriate metadata, so the links that come back can be wrong, have the wrong content types, etc.

anyway, yeah, I think another interface we could have in addition to search and get_full_text is get_link_to_full_text (these aren't actual function names) - perhaps someone would want to get links, then take them elsewhere in R or another language.

@dwinter
Contributor Author

dwinter commented Mar 16, 2015

Hi @sckott,

Cool, the elink branch is almost ready to merge back into master, so it will definitely all be in place for the stable release. Let me know if I can help integrate it into fulltext as and when you get to it.

@sckott
Contributor

sckott commented Mar 16, 2015

Great, sounds good.

Gotta think about this a little bit. That is, should this feature/use case just be a subset within the search interface here, or a separate set of functions? Maybe it makes most sense as a downstream step after searching, so:

  1. user searches, e.g., res <- ft_search(query='ecology', from='plos')
  2. user wants full text links, so e.g., (using a new function) ft_links(res) takes DOIs from the output of the call to ft_search(), or any user-defined subset, and either gives back full text links if they are already in the result metadata, or goes out and tries to get them
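A rough sketch of what step 2 might look like for PubMed, assuming a hypothetical ft_links() wraps the elink lookup shown earlier in this thread (ft_links_sketch is an illustrative name, not a real function):

```r
# Hypothetical sketch -- not the actual fulltext implementation.
# Given DOIs, look up PubMed IDs, then ask the NCBI elink endpoint
# (cmd = "llinks") for links to external full text providers.
library(rentrez)

ft_links_sketch <- function(dois) {
  term <- paste(paste0(dois, "[doi]"), collapse = " OR ")
  pmids <- entrez_search(db = "pubmed", term = term)$ids
  entrez_link(db = "pubmed", dbfrom = "pubmed", cmd = "llinks", id = pmids)
}
```

A real ft_links() would presumably also check whether links are already present in the ft_search() result metadata before making any web requests.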

@sckott sckott self-assigned this Sep 28, 2015
@sckott sckott added this to the v0.1.2 milestone Sep 28, 2015
@sckott
Contributor

sckott commented Sep 29, 2015

@dwinter working on this now. What's the best way to get data for many DOIs? Seems like we can do 10.1371/journal.pone.0086169[doi] OR 10.1016/j.ympev.2010.07.013[doi] and so on for X DOIs, but is that best practice?

@dwinter
Contributor Author

dwinter commented Sep 29, 2015

Yeah -- if you start from DOIs you first have to search to get the PubMed IDs, and this is the best way.

FWIW, when I've been trying to think of ways to automatically generate the search syntax, the best I've come up with is

dois <- c("10.1371/journal.pone.0086169", "10.1016/j.ympev.2010.07.013")
paste(paste0(dois, "[doi]"), collapse=" OR ")
[1] "10.1371/journal.pone.0086169[doi] OR 10.1016/j.ympev.2010.07.013[doi]"

At some point, with very many DOIs the REST URL will get too long. I've never found any documentation for how long is too long, but rentrez should at least pass on a useful error message.

Happy to help on any of this.

@sckott
Contributor

sckott commented Sep 29, 2015

Right, that's the same way I combine the DOIs

Right, the URI too long code, 414 maybe

I could see with text mining use case how one may pass in far too many DOIs for the URI length restriction. Would be great to have a way around this :)

@dwinter
Contributor Author

dwinter commented Sep 29, 2015

So, a bit of trial and error suggests "too long" is around 7000 characters. With DOIs like these, that's about 80 of them.

termify <- function(dois) paste(paste0(dois, "[doi]"), collapse=" OR ") 

entrez_search(db="pubmed", term=termify(rep(dois, 90)))
Error in entrez_check(response) : 
  HTTP failure 414, the request is too large. For large requests, try using web history as described in the tutorial
entrez_search(db="pubmed", term=termify(rep(dois, 80)))
 Search term (as translated):  10.1371/journal.pone.0086169[doi] OR 10.1016/j.ymp ...

So maybe the solution is to check that there aren't more than ~70 or so DOIs, and "chunk" the requests if there are more.

There might also be a problem with searching for multiple IDs -- there is no guarantee that the returned object is going to have the records in the same order as they appear in the search term. The summary records have both the DOI and the PMID, so it is possible to reconstruct that relationship. But there might also be an easier way to bulk-convert DOIs to PMIDs?
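A minimal sketch of the chunking idea, assuming a conservative batch size of 70 DOIs per request (search_chunked is a made-up name):

```r
# Sketch only: split a long DOI vector into batches of at most 70,
# search each batch separately, and pool the returned PubMed IDs,
# keeping each request URL well under the ~7000-character limit.
library(rentrez)

termify <- function(dois) paste(paste0(dois, "[doi]"), collapse = " OR ")

search_chunked <- function(dois, chunk_size = 70) {
  chunks <- split(dois, ceiling(seq_along(dois) / chunk_size))
  unlist(lapply(chunks, function(batch) {
    entrez_search(db = "pubmed", term = termify(batch))$ids
  }), use.names = FALSE)
}
```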

@sckott
Contributor

sckott commented Sep 29, 2015

Thanks for looking into this!

Chunking sounds like the way forward.

There might also be a problem with searching for multiple IDs -- there is no guarantee that the returned object is going to have the records in the same order as they appear in the search term. The summary records have both the DOI and the PMID, so it is possible to reconstruct that relationship

I did notice that - that the returned data isn't the same length as the input, so I can't reliably attach the input DOIs to the output - but I didn't know the DOI was also in there; I'll use that to reconstruct.
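A sketch of that reconstruction, assuming the esummary records expose their identifiers in an articleids table with idtype/value columns (worth verifying against what entrez_summary() actually returns):

```r
# Sketch: fetch summaries for the returned PMIDs and pull the DOI out of
# each record's articleids, so the input DOIs can be matched to PMIDs
# regardless of the order the search returned them in.
library(rentrez)

pmid_to_doi <- function(pmids) {
  summs <- entrez_summary(db = "pubmed", id = pmids,
                          always_return_list = TRUE)
  vapply(summs, function(s) {
    ids <- s$articleids
    doi <- ids$value[ids$idtype == "doi"]
    if (length(doi)) doi[1] else NA_character_
  }, character(1))
}
```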

sckott added a commit that referenced this issue Sep 30, 2015
#36
a few data sources still have no plugin ready yet, so are not available
@sckott
Contributor

sckott commented Sep 30, 2015

@dwinter initial attempt committed, haven't made any changes per our discussion above yet

sckott added a commit that referenced this issue Sep 30, 2015
@sckott
Contributor

sckott commented Oct 1, 2015

Now getting other plugins to work for a suite of publishers

sckott added a commit that referenced this issue Oct 1, 2015
in addition, got more publisher plugins for ft_links working
arxiv and biorxiv still not working yet
fix to some code in plugins_search
@sckott
Contributor

sckott commented Oct 1, 2015

essentially implemented, will open new issues as needed for this fxn

@sckott sckott closed this as completed Oct 1, 2015
@dwinter
Contributor Author

dwinter commented Oct 2, 2015

Nice work! I will check out the details and play around with it this weekend if you want a tester :)

@sckott
Contributor

sckott commented Oct 2, 2015

yeah, plz do
