Try translating PDF URLs based on URL #70

Open
dstillman opened this issue Dec 23, 2018 · 7 comments

Comments

@dstillman
Member

Related to #38, but a few translators are able to function based on the URL, even when it's a PDF page. We should try to support those cases, before either trying PDF recognition (from #38) or failing (if PDF recognition isn't enabled). This includes DOIs in the URL as well as certain sites where we recognize PDF URLs (since people sometimes click "Save to Zotero" when viewing a PDF without going back to the article page). I can try to find an example of such a translator if necessary.

This might be a little tricky, because we may need to provide a fake empty document to run detect on, but we won't want to fall back to generic webpage saving.
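As a rough illustration of the "DOIs in the URL" case, here is a minimal Python sketch that pulls a DOI-like string out of a URL path without fetching anything. The regular expression and the helper name `doi_from_url` are assumptions made for this example; they are not taken from any Zotero translator, and real DOIs can contain characters this loose pattern handles poorly.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Loose DOI pattern (an assumption for this sketch): "10.", a 4-9 digit
# registrant code, "/", then a suffix with no whitespace, "?" or "#".
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s?#]+")


def doi_from_url(url: str) -> Optional[str]:
    """Return the first DOI-like string found in the URL's path, or None."""
    # Only the path is inspected; a DOI passed in the query string is ignored.
    path = unquote(urlsplit(url).path)
    match = DOI_PATTERN.search(path)
    if match is None:
        return None
    # Drop a trailing ".pdf" or punctuation that is unlikely to belong to the DOI.
    return re.sub(r"(\.pdf|[.,;/])$", "", match.group(0), flags=re.IGNORECASE)
```

A URL with no DOI-like segment, such as a plain `/paper/1234.pdf` path, yields `None`, so a caller could move on to a site-specific translator or PDF recognition.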

@mrtcode
Member

mrtcode commented Jan 7, 2019

Can we wait for #59 or do we want this for the current t-s version?

I think all URLs should be tried, not only PDFs.

For example, this doesn't work because the response has a JSON content type.

  1. If the content type is HTML or XML, it already goes through the translation architecture. Otherwise:
  2. Create an empty document
  3. Do a separate translation
  4. If successful, return the translated metadata
  5. If it's a PDF, upload and process it
  6. If not a PDF, return invalid content type error
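The steps above can be sketched in Python as a single dispatch function. This is only an illustration of the proposed control flow, not translation-server's code: `handle_url`, `UnsupportedContentType`, and the three callables passed in are hypothetical names, and the set of HTML/XML media types is an assumption. The HTTP status check reflects the question that follows about not translating error responses.

```python
from typing import Callable, Optional

# Media types treated as "HTML or XML" (an assumption for this sketch).
HTML_XML_TYPES = {"text/html", "application/xhtml+xml", "text/xml", "application/xml"}


class UnsupportedContentType(Exception):
    """Raised when neither URL-based translation nor PDF handling applies (step 6)."""


def handle_url(
    url: str,
    status: int,
    content_type: str,
    translate_document: Callable[[str], list],            # hypothetical: normal web translation
    translate_url_only: Callable[[str], Optional[list]],  # hypothetical: translation on an empty document
    recognize_pdf: Callable[[str], list],                  # hypothetical: upload and process the PDF
) -> list:
    # URLs that return an HTTP error code are not translated at all.
    if status >= 400:
        raise ValueError(f"HTTP {status} for {url}")
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Step 1: HTML/XML already goes through the regular translation path.
    if media_type in HTML_XML_TYPES:
        return translate_document(url)
    # Steps 2-4: try URL-based translation against an empty document.
    items = translate_url_only(url)
    if items:
        return items
    # Step 5: fall back to PDF processing.
    if media_type == "application/pdf":
        return recognize_pdf(url)
    # Step 6: nothing applies.
    raise UnsupportedContentType(media_type)
```

Passing the three operations in as callables keeps the sketch independent of how translation and PDF recognition are actually implemented.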

And we don't want to translate URLs that return an HTTP error code?

@dstillman
Member Author

This can wait for #59 if that's easier.

> And we don't want to translate URLs that return an HTTP error code?

I think that's right.

@mrtcode
Member

mrtcode commented Jan 9, 2019

I already implemented a fix that does what is described in this issue, but it's based on #59, so it will need to wait. Another requirement is zotero/translators#1799, because the current DOI translator can't extract a DOI from a URL. For now it's better to just do #72.

@phiresky

@mrtcode This probably isn't the right place to ask, but why can Zotero Connector get the actual citation from something like https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf when Translation Server can't? Also, I'm having a hard time figuring out how Zotero Connector does that at all...

@mrtcode
Member

mrtcode commented Aug 20, 2019

@phiresky Zotero Connector uses the 'Neural Information Processing Systems' translator, which strips the '.pdf' extension and extracts metadata from the web page behind this paper: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks . Technically, translation-server should be able to do the same. I think we have to fix that. Good observation.
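The extension-stripping step can be illustrated with a small Python sketch. It only shows the URL rewrite described above; the function name `article_url_from_pdf_url` is made up for this example, and the actual translator is written in JavaScript and may apply further site-specific rules.

```python
from urllib.parse import urlsplit, urlunsplit


def article_url_from_pdf_url(url: str) -> str:
    """Return the URL with a trailing '.pdf' removed from its path, if present."""
    parts = urlsplit(url)
    path = parts.path
    if path.lower().endswith(".pdf"):
        path = path[: -len(".pdf")]
    # Query string and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

The resulting URL is what would then be fetched and passed through normal web translation, assuming the site serves the article page at that address.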

@phiresky

Thanks. Here are some more examples that work fine via Zotero but not via Translation Server:

Are those the same issue?

My motivation here by the way is that I'm writing papers in markdown and I wrote a tool to transparently convert URLs to citations without having to use a reference manager: https://github.com/phiresky/pandoc-url2cite

@mvolz
Contributor

mvolz commented Dec 18, 2019

This has been brought up again on the email list: https://groups.google.com/forum/#!msg/zotero-dev/9AmwvQqBCBY/H57ukdE9AgAJ
