
Issue querying the Bioregistry SPARQL endpoint from various triplestores #775

Closed
vemonet opened this issue Mar 17, 2023 · 7 comments · Fixed by biopragmatics/curies#46

@vemonet
Contributor

vemonet commented Mar 17, 2023

Hi @cthoyt

This issue follows up on #686 and #773

I tried to run federated queries to the new Bioregistry SPARQL endpoint from various triplestores using a simple SPARQL query:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?o WHERE {
    SERVICE <https://bioregistry.io/sparql> {
        <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
    }
}

From OpenLink Virtuoso

From a Virtuoso 7.2.9 triplestore at https://bio2rdf.org/sparql (the latest open-source version of Virtuoso), we get the following response:

Virtuoso RDFZZ Error DB.DBA.SPARQL_REXEC('https://bioregistry.io/sparql', ...) returned Content-Type 'text/html' status 'HTTP/1.1 200 OK
'
{"results": {"bindings": [{"o": {"type": "uri", "value": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/CHEBIID:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/ChEBI:24867"}}, {"o": {"type": "uri", "value": "http://bioregistry.io/chebi:24867"}}, {"o": {"type": "uri", "value": "http://identifiers.org/CHEBI/24867"}}, {"o": {"type": "uri", "value": "http://identifiers.org/CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://identifiers.org/chebi/CHEBI:24867"}}, {"o": {"type": "uri", "value": "http://n2t.net/chebi:24867"}}, {"o": {"type": "uri", "value": "http://purl.obolibrary.org/obo/CHEBI_24867"}}, {"o": {"type": "uri", "value": "http://www.ebi.ac.uk/chebi/displayImage.do?defaultImage=true&imageIndex=0&chebiId=24867"}}, {"o": {"type": "uri", "value": "http://www.ebi.ac.uk/chebi/searchId.do?chebiId=24867"}}, {"o": {"type": "uri", 

SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?o WHERE {
    SERVICE <https://bioregistry.io/sparql> {
        <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
    }
}

The query itself seems to be processed correctly, but the results are returned with the wrong Content-Type (text/html instead of a SPARQL results type such as application/sparql-results+json).
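A minimal sketch of the Accept-header negotiation an endpoint could use so that SELECT results are sent with a SPARQL results media type rather than text/html. The media types come from the W3C SPARQL 1.1 result format specifications. The function name, the supported list, and the JSON default are assumptions for illustration, not how Bioregistry or curies implement it.

```python
# Media types an endpoint might support for SELECT results (assumed list).
SUPPORTED = [
    "application/sparql-results+json",
    "application/sparql-results+xml",
    "text/csv",
]
DEFAULT = "application/sparql-results+json"


def pick_results_type(accept_header):
    """Return the supported media type the client's Accept header prefers."""
    if not accept_header:
        return DEFAULT
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # quality defaults to 1 when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        candidates.append((q, position, media_type))
    # Highest quality first; ties keep the client's original order.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    for q, _, media_type in candidates:
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in SUPPORTED:
            return media_type
        if media_type in ("*/*", "application/*"):
            return DEFAULT
    # Nothing matched: fall back instead of failing (a design choice).
    return DEFAULT
```

A stricter server would answer 406 Not Acceptable when nothing in the Accept header matches; falling back to JSON keeps simple clients working.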

From Ontotext GraphDB

GraphDB 10.1.0 using RDF4J 4.2.0: https://graphdb.dumontierlab.com/repositories/test

Error 500: Internal Server Error
Query evaluation error: <!doctype html>
<html lang=en>
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The browser (or proxy) sent a request that this server could not understand.</p> (HTTP status 500)

From Blazegraph

Not sure which version: http://kg-hub-rdf.berkeleybop.io/blazegraph/sparql

Server Error (#500)

SPARQL-QUERY: queryStr=PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?o WHERE {
    SERVICE <https://bioregistry.io/sparql> {
        <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
    }
}
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=48bf76d9-daf6-452b-a5b6-62e11c08cab6,bopId=1,partitionId=-1,sinkId=2,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.bigdata.rdf.sail.webapp.client.HttpException: Status Code=400, Status Line=BAD REQUEST, Response=<!doctype html>
<html lang=en>
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The browser (or proxy) sent a request that this server could not understand.</p>

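I think it's mostly related to content types: Virtuoso rejects the text/html Content-Type of the response, while GraphDB and Blazegraph both get a 400 Bad Request back from the Bioregistry endpoint, so the requests they send are apparently not understood.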
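The SPARQL 1.1 Protocol allows a query to arrive in three forms: a GET request with a query URL parameter, a POST with an application/x-www-form-urlencoded body, or a POST whose body is the raw query with Content-Type application/sparql-query. The logs above do not show which form each triplestore uses for SERVICE calls, so blaming an unsupported form for the 400 responses is only a hypothesis. A minimal sketch of a server-side helper that accepts all three (the name extract_query is hypothetical, and UTF-8 bodies are assumed):

```python
from urllib.parse import parse_qs


def extract_query(method, query_string, content_type, body):
    """Return the SPARQL query from any SPARQL 1.1 Protocol query form.

    method: "GET" or "POST"; query_string: the raw URL query string;
    content_type: the request Content-Type header (may be None);
    body: the raw request body as bytes. Returns None if no query is found.
    """
    if method == "GET":
        values = parse_qs(query_string).get("query")
        return values[0] if values else None
    if method == "POST":
        # Ignore parameters such as "; charset=utf-8" when matching.
        media_type = (content_type or "").split(";")[0].strip().lower()
        if media_type == "application/x-www-form-urlencoded":
            values = parse_qs(body.decode("utf-8")).get("query")
            return values[0] if values else None
        if media_type == "application/sparql-query":
            return body.decode("utf-8")
    return None
```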

@vemonet vemonet changed the title Issue querying the Bioregistry SPARQL endpoint from a Virtuoso triplestore Issue querying the Bioregistry SPARQL endpoint from various triplestores Mar 17, 2023
@vemonet
Contributor Author

vemonet commented Mar 18, 2023

Hi @cthoyt, I realized after writing vemonet/rdflib-endpoint#8 that you were talking about integrating the CURIE mapping endpoint into the Bioregistry Flask app 😅

In curies, I replaced the Flask-based SPARQL endpoint with rdflib-endpoint, and I deployed the Flask app together with rdflib-endpoint in the Bioregistry app. We still need to replace some parameters that I passed as hard-coded strings with proper variables.

You can find the changes done in the branch add-rdflib-endpoint-for-mappings on my fork of bioregistry and curies:

Let me know if it fits your requirements. I did not see any performance impact when serving the Flask app locally through FastAPI.

Using SparqlEndpoint should fix SERVICE queries from most triplestores, and a YASGUI interface is automatically served to users who open /sparql in a browser.

I am facing issues when deploying with the current gunicorn config, though. I think you will need to change the worker class with gunicorn -k uvicorn.workers.UvicornWorker. For now, I added a quick fix to the CLI option so that the web app starts with uvicorn in development (it gets fast hot reload, which is really convenient when developing).
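The worker-class change can also live in a gunicorn configuration file instead of on the command line. A hypothetical gunicorn.conf.py, assuming the app is exposed as an ASGI application and uvicorn is installed; the bind address and worker count are placeholders, not the Bioregistry deployment's values.

```python
# gunicorn.conf.py (hypothetical): gunicorn loads module-level settings from
# this file when started with `gunicorn -c gunicorn.conf.py module:app`.

# Same effect as `-k uvicorn.workers.UvicornWorker`: gunicorn manages the
# worker processes, and each worker runs uvicorn to speak ASGI.
worker_class = "uvicorn.workers.UvicornWorker"

# Placeholder values; adjust to the actual deployment.
bind = "0.0.0.0:8000"
workers = 2
```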

I added rdflib-endpoint to the fastapi optional dependencies in curies; I'm not sure that's the right place for it!

I also implemented the custom processor in SparqlEndpoint, so queries with a VALUES clause (values joined on the left) now work:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT REDUCED * WHERE {
    ?child owl:sameAs ?child_mapped .
}
VALUES (?child) {
    (<http://purl.obolibrary.org/obo/CHEBI_1>) 
    (<http://purl.obolibrary.org/obo/CHEBI_2>)
}
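Conceptually, for each ?child supplied by VALUES, the endpoint has to recognize the URI, extract its local identifier, and emit every equivalent URI as ?child_mapped. A toy sketch of that expansion, assuming a hypothetical three-entry prefix map for ChEBI only (the URI prefixes are taken from the result list in the Virtuoso output above). It is not the curies implementation.

```python
# Hypothetical, heavily reduced prefix map: one canonical prefix mapped to
# URI prefixes treated as equivalent. A real service would cover many more.
URI_PREFIXES = {
    "chebi": [
        "http://purl.obolibrary.org/obo/CHEBI_",
        "http://identifiers.org/CHEBI:",
        "http://bioregistry.io/chebi:",
    ],
}


def parse_uri(uri):
    """Split a URI into (prefix, local identifier) by longest URI-prefix match."""
    best = None
    for prefix, uri_prefixes in URI_PREFIXES.items():
        for uri_prefix in uri_prefixes:
            if uri.startswith(uri_prefix) and (best is None or len(uri_prefix) > len(best[1])):
                best = (prefix, uri_prefix)
    if best is None:
        return None
    return best[0], uri[len(best[1]):]


def same_as_rows(children):
    """Emit (child, child_mapped) pairs, one per equivalent URI."""
    rows = []
    for child in children:
        parsed = parse_uri(child)
        if parsed is None:
            continue  # unrecognized URIs produce no bindings
        prefix, identifier = parsed
        # Includes the input URI itself, as in the owl:sameAs results above.
        for uri_prefix in URI_PREFIXES[prefix]:
            rows.append((child, uri_prefix + identifier))
    return rows
```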

I'll clean up the code and send a pull request if this implementation works for you

@cthoyt
Member

cthoyt commented Mar 18, 2023

Let's plan to chat on Monday morning. I want to make sure any changes in curies are supported for both Flask and FastAPI, at the blueprint/router level and explicitly not by making full-fledged apps. This needs to be possible to easily mount on existing apps (whether they're in flask or fastapi), and I don't want to spend time on the minutiae of gunicorn/uvicorn/etc

@vemonet
Contributor Author

vemonet commented Mar 18, 2023

Yes, let's do this on Monday.

This needs to be possible to easily mount on existing apps (whether they're in flask or fastapi)

It is possible with FastAPI :) As far as I know, you can mount anything on it and then just serve it with uvicorn/gunicorn (FastAPI just leverages existing standards to describe your API; it's quite well built). I even serve pre-compiled React progressive web apps with it in production without issues!

I am not sure it is possible with Flask, though (it could be, but I did not find anything in my searches).

Switching to FastAPI adds just two lines of code in bioregistry and does not lose any existing capabilities of the web app (this needs more testing, though!).

and I don't want to spend time on the minutiae of gunicorn/uvicorn/etc

Serving with gunicorn/uvicorn is quite simple, and you have already done 99% of the job since the app is already served through gunicorn: you just need to enable the uvicorn worker class in your gunicorn setup.

cthoyt added a commit to biopragmatics/curies that referenced this issue Mar 18, 2023
Closes biopragmatics/bioregistry#775. 

This PR adds handling of headers to both the Flask and FastAPI
implementations of the apps.

- [x] Add Flask implementation
- [x] Add FastAPI implementation
- [x] Add Flask tests
- [x] Add FastAPI tests
- [ ] Should the `output` parameter be supported?

CC @vemonet. Ideally, I'd like to use
https://github.com/vemonet/rdflib-endpoint and not re-implement this
code, but we'll have to work through a few issues first (improving code
modularity, documentation, and figuring out Flask support) before I can
give that a try
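On the open checklist question, one way to support an output parameter is to let it override Accept-header negotiation when it names a known format. A sketch under assumptions: the parameter name output comes from the checklist, while the short aliases json/xml/csv and the injected negotiate callable are hypothetical.

```python
# Hypothetical short aliases; only the media types themselves are standard.
OUTPUT_ALIASES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
}


def resolve_output(params, accept_header, negotiate):
    """Pick the response media type, letting a known `output` value win.

    params: dict of URL query parameters.
    accept_header: the raw Accept header (may be None).
    negotiate: callable mapping an Accept header to a media type; used when
    `output` is absent or not recognized.
    """
    requested = (params.get("output") or "").strip().lower()
    if requested:
        # Accept either a short alias ("csv") or a full media type ("text/csv").
        media_type = OUTPUT_ALIASES.get(requested, requested)
        if media_type in OUTPUT_ALIASES.values():
            return media_type
    return negotiate(accept_header)
```

Silently ignoring an unrecognized output value is one option; answering 400 Bad Request is another.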
@cthoyt
Member

cthoyt commented Mar 18, 2023

I implemented a more principled approach for handling content types in biopragmatics/curies#46 and improved the response types, but I will reopen this since other solutions are possible.

@vemonet, how about 13:00 CET on Monday? I'll email you a Zoom link.

@cthoyt cthoyt reopened this Mar 18, 2023
@cthoyt
Member

cthoyt commented Mar 18, 2023

I think https://flask.palletsprojects.com/en/2.2.x/patterns/appdispatch/#combining-applications might be appropriate for mounting FastAPI onto Flask.
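The linked Flask pattern builds on werkzeug's DispatcherMiddleware, which routes requests between WSGI apps by URL prefix. A stdlib-only sketch of the same idea, with hypothetical demo apps, showing what "mounting" means at the WSGI level. Note that it can only dispatch to WSGI callables.

```python
def dispatch_by_prefix(default_app, mounts):
    """Build a WSGI app that forwards requests to `mounts` by path prefix.

    `mounts` maps a prefix such as "/sparql" to another WSGI app; requests
    matching no prefix go to `default_app`.
    """
    # Try longer prefixes first so "/a/b" wins over "/a".
    prefixes = sorted(mounts, key=len, reverse=True)

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix in prefixes:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the prefix from PATH_INFO to SCRIPT_NAME, as WSGI
                # mounting conventionally does, without mutating the caller's dict.
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return mounts[prefix](environ, start_response)
        return default_app(environ, start_response)

    return app


def make_demo_app(name):
    """Hypothetical demo app that reports which app handled the request."""
    def demo(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        body = f"{name} {environ.get('SCRIPT_NAME', '')}|{environ.get('PATH_INFO', '')}"
        return [body.encode("utf-8")]
    return demo


application = dispatch_by_prefix(make_demo_app("main"), {"/sparql": make_demo_app("sparql")})
```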

@cthoyt cthoyt added the website label Mar 18, 2023
@vemonet
Contributor Author

vemonet commented Mar 18, 2023

Interesting, that might be a solution, but it will probably require some additional patching, because FastAPI is an ASGI app and this pattern is for WSGI apps.
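The WSGI/ASGI mismatch comes from the two calling conventions. A WSGI app is a synchronous callable taking (environ, start_response) and returning an iterable of bytes. An ASGI app is a coroutine taking (scope, receive, send) and emitting its response as messages. The sketch below contrasts minimal hello-world apps and drives the ASGI one with a fake HTTP request; the app names and the call_asgi harness are hypothetical.

```python
import asyncio


# WSGI: a synchronous callable that returns the body as an iterable of bytes.
def wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


# ASGI: an async callable that sends its response as a sequence of messages.
# This sketch only handles plain HTTP requests (no lifespan events).
async def asgi_app(scope, receive, send):
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


async def call_asgi(app, path="/"):
    """Drive an ASGI app with a fake HTTP request and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_asgi(asgi_app))
```

Because a WSGI dispatcher calls its apps synchronously and expects a return value, it cannot host an ASGI app without an adapter. The reverse direction, mounting a WSGI (Flask) app inside an ASGI (FastAPI) app, is what FastAPI's WSGIMiddleware adapter is for.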

This question seems to contain some interesting remarks: https://stackoverflow.com/questions/68769247/how-do-i-write-an-asgi-compliant-middleware-while-staying-framework-agnostic

@cthoyt
Member

cthoyt commented Mar 26, 2023

As of #780, this appears to be fixed 🚀

@cthoyt cthoyt closed this as completed Mar 26, 2023