Skip to content

Basic GraphQL API for an Example Bioinformatics Case (Flask + GraphQL with Ariadne)

License

Notifications You must be signed in to change notification settings

furkanmtorun/GraphQL_API_in_Bioinformatics

Repository files navigation

🧪 Basic GraphQL API for a Bioinformatics Case

CI - Basic Bioinformatics GraphQL API Py Vers License

💡 TL;DR: What is this at all?

Here is a basic GraphQL API built for a bioinformatics (genomics) case by empowering Python & Flask (web microframework) and Ariadne (for GraphQL API) together with the explanatory documentation!

⭐ If you like it, please do not forget to give a star!

Screenshot An example query for fetching a specific gene and its all transcript-related info

Are you looking for an daily life analogy instead of `gene`, `transcript`, `expression_profiles` terms? You could think of an author instead of a gene. So, our authors (genes) might have more than one book (transcript). Also, each book (transcript) could have more than one version/type and - so, their meta information also change!

💭 What is GraphQL and was I aiming for with this side project?

Together with the advancements in the disparate areas of bioinformatics/computational biology, the need for accessing the data lakes/warehouses increased! Yet, it is challanged to make those data resources accessible to the community.

Definition of GraphQL by GraphQL (graphql.org)

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data.

GraphQL, created by Facebook, is designed to provide a better web API standard by building fast, flexible and also developer-friendly APIs.

However, the vast majority of bioinformatics resources use REST (representational state transfer) APIs and use of GraphQL in bioinformatics is still limited even though it comes with several advantages!

Hopefully this basic example makes the transition to GraphQL easier than ever thought!

Available GraphQL Libraries in Python:

To see the difference, check out: https://morioh.com/p/f9d3c644b20b.


💎 Why should you prefer empowering GraphQL over REST?

REST vs GraphQL? As always, the answer is "okay, depends on your task/project!". Here are the reasons you should prefer using GraphQL.

In this repo, a bioinformatics case met all these criteria! (For sure, this could be generalized to other domains as well!)

  • Reason 1: Endpoints & Versioning

    • To build the same example here with REST, you should have at least 3 endpoints: /gene, /transcript, /expression_profiles. For now, it could be easy to maintain and version the API (by keeping the conventions as the API evolves over time) however most of the time the number is so larger than 3! Here, you have only one endpoint, /graphql and there is no need to worry about versions and URL changes!
  • Reason 2: Nested queries

    • You can find yourself querying several endpoints for several different resources/topics. Here, in this example, to have the expression profiles of each of transcripts for all genes (check the Data Schema section to see the relation), we need to have nested queries! In REST arch., you need to make at least 3 request and create a loop. However, with GraphQL, you can fetch all of them (according to your needs) within a single query! Yes, this is pretty cool!
  • Reason 3: Saving bandwidth (A bit tricky)

    • In RESTful API, most of the time, it brings over-fetching issue (excessive data is fetched at once) when you send a request even though some of them are not needed! Other way around, under-fetching could be a case as well! However, by specifiying the fields you need within the GraphQL queries, you can reduce the amount of data transmitted and improve the overall performance!

    Note: The tricky part is the following: If the clients send a query asking all of the fields, it will end up with performance issues :)

  • For more details on the reasons to use and NOT to use, please have a look this post and this post, respectively!


🧵 Installation and Running

  • Installation:

    • Clone the repository:

      • git clone https://github.com/furkanmtorun/GraphQL_API_in_Bioinformatics
    • Navigate into the directory:

      • cd GraphQL_API_in_Bioinformatics
    • Install the requirements in your environment (recommended):

      • pip install -r requirements.txt
  • Running:

    Note: I already enabled the Flask app to run in debug mode in main.py, so no need to set environment variable for the file name, meaning that you directly run it!


🎯 Data Schema and Query Examples

  • Check schema.graphql file out to see the schema to be used by GraphQL API

    Note: you could also see this on the web UI by toggling the Schema button at the right side of the page!

  • Here, I provided some queries to demonstrate the use of GraphQL API. For detailed explanations and examples, navigate to Example_Queries.md page!

    Feel free to play around with them to learn more!

  • As it would be out of scope for this repo and to make this API Example simpler, I avoid to integrate any type of DB connection, rather directly store them in data_fetch.py.


🗂 File/Code Organization

Find the explanation below for each file in this repo to make it everything crystal clear!

.
├── Example_Queries.md  <- Explains the syntax and lists some example queries
├── README.md           <- You are here!
├── api
│   ├── __init__.py     <- Defines Flask app and set DB link
│   ├── data_fetch.py   <- Stores data (but should be DB connection/querying functions in ideal case)
│   └── resolvers.py    <- Defines resolvers for the fields
├── requirements.txt    <- Environment needs to install
├── main.py             <- Defines GraphQL endpoints and relate the resolvers into fields 
└── schema.graphql      <- Data schema to be read & used by GraphQL

🎈 What could be added to improve?

  • Mutations, (see here for more details)
  • DB Connection for data fetching rather than storing them directly in data_fetch.py (I did so for simplicitiy)

🚀 Author and Developer

Moreover, please do not hesitate to comment via opening an issue via GitHub if you have any suggestions or feedback!


ℹ️ References & Resources

About

Basic GraphQL API for an Example Bioinformatics Case (Flask + GraphQL with Ariadne)

Topics

Resources

License

Stars

Watchers

Forks

Languages