The ResearchHub Django API
Our mission is to accelerate the pace of scientific research 🚀
We believe that by empowering scientists to independently fund, create, and publish academic content we can revolutionize the speed at which new knowledge is created and transformed into life-changing products.
💡 Got an idea or request? Open an issue on GitHub.
🐛 Found a bug? Report it here.
➕ Want to contribute to this project? Introduce yourself in our Discord community
📰 Read the ResearchCoin White Paper
👷 See what we are working on
There are two different methods for running this project: Dev Containers with VSCode and a native installation.
Install Docker, Visual Studio Code and the Dev Containers extension. Please review the Installation section in the Visual Studio Code Dev Container documentation.
On macOS with Homebrew, the tools can be installed with the following commands:
brew install docker
brew install visual-studio-code
code --install-extension ms-vscode-remote.vscode-remote-extensionpack
Clone the repository and create an initial configuration by copying the sample configuration files to config_local:
cp db_config.sample.py src/config_local/db.py
cp keys.sample.py src/config_local/keys.py
Make adjustments to the new configuration files as needed.
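Django reads its database connection from the DATABASES setting. As a rough sketch of how the values in db.py might map onto it, assume the file defines NAME, HOST, PORT, USER and PASS, as in the native-installation example. The helper name build_databases, the module layout and the choice of Django's PostgreSQL backend are illustrative assumptions, not the project's actual settings code:

```python
from types import SimpleNamespace


def build_databases(db):
    """Map a db.py-style config object onto Django's DATABASES setting.

    Hypothetical helper: the real project settings may differ.
    """
    return {
        "default": {
            # Django's built-in PostgreSQL backend
            "ENGINE": "django.db.backends.postgresql",
            "NAME": db.NAME,
            "HOST": db.HOST,
            # Django's settings examples give the port as a string
            "PORT": str(db.PORT),
            "USER": db.USER,
            # Note the different key name: PASS -> PASSWORD
            "PASSWORD": db.PASS,
        }
    }


# Stand-in for importing the db.py config module (assumed layout)
db = SimpleNamespace(
    NAME="researchhub",
    HOST="localhost",
    PORT=5432,
    USER="rh_developer",
    PASS="not_secure",
)
DATABASES = build_databases(db)
```

The main thing to watch when adjusting the file is that every value ends up under the key Django expects, for example PASS becoming PASSWORD.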
When you open the code in VSCode, it will recognize the Dev Containers configuration and prompt you to Rebuild and Reopen in Container. Alternatively, select Rebuild and Reopen in Container manually from the command palette. This will pull and run all necessary auxiliary services, including Elasticsearch, PostgreSQL, and Redis.
During creation of the dev container, all Python dependencies are downloaded and installed, and an initial database migration is performed. Once the dev container has been created, seed the database as needed.
Run the application by typing the following into the integrated terminal:
cd src
python manage.py runserver
Alternatively, the application can be debugged with the following launch configuration (in .vscode/launch.json):
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Django",
      "type": "debugpy",
      "request": "launch",
      "program": "${workspaceFolder}/src/manage.py",
      "args": ["runserver", "[::]:8000"],
      "django": true,
      "autoStartBrowser": false
    }
  ]
}
- Create a fork of the repository in your GitHub account, and clone it.
- Prepare the database:
Create a db.py file in src/config:
touch src/config/db.py
Add the following:
NAME = 'researchhub'
HOST = 'localhost'
PORT = 5432
USER = 'rh_developer'  # replace as needed
PASS = 'not_secure'  # replace as needed
- Use Postgres.app to install PostgreSQL. The latest available version should be fine.
A good UI tool for interacting with PostgreSQL is Postico.
- The project virtual environment is managed using Poetry:
pip3 install poetry
- Go to the src directory and run the following commands to activate the virtual environment:
cd src
# activates a Python virtual environment and enters shell
poetry shell
# installs the project virtual environment and packages
poetry install
The following commands should all be run in the virtual environment (poetry shell), in the src folder:
- Install the Python dependencies stored in requirements.txt:
pip3 install -r requirements.txt --no-deps
- Create the database schema:
python manage.py makemigrations
python manage.py migrate
- The backend worker queue is managed using redis. Before you start the backend, run redis-server in a separate terminal:
brew install redis
redis-server
- Start celery, the tool that runs the worker via redis. In a separate terminal:
# celery: in poetry shell, run:
cd src
./start-celery.sh
Celery may need to be added to your PATH environment variable manually.
- In order for the UI to work properly, some data needs to be seeded into the database. Seed category data:
python manage.py create-categories
- Seed hub data. There's a CSV file in /misc/hub_hub.csv with hub data that you can use to seed the hubs. This can be done in two ways:
- in Postico: right-click on the hub_hub table, and select Import CSV.... You will run into problems importing the CSV because the tool treats empty fields in the acronym and description columns as nulls. Temporarily update the hub_hub table to allow null values for those columns:
ALTER TABLE hub_hub ALTER COLUMN description DROP NOT NULL;
ALTER TABLE hub_hub ALTER COLUMN acronym DROP NOT NULL;
Import the CSV, then change the nulls in the two columns to empty strings, and revert the columns to NOT NULL:
UPDATE hub_hub SET acronym = '' WHERE acronym IS NULL;
UPDATE hub_hub SET description = '' WHERE description IS NULL;
ALTER TABLE hub_hub ALTER COLUMN description SET NOT NULL;
ALTER TABLE hub_hub ALTER COLUMN acronym SET NOT NULL;
OR
- in Python: run python manage.py shell_plus to open a Python terminal in the virtual environment. Then, paste the following code:
import pandas as pd
from hub.models import Hub
# Load the hub seed data (the path is relative to the src directory)
hub_df = pd.read_csv("../misc/hub_hub.csv")
# Drop the columns that are not passed to the Hub constructor
hub_df = hub_df.drop(columns=["slug_index", "acronym", "hub_image"])
# Build unsaved Hub instances, then insert them in bulk
hubs = [Hub(**row.to_dict()) for _, row in hub_df.iterrows()]
Hub.objects.bulk_create(hubs)
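As a pandas-free alternative, the standard library csv module can prepare the same rows. csv.DictReader returns empty fields as empty strings, whereas pandas reads them as NaN by default. This is a sketch: the helper name load_hub_rows is hypothetical, and it assumes the Hub model accepts the remaining columns as string values (Django usually converts them when saving, but check the model fields first):

```python
import csv
import io

# Same columns that the pandas snippet drops
DROPPED_COLUMNS = ("slug_index", "acronym", "hub_image")


def load_hub_rows(f):
    """Return one dict per CSV row, without the dropped columns.

    Hypothetical helper; empty CSV fields stay as "" instead of NaN.
    """
    rows = []
    for row in csv.DictReader(f):
        for column in DROPPED_COLUMNS:
            # Ignore columns that are missing from this CSV
            row.pop(column, None)
        rows.append(row)
    return rows


# Usage in `python manage.py shell_plus` (path relative to src):
#   from hub.models import Hub
#   with open("../misc/hub_hub.csv", newline="") as f:
#       Hub.objects.bulk_create([Hub(**row) for row in load_hub_rows(f)])

sample = io.StringIO("name,acronym,description,slug_index\nBiology,,,1\n")
rows = load_hub_rows(sample)
```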
Run the development server:
python manage.py runserver
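If the server or the celery worker cannot reach its services, a small standard-library check can narrow things down. This sketch assumes local services on the default ports (5432 for PostgreSQL, 6379 for Redis); the function names is_listening and preflight are hypothetical, and a successful TCP connection only shows that something is listening, not that the credentials in db.py are correct:

```python
import shutil
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution errors
        return False


def preflight(host="localhost"):
    """Report which local dependencies look available."""
    return {
        # Default ports; adjust if your db.py or Redis config differs
        "postgres": is_listening(host, 5432),
        "redis": is_listening(host, 6379),
        # The setup notes say celery may need to be added to PATH
        "celery_on_path": shutil.which("celery") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```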
Install the git pre-commit hooks:
pre-commit install
# create a superuser and retrieve an authentication token
python manage.py createsuperuser --username=florin --email=florin@researchhub.com
# password used in this example: not_secure (enter it when prompted)
python manage.py drf_create_token florin@researchhub.com
Note that for paths under /api, e.g. /api/hub/, you don't need a token.
curl --silent \
--header 'Authorization: Token <token>' \
http://localhost:8000/api/
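The same authenticated request can be made from Python with only the standard library. A minimal sketch: the helper names api_request and fetch_json are hypothetical, and <token> stands for the token printed by drf_create_token:

```python
import json
import urllib.request

API_ROOT = "http://localhost:8000/api/"


def api_request(token, url=API_ROOT):
    """Build a GET request that sends the token in the Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage, with the development server running:
#   data = fetch_json(api_request("<token>"))
```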
- Install the REST Client extension.
- Create a file called api.rest with the following contents (insert token):
GET http://localhost:8000/api/ HTTP/1.1
content-type: application/json
Authorization: Token <token>
Then press Send Request in VSCode, above the text.
Seeding paper data requires the celery worker to be running (see above). It calls two methods in src/paper/tasks.py that are temporarily disabled: pull_crossref_papers() and pull_papers(). First, comment out the early return at the top of each method, which is what disables it. Then, change the while loops to finish after pulling a small number of papers (enough to populate the local environment):
def pull_papers(start=0, force=False):
    # Temporarily disabling autopull
    return  # <-- this line needs to be commented out
    ...
    while True:  # <-- change this to: while i < 100:
        ...

def pull_crossref_papers(start=0, force=False):
    # Temporarily disabling autopull
    return  # <-- this line needs to be commented out
    ...
    while True:  # <-- change this to: while offset < 100:
        ...
Then, run:
python manage.py shell_plus # enters Python shell within poetry shell
from paper.tasks import pull_crossref_papers, pull_papers
pull_crossref_papers(force=True)
pull_papers(force=True)
Make sure to revert that file once you're done seeding the local environment.
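The loop change caps how much data is pulled. The general pattern, replacing while True with a bounded condition, looks like the sketch below; pull_bounded, fetch_page, limit and page_size are hypothetical stand-ins, not the actual code in src/paper/tasks.py:

```python
def pull_bounded(fetch_page, limit=100, page_size=20):
    """Pull items page by page, stopping once about `limit` items are fetched.

    fetch_page(offset, size) is a hypothetical callable returning a list;
    an empty list means there is nothing left to pull.
    """
    items = []
    offset = 0
    # Bounded loop instead of `while True`
    while offset < limit:
        page = fetch_page(offset, page_size)
        if not page:
            break
        items.extend(page)
        offset += len(page)
    return items


def fake_fetch(offset, size, total=1000):
    """Stand-in data source that returns consecutive integers."""
    return list(range(offset, min(offset + size, total)))
```

Because the check happens before each page, the last page can overshoot the limit slightly, which is fine for seeding a local environment.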
# add a package to the project environment
poetry add package_name
# update requirements.txt, which is used by Elastic Beanstalk
poetry export -f requirements.txt --output requirements.txt
In a new shell, run this Docker image script (make sure Redis is running in the background with redis-server):
# Let this run for ~30 minutes in the background before terminating, be patient :)
./start-es.sh
Back in the Python virtual environment, build the indices:
python manage.py search_index --rebuild
Optionally, start Kibana for the Elastic dev tools:
./start-kibana.sh
To view Elasticsearch queries via the API, add DEBUG_TOOLBAR = True to keys.py. Then, visit an API URL such as http://localhost:8000/api/search/paper/?publish_date__gte=2022-01-01
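To build such filter URLs from a script, urllib.parse handles the query-string encoding. A minimal sketch: search_url is a hypothetical helper, and index names or filters other than paper and publish_date__gte are assumptions, not documented endpoints:

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://localhost:8000"


def search_url(index, **filters):
    """Build a search API URL such as /api/search/paper/?publish_date__gte=..."""
    path = f"/api/search/{index}/"
    query = urlencode(filters)
    return urljoin(BASE_URL, path) + (f"?{query}" if query else "")


# Usage:
#   search_url("paper", publish_date__gte="2022-01-01")
```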
Create a wallet file in src/config:
touch src/config/wallet.py
Add the following to wallet.py (fill in the blanks):
KEYSTORE_FILE = ''
KEYSTORE_PASSWORD = ''
Add the keystore file to the config directory.
Ask a team member for the file, or create one with MyEtherWallet: https://www.myetherwallet.com/create-wallet
Run the test suite:
# run all tests
# Note: Add --keepdb flag to speed up the process of running tests locally
python manage.py test
# run tests for the paper app, excluding ones that require AWS secrets
python manage.py test paper --exclude-tag=aws
# run a specific test, for example:
python manage.py test note.tests.test_note_api.NoteTests.test_create_workspace_note --keepdb
Run in the background for async tasks:
celery -A researchhub worker -l info
Run in the background for periodic tasks (needs the celery worker running):
celery -A researchhub beat -l info
Both celery commands in one (for development only):
celery -A researchhub worker -l info -B
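celery beat only sends tasks on a schedule; a worker still has to execute them, which is why beat needs the worker running. Periodic tasks are described by a schedule mapping like the sketch below. The entry name and task path are hypothetical, and where ResearchHub actually defines its schedule is not covered here:

```python
from datetime import timedelta

# Hypothetical entry: the task path and interval are illustrative only.
# With a Celery app object this would be assigned to app.conf.beat_schedule;
# Django projects that load Celery settings with namespace="CELERY"
# use the CELERY_BEAT_SCHEDULE setting instead.
beat_schedule = {
    "example-every-6-hours": {
        # Registered task name (module path plus function name)
        "task": "paper.tasks.example_periodic_task",
        # A timedelta, a number of seconds, or a crontab() are accepted
        "schedule": timedelta(hours=6),
        "kwargs": {"force": False},
    },
}
```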
Ask a team member to provide you with the CLIENT_ID and SECRET config, and run these SQL queries (with the updated config values) to seed the data that Google login needs:
insert into socialaccount_socialapp (provider, name, client_id, secret, key)
values ('google', 'Google', '<CLIENT_ID>', '<SECRET>', '');
insert into django_site (domain, name) values ('http://google.com', 'google.com');
insert into socialaccount_socialapp_sites (socialapp_id, site_id) values (1, 1);
(make sure the IDs in the last query are the right ones)
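Django's sites framework usually creates a default example.com site when migrating, so the google.com row may not get ID 1. One way to avoid hardcoding IDs is to look them up in the link query. The sketch below demonstrates this against simplified, hypothetical stand-ins for the three tables in an in-memory SQLite database (the real PostgreSQL schema has more columns); the same INSERT ... SELECT form also works in PostgreSQL:

```python
import sqlite3

# Simplified stand-ins for the allauth and django.contrib.sites tables
SCHEMA = """
CREATE TABLE socialaccount_socialapp (
    id INTEGER PRIMARY KEY, provider TEXT, name TEXT,
    client_id TEXT, secret TEXT, key TEXT
);
CREATE TABLE django_site (id INTEGER PRIMARY KEY, domain TEXT, name TEXT);
CREATE TABLE socialaccount_socialapp_sites (
    id INTEGER PRIMARY KEY, socialapp_id INTEGER, site_id INTEGER
);
"""

# Link the rows by looking their IDs up instead of assuming (1, 1)
LINK_SQL = """
INSERT INTO socialaccount_socialapp_sites (socialapp_id, site_id)
SELECT a.id, s.id
FROM socialaccount_socialapp AS a, django_site AS s
WHERE a.provider = 'google' AND s.name = 'google.com'
"""


def seed_google_login(conn, client_id, secret):
    """Insert the Google app and site, then link them by looked-up IDs."""
    conn.execute(
        "INSERT INTO socialaccount_socialapp (provider, name, client_id, secret, key) "
        "VALUES ('google', 'Google', ?, ?, '')",
        (client_id, secret),
    )
    conn.execute(
        "INSERT INTO django_site (domain, name) VALUES ('http://google.com', 'google.com')"
    )
    conn.execute(LINK_SQL)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# A pre-existing default site takes ID 1, as it often does in Django
conn.execute("INSERT INTO django_site (domain, name) VALUES ('example.com', 'example.com')")
seed_google_login(conn, "my-client-id", "my-secret")
```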