Search and visualize public procurement data for EU countries: http://tenders.exposed/
Previously, tenders.exposed was powered by tenders-exposed/elvis-backend, but we decided to rewrite it completely because:
- the previous architecture based on MongoDB didn't allow us to efficiently implement clustering and splitting nodes
- the data provider changed and the data structure along with it
We chose Node.js because:
- we decided to use a graph database (OrientDB) and the Node.js ecosystem offered good tools for this
- Node.js is more widespread across the open spending and open contracting community
Check out the API documentation made with Swagger:
https://api.tenders.exposed/docs
-
Download and run OrientDB 2.2.30 with Docker:
docker run --name orientdb -p 2424:2424 -p 2480:2480 -e ORIENTDB_ROOT_PASSWORD={yourRootPass} orientdb:2.2.30
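The container can take a few seconds before it accepts connections. A minimal sketch for scripting that wait, using only Node's built-in net module: the hypothetical helper waitForPort (not part of this repo) polls a TCP port until a connection succeeds or a timeout expires.

```javascript
// wait_for_port.js: hypothetical helper, not part of elvis-backend-node.
const net = require('net');

// One connection attempt; resolves true if something accepted it.
function tryConnect(host, port, attemptTimeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    // Give up on this attempt if the connection hangs.
    socket.setTimeout(attemptTimeoutMs, () => {
      socket.destroy();
      resolve(false);
    });
    socket.once('connect', () => {
      socket.destroy();
      resolve(true);
    });
    socket.once('error', () => resolve(false));
  });
}

// Retries every intervalMs until a connection succeeds or timeoutMs has passed.
async function waitForPort(host, port, { timeoutMs = 30000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await tryConnect(host, port)) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

module.exports = { waitForPort };
```

For example, waitForPort('localhost', 2424) resolves to true once the binary port published by the docker run command is accepting connections, and to false after 30 seconds otherwise. It only shows that something is listening, not that the databases below exist.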
-
Create databases:
docker exec -it orientdb /orientdb/bin/console.sh
In the ODB console:
CREATE DATABASE remote:localhost/{yourDBName} root {yourRootPass} plocal graph
And a test db, preferably in memory:
CREATE DATABASE remote:localhost/{yourTestingDBName} root {yourRootPass} memory graph
-
Clone this repo:
git clone https://github.com/tenders-exposed/elvis-backend-node.git
-
Configure environment variables:
cd elvis-backend-node
In the root of the project, create a new file called .env from the .env.example file:
cp .env.example .env
Edit .env with your settings. If you went with the ODB defaults like above, it will look like this:
ORIENTDB_HOST=localhost
ORIENTDB_PORT=2424
ORIENTDB_DB={yourDBName}
# Admin is the default ODB user
ORIENTDB_USER=admin
ORIENTDB_PASS=admin
ORIENTDB_TEST_DB={yourTestingDBName}
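To catch a forgotten setting before running anything, here is a small sketch, assuming .env uses plain KEY=VALUE lines. The hypothetical helpers parseEnv, missingKeys and checkEnvFiles (not part of the repo) report every key that appears in .env.example but is missing or empty in .env. It is not a full dotenv parser.

```javascript
// check_env.js: hypothetical helper, not shipped with elvis-backend-node.
const fs = require('fs');

// Parses plain KEY=VALUE lines, skipping blank lines and "#" comments.
// Quoting, inline comments and multi-line values are not handled.
function parseEnv(text) {
  const vars = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

// Keys listed in the example file that are missing or empty in the real one.
function missingKeys(exampleText, envText) {
  const env = parseEnv(envText);
  return Object.keys(parseEnv(exampleText)).filter((key) => !env[key]);
}

// Reads both files (paths default to the project root) and compares them.
function checkEnvFiles(examplePath = '.env.example', envPath = '.env') {
  return missingKeys(fs.readFileSync(examplePath, 'utf8'), fs.readFileSync(envPath, 'utf8'));
}

module.exports = { parseEnv, missingKeys, checkEnvFiles };
```

Calling checkEnvFiles() from the project root returns an empty array when every key is set. It does not notice placeholders such as {yourDBName} that were left unedited.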
-
Install dependencies:
npm install
-
Create the database schema for the dev db:
npm run migrate
The test db is migrated automatically before every test.
-
Open OrientDB Studio in a browser at http://localhost:2480/studio/index.html to see if the database contains the schema we migrated.
-
Run the tests with:
npm run test
-
Run the linter with:
npm run lint
-
Install OrientJS globally (npm install -g orientjs) to get access to its CLI. For example, to create a new migration:
orientjs -h localhost -p 2424 -n elvis -U admin -P admin migrate create {newMigrationName}
-
To deploy, pull the latest changes.
-
If necessary, update the configuration in .env based on .env.example.
-
Build the containers:
ORIENTDB_ROOT_PASSWORD={password} docker-compose build elvis_api
-
Start the containers:
docker-compose up --no-deps -d elvis_api
If this is the first deploy, run it without --no-deps so that the services elvis_api depends on are started as well:
docker-compose up -d elvis_api
-
Migrate:
docker-compose run --name=migrate --rm elvis_api npm run migrate
The amount of data we have is overwhelming for a single Node process. Not only does the import take a long time, but it also runs into a "JavaScript heap out of memory" error even with up to 15GB of RAM.
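To see how much heap a Node process is actually allowed (the memory that runs out in the error above), V8 reports its limit through Node's built-in v8 module. A minimal sketch; the file name heap_limit.js and the helper heapLimitMB are hypothetical:

```javascript
// heap_limit.js (hypothetical file name): reports this process's V8 heap limit.
const v8 = require('v8');

// heap_size_limit is in bytes and covers the whole heap, so with
// --max-old-space-size=4096 expect a value at or a little above 4096 MB.
function heapLimitMB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMB()} MB`);

module.exports = { heapLimitMB };
```

Running it as node heap_limit.js and then as node --max-old-space-size=4096 heap_limit.js shows the effect of the flag used in the import command below.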
To speed things up and avoid overwhelming an individual process, we now run a separate Node process for each file instead of passing multiple files to the same process. To achieve this, we start a Docker container to import each file and orchestrate the containers with GNU parallel:
find /folder/with/data/files -iname '*.json' -printf "%f\n" | \
parallel --progress -I"{}" -j5 \
docker-compose run --name="elvis_import_"{} --rm elvis_api \
node --max-old-space-size=4096 ./scripts/import_data.js -c 1000 -r 1 /rawdata/data/exported_by_country/{}
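For readers less familiar with GNU parallel, the same orchestration can be sketched in Node. This is an illustration only, not a script in the repo: the hypothetical runAll starts one child process per file and never keeps more than jobs of them running at once (5 by default, like -j5), and importCommand rebuilds the docker-compose invocation above for a single file. Unlike find, listJsonFiles does not descend into subdirectories.

```javascript
// run_imports.js: hypothetical Node equivalent of the find | parallel pipeline above.
const { spawn } = require('child_process');
const fs = require('fs');

// Roughly `find DIR -iname '*.json' -printf "%f\n"`, but without recursion.
function listJsonFiles(dir) {
  return fs.readdirSync(dir).filter((name) => name.toLowerCase().endsWith('.json'));
}

// Mirrors the docker-compose command above for a single file.
function importCommand(file) {
  return ['docker-compose', [
    'run', `--name=elvis_import_${file}`, '--rm', 'elvis_api',
    'node', '--max-old-space-size=4096', './scripts/import_data.js',
    '-c', '1000', '-r', '1', `/rawdata/data/exported_by_country/${file}`,
  ]];
}

// Resolves with the child's exit code, null if it was killed by a signal,
// or a non-zero value if the command could not be started.
function runOne(command, args) {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.once('error', () => resolve(-1));
    child.once('close', (code) => resolve(code));
  });
}

// Runs buildCommand(file) for every file, with at most `jobs` processes at a time.
async function runAll(files, buildCommand, jobs = 5) {
  const results = {};
  let next = 0;
  // Each worker keeps taking the next unclaimed file until none are left.
  async function worker() {
    while (next < files.length) {
      const file = files[next++];
      const [command, args] = buildCommand(file);
      results[file] = await runOne(command, args);
    }
  }
  const workers = Array.from({ length: Math.min(jobs, files.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

module.exports = { listJsonFiles, importCommand, runAll };
```

For example, runAll(listJsonFiles('/folder/with/data/files'), importCommand, 5) resolves to an object mapping each file name to its exit code, which makes it easy to rerun only the files whose import failed.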
With -j5 we tell parallel to run 5 containers at once. The option --max-old-space-size=4096 allows the Node process to use up to 4GB (4096 MB) of heap memory. The import script also takes these options:
- -r to set the number of retries for each line
- -c to set how many lines are processed concurrently
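The sketch below is not the code of scripts/import_data.js; it only illustrates what the two options control, assuming each line of a data file is handled by an async function. The helper names (fileLines, withRetries, processLines) are hypothetical: withRetries gives a failing line up to r extra attempts, and processLines keeps at most c lines in flight.

```javascript
// Hypothetical illustration of the -c and -r options; not scripts/import_data.js.
const fs = require('fs');
const readline = require('readline');

// Lines of a data file as an async iterable.
function fileLines(filePath) {
  return readline.createInterface({ input: fs.createReadStream(filePath), crlfDelay: Infinity });
}

// Runs task once, then up to `retries` more times while it keeps rejecting.
async function withRetries(task, retries) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries) throw err;
    }
  }
}

// Hands each non-empty line to handleLine with at most `concurrency` lines in flight.
// Lines that still fail after all retries are counted instead of aborting the import.
async function processLines(lines, handleLine, { concurrency = 1000, retries = 1 } = {}) {
  const inFlight = new Set();
  let failed = 0;
  for await (const line of lines) {
    if (!line.trim()) continue;
    const pending = withRetries(() => handleLine(line), retries)
      .catch(() => { failed += 1; })
      .finally(() => inFlight.delete(pending));
    inFlight.add(pending);
    // Wait for a free slot before reading the next line.
    if (inFlight.size >= concurrency) await Promise.race(inFlight);
  }
  await Promise.all(inFlight);
  return { failed };
}

module.exports = { fileLines, withRetries, processLines };
```

For example, processLines(fileLines(path), handleLine, { concurrency: 1000, retries: 1 }) mirrors -c 1000 -r 1 from the import command, with handleLine standing in for whatever writes a record to OrientDB. A higher concurrency tends to speed up the import but keeps more records in memory at once.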
We also have to import static data for countries:
docker-compose run --name=import_countries --rm elvis_api node ./scripts/import_countries.js
and CPV (Common Procurement Vocabulary) codes:
docker-compose run --name=import_cpvs --rm elvis_api node ./scripts/import_cpvs.js