This section is for those interested in contributing to the development of fabric8-analytics. Please read through our glossary in case you are not sure about terms used in the docs.
You will need Git, and possibly other packages, depending on how you want to run the system (see below).
First of all, clone the common
repo (this one). This includes all
the configuration for running the whole system as well as some helper
scripts and docs.
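For example (the URL assumes the repository's current location on GitHub):
$ git clone https://github.com/fabric8-analytics/fabric8-analytics-common.git
$ cd fabric8-analytics-common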
In order to have a good local development experience, the code repositories are mounted inside containers, so that changes can be observed live or after container restart (without image rebuilds).
In order to achieve that, all the individual fabric8-analytics repos have to be
checked out. The helper script setup.sh
does exactly that. Run setup.sh -h
and follow the instructions (most of the time, you'll be fine with running
setup.sh
with no arguments).
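For example, run from the root of this repo:
$ ./setup.sh -h   # show usage and available options
$ ./setup.sh      # check out all the fabric8-analytics repos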
Requirements:
- docker >= 1.10.0
- docker-compose >= 1.6.0
Fedora 24, 25, and 26 ship docker-compose > 1.6.0 and docker > 1.10.0, so you should be able to run on them without any workarounds.
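You can check the installed versions with:
$ docker --version
$ docker-compose --version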
Then, to bring the whole system up, run:
$ sudo docker-compose up
Please note that some error messages might be displayed during startup for the
data-model-importer
module. These errors are caused by the Gremlin HTTP service, which takes some
time to start serving requests. Once it is up, the
data-model-importer
will start properly.
If you want a good development setup (source code mounted inside the containers, ability to rebuild images using docker-compose), use:
$ sudo ./docker-compose.sh up
docker-compose.sh
will effectively mount the source code from the checked-out
fabric8-analytics sub-projects into the containers, so any changes made to the local
checkout will be reflected in the running containers. Note that some
containers (e.g. server) will pick the new code up live, while others (e.g. worker)
will need a restart; see the example below.
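For example, to restart the worker containers after a code change (service names as in the scaling example further below; this assumes docker-compose.sh passes its arguments through to docker-compose):
$ sudo ./docker-compose.sh restart worker-api worker-ingestion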
To rebuild the images (pulling fresh versions of the base images first), run:
$ sudo ./docker-compose.sh build --pull
Some parts (GithubTask, BlackDuckTask) need credentials
for proper operation. You can provide them as environment variables in worker_environment
in docker-compose.yml.
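A minimal sketch of what this could look like in docker-compose.yml (the exact layout of worker_environment in the file may differ, and the variable names below are hypothetical placeholders; check each task's docs for the real ones):
worker_environment:
  GITHUB_TOKEN: "<your GitHub API token>"       # hypothetical variable name
  BLACKDUCK_USERNAME: "<your BlackDuck user>"   # hypothetical variable name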
When running locally via docker-compose, you will likely not need to scale most of the system components. You may, however, want to run more workers, if you're running more analyses and want them finished faster. By default, only a single worker is run, but you can scale it to pretty much any number. Just run the whole system as described above and then in another terminal window execute:
$ sudo docker-compose scale worker-api=2 worker-ingestion=2
This scales each worker type to 2 instances, i.e. 2 additional containers, giving you a total of 4 workers running. You can use this command repeatedly with different numbers to scale up and down as necessary.
TBD :)
When the whole application is started, there are several services you can
access. When running through docker-compose, all of these services will be
bound to localhost. When running with OpenShift, TODO
- fabric8-analytics Server itself (see server-service.yaml) - port 32000
- fabric8-analytics Jobs API - port 34000
- Celery Flower (task queue monitor, see flower-service.yaml) - port 31000; only run if you run with -f docker-compose.debug.yml (see the command after this list)
- PGWeb (web UI for database, see pgweb-service.yaml) - port 31003; only run if you run with -f docker-compose.debug.yml
- Minio S3 - port 33000
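To get the debug-only services (Celery Flower, PGWeb) running, pass both compose files when bringing the system up:
$ sudo docker-compose -f docker-compose.yml -f docker-compose.debug.yml up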
All services log to their stdout/stderr, which makes their logs viewable through Docker:
- When using docker-compose, all logs are streamed to the docker-compose output
and you can view them there in real time. If you want to see the output of a single
container only, use docker logs -f <container>, e.g. docker logs -f coreapi-server
(the -f is optional; it switches on "follow" mode).
Refer to the integration testing README.
Worker, by its design, is a monolith that takes care of all the tasks available in the system. However, it is possible to let a worker serve only particular tasks. This is accomplished by supplying one of the following (mutually disjoint) environment variables:
- WORKER_EXCLUDE_QUEUES - a comma-separated list of regexps describing the names of queues the worker should not serve; all other queues are served
- WORKER_INCLUDE_QUEUES - a comma-separated list of regexps describing the names of queues the worker should serve; all other queues are excluded
This can be especially useful when prioritizing or throttling particular tasks; see the example below.
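For example, to dedicate a worker to ingestion queues only, you could add something like the following to the worker's environment in docker-compose.yml (the queue-name regexp is purely illustrative):
WORKER_INCLUDE_QUEUES: "ingestion_.*"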
Even though the production fabric8-analytics deployment runs in Amazon AWS, there are basically no strict limitations on local deployment.
The following AWS resources are used in the cloud deployment.
- Amazon SQS
- Amazon DynamoDB
- Amazon S3
- Amazon RDS
The local setup, however, uses the following alternatives:
- RabbitMQ - an alternative to Amazon SQS
- Local DynamoDB
- Minio S3 - an alternative to Amazon S3
- PostgreSQL - an alternative to Amazon RDS
You can use RabbitMQ directly instead of Amazon SQS. To do so, do not provide the
AWS_SQS_ACCESS_KEY_ID
and AWS_SQS_SECRET_ACCESS_KEY
environment variables.
The RabbitMQ broker should then be running on the host given by the
RABBITMQ_SERVICE_SERVICE_HOST
environment variable (which defaults to coreapi-broker).
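For example, to point the system at a broker running on another host (the hostname below is illustrative), you could set, in the relevant environment section of docker-compose.yml:
RABBITMQ_SERVICE_SERVICE_HOST: "rabbitmq.example.com"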
As RabbitMQ scales pretty flawlessly, there shouldn't be any strict downside to using RabbitMQ instead of Amazon SQS.
The local setup is already prepared to work with local DynamoDB. This is handled in the data-model repository.
Minio S3 was chosen as the alternative to Amazon S3 for the local deployment and development setup. Minio S3 is not a fully compatible alternative (e.g. it supports neither versioning nor encryption), but this does not restrict any basic functionality.
You can find credentials for the Minio S3 web console in the docker-compose
file (search for MINIO_ACCESS_KEY
and MINIO_SECRET_KEY).
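Since Minio speaks the S3 protocol, standard S3 tooling can be pointed at it as well; a sketch using the AWS CLI against the local endpoint (port taken from the service list above, credentials from the docker-compose file):
$ AWS_ACCESS_KEY_ID=<MINIO_ACCESS_KEY> AWS_SECRET_ACCESS_KEY=<MINIO_SECRET_KEY> \
  aws --endpoint-url http://localhost:33000 s3 ls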
The only difference in using PostgreSQL instead of Amazon RDS is in the connection string supplied for the PostgreSQL/Amazon RDS instance. This connection string is constructed from the following environment variables (a fully assembled example follows the list):
- POSTGRESQL_USER - defaults to coreapi in the local setup
- POSTGRESQL_PASSWORD - defaults to coreapi in the local setup
- POSTGRESQL_DATABASE - defaults to coreapi in the local setup
- PGBOUNCER_SERVICE_HOST - defaults to coreapi-pgbouncer (note that PostgreSQL/Amazon RDS is accessed through PGBouncer, hence the naming)
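Put together, the local defaults would yield a connection string along these lines (the URL scheme and the port 5432 are assumptions about how the pieces are combined):
postgres://coreapi:coreapi@coreapi-pgbouncer:5432/coreapi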