
Provide stable local environment #135

Closed
mijicd opened this issue Dec 25, 2017 · 11 comments

@mijicd
Contributor

mijicd commented Dec 25, 2017

Due to multiple reported issues with the docker-compose approach, which uses a bash script to spawn containers, it is a top priority to provide a stable environment for local development and demonstrations. Such a system could be distributed either as Vagrant box(es) or as a Minikube deployment.

mijicd added this to the 1.0.0-rc.1 milestone Dec 25, 2017
@chombium
Collaborator

Hi @mijicd,

I also experienced the same problem with the docker-compose approach, and I have started testing something that should fix it. I will create a pull request tonight.

Running Mainflux on Minikube would also be nice; I am thinking of starting to experiment with Minikube once I finish the fix for the docker-compose approach.

Best Regards,
Jovan

@chombium
Collaborator

chombium commented Feb 6, 2018

Hi,

I have also started reading and testing how we can improve the stability of the development environment.
The problem with docker-compose is that we need to ensure the proper start-up order of the services according to their dependencies.

Possible solutions:

  • Docker
    The usual docker-compose way is to define dependencies between the containers and add commands that check whether (i.e., wait until) the dependencies are ready. That is usually done with depends_on, typically combined with a script that checks whether some TCP port of the other container is open, or even calls a service inside the container, to make sure the needed service is actually running. The other way of doing this is to use healthchecks.
    The problem is that the condition form of depends_on is deprecated and has not been carried over to the docker-compose v3 file format, and depends_on is ignored altogether if someone uses "docker stack deploy" instead of "docker-compose".
    It seems that some people have used depends_on with docker-compose v3 [1, 2].
    I'll check this out. A minimal compose healthcheck sketch is shown after this list.

  • Kubernetes
    If we plan to use Minikube for our dev environment, Kubernetes has two probes for checking container health: the liveness and readiness probes. The first one shows whether the container is alive, and the second shows whether the container is ready to accept traffic. We can either define a check command or provide HTTP or TCP endpoints against which the health of the container will be checked (a probe sketch follows below).
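A minimal sketch of the healthcheck-plus-wait approach described above. The service names, image tags, ports, and binary path are placeholders for illustration, not the actual Mainflux compose file:

```yaml
# docker-compose.yml -- illustrative sketch only; names, ports, and paths are placeholders.
version: "3"

services:
  db:
    image: postgres:10-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  manager:
    image: mainflux/manager:latest   # placeholder tag
    depends_on:
      - db                           # ordering only; no readiness guarantee in v3
    # Wait until the DB port is reachable before starting the service.
    # This requires a shell and nc inside the image (see the Alpine suggestion below).
    command: sh -c "until nc -z db 5432; do echo waiting for db; sleep 1; done; exec /manager"
```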

In both cases we need to define commands or API endpoints with which we can check the health status of the containers.
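For the Kubernetes option, a probe definition could look roughly like the sketch below; the image, port, and the /version path are assumptions for illustration, not the actual Mainflux manifests:

```yaml
# deployment.yaml -- illustrative sketch; image, port, and endpoint path are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: manager
  template:
    metadata:
      labels:
        app: manager
    spec:
      containers:
        - name: manager
          image: mainflux/manager:latest   # placeholder
          ports:
            - containerPort: 8180          # placeholder port
          livenessProbe:                   # is the process alive?
            httpGet:
              path: /version
              port: 8180
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:                  # is it ready to accept traffic?
            httpGet:
              path: /version
              port: 8180
            periodSeconds: 5
```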

I think the best way to solve this issue is to adjust/refactor the Mainflux services so that they can survive failures of their dependencies and advertise their health status accordingly. For example, if the manager loses the connection to the database, it should be able to reconnect when the database is available again and update its health status (no DB connection = unhealthy, DB connection = healthy). A sketch of this idea follows.
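A rough Go sketch of the reconnect-and-advertise idea; the names, port, and Postgres DSN here are hypothetical, not taken from the Mainflux codebase:

```go
// health.go -- illustrative sketch of "reconnect and update health status";
// names, port, and DSN are hypothetical, not from the Mainflux codebase.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
	"time"

	_ "github.com/lib/pq" // Postgres driver, as an example dependency
)

var healthy int32 // 1 = DB reachable, 0 = DB unreachable

// watchDB pings the database periodically and flips the health flag,
// instead of letting a lost connection take the whole service down.
func watchDB(db *sql.DB) {
	for range time.Tick(5 * time.Second) {
		if err := db.Ping(); err != nil {
			atomic.StoreInt32(&healthy, 0)
			log.Printf("db unreachable: %s", err)
			continue
		}
		atomic.StoreInt32(&healthy, 1)
	}
}

func main() {
	db, err := sql.Open("postgres", "postgres://mainflux@db/manager?sslmode=disable")
	if err != nil {
		log.Fatal(err) // bad DSN: fail fast
	}
	go watchDB(db)

	// Health endpoint that other services (or probes) can query.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&healthy) == 1 {
			fmt.Fprintln(w, `{"status":"healthy"}`)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
		fmt.Fprintln(w, `{"status":"unhealthy"}`)
	})
	log.Fatal(http.ListenAndServe(":8180", nil))
}
```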

  • Action points
  1. Revise (re-define) the exact dependencies of the containers.
    What should happen with the data if a connection to another service has been lost? Should the service continue to accept new messages?
  2. Define, for each service separately, what should happen when a connection to a service it depends on fails. Should the messages be queued, or maybe dropped?
  3. Adjust the mainflux* container images' Dockerfiles so that we can add extra tools. At the moment all the images are built from the "scratch" meta image,
    which means the container has only the mainflux* app inside. I would suggest that we rebuild the Mainflux container images and take Alpine as a base image. This way we can add a few more tools like netcat (nc) and curl to perform the health checks by checking ports and calling URLs (see the Dockerfile sketch after this list).
  4. Restructure the dependencies and add the checks.
  5. Add some extra data to the health endpoints so that the other services can check the status of the services they depend on.
  6. external_links and links are also not recommended for use in docker-compose v3. We should use networks instead.
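For action point 3, an Alpine-based image could look roughly like this; the build paths, binary name, and port are placeholders, not the actual Mainflux build:

```dockerfile
# Dockerfile -- illustrative sketch for action point 3; paths, binary name,
# and port are placeholders, not the actual Mainflux build.
FROM golang:1.9-alpine AS builder
WORKDIR /go/src/github.com/mainflux/mainflux
COPY . .
RUN CGO_ENABLED=0 go build -o /exe ./cmd/manager

# Alpine instead of scratch, so that healthcheck tools are available in the container.
FROM alpine:3.7
RUN apk add --no-cache netcat-openbsd curl
COPY --from=builder /exe /manager
HEALTHCHECK CMD curl -f http://localhost:8180/version || exit 1
ENTRYPOINT ["/manager"]
```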

@drasko @mijicd What do you think about this? Should we fix the dependencies and the start-up order of the containers for version 1.0 and adjust the healthchecks later, or should we do it properly now?

@drasko: we can join forces and work together on this ;)

Best Regards,
Jovan

@mijicd
Contributor Author

mijicd commented Feb 11, 2018

Superseded by #159. The changes in the underlying infrastructure will eliminate all of the issues that we discussed above.

mijicd closed this as completed Feb 11, 2018
@drasko
Contributor

drasko commented Feb 11, 2018

@chombium health checks already exist in Mainflux: https://github.com/mainflux/mainflux/issues/152. These are endpoints of this type: https://github.com/mainflux/mainflux/blob/master/manager/api/transport.go#L121

Besides that, there are logs and especially Prometheus metrics that can be used for monitoring.
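As an illustration of the metrics point, a minimal way to expose Prometheus metrics from a Go service might look like the sketch below; the metric name and port are made up, and this is not the actual Mainflux instrumentation:

```go
// metrics.go -- minimal sketch of exposing Prometheus metrics; illustrative
// only, not the actual Mainflux instrumentation.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestCount is a hypothetical counter a service could increment per request.
var requestCount = promauto.NewCounter(prometheus.CounterOpts{
	Name: "manager_requests_total",
	Help: "Total number of requests handled by the manager service.",
})

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestCount.Inc()
		w.Write([]byte("ok"))
	})
	// Prometheus scrapes this endpoint for monitoring.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8180", nil))
}
```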

However, let's try to simplify the architecture as proposed by @mijicd in #159 and see later if something else is needed.

All help is more than welcome!

@chombium
Collaborator

chombium commented Feb 14, 2018

@drasko the proposal by @mijicd looks really good. I also really like the fact that the messages will not be persisted and that users will have the option to use a database (persistence layer) of their choice.

I think the only thing left would be adjusting the services so that they can survive outages of the services they depend on. I will take a closer look and write up what should eventually be changed.

Best Regards,
Jovan

@drasko
Contributor

drasko commented Feb 14, 2018

@chombium for the persistence, I was thinking that we publish our untested connectors (for example, we have MongoDB ones here: https://github.com/MainfluxLabs) and put them in a separate community-maintained repo/organization. We do not have the resources (nor a particular need) to maintain all of these, but the community might be interested. In any case, we have starting points for Cassandra (C*), MongoDB, and InfluxDB.

Regarding fault tolerance of the services, I implemented the first version using exponential backoff with a timeout, but @mijicd explicitly insisted that we remove it and apply the circuit-breaker pattern instead. I will let him explain more, but I am willing to examine how we can make the system more fault tolerant (sometimes I regret that we did not write all of this in Erlang :)).

@mijicd
Contributor Author

mijicd commented Feb 14, 2018

@drasko Please note that the circuit breaker is not a replacement for exponential backoff. Exponential backoff was present and used only on startup, and only toward the database. The circuit breaker is introduced between the manager service and any other service using its client.

@chombium Currently there is a dependency between the adapter(s) and the manager service. All the services using the manager's client library will automatically use the circuit breaker. Unfortunately, this is not yet implemented in the MQTT adapter, since that adapter is written in JavaScript and therefore does not use the manager's Go client. It will be resolved in the (near) future.

@drasko
Contributor

drasko commented Feb 14, 2018

@mijicd is there a particular reason to remove the exponential backoff? Also, on disconnection from the DB (or other services), should we apply the backoff (i.e., should we try to recover) or die?

@chombium the circuit breaker comes with go-kit (one of its nice features) and can be easily enabled for all our Go services. Our plan is to rewrite the MQTT broker in Go as well and have a complete Go solution.
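A minimal sketch of enabling go-kit's circuit breaker middleware around an endpoint; the endpoint, name, and settings here are illustrative, not the actual manager client code:

```go
// breaker.go -- minimal sketch of go-kit's circuit breaker middleware;
// illustrative only, not the actual Mainflux manager client.
package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/go-kit/kit/circuitbreaker"
	"github.com/go-kit/kit/endpoint"
	"github.com/sony/gobreaker"
)

func main() {
	// A hypothetical endpoint that talks to the manager service and currently fails.
	var callManager endpoint.Endpoint = func(ctx context.Context, request interface{}) (interface{}, error) {
		return nil, errors.New("manager unreachable")
	}

	// Wrap the endpoint in a circuit breaker: after enough consecutive failures
	// the breaker opens and further calls fail fast without hitting the network.
	cb := circuitbreaker.Gobreaker(gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name: "manager",
	}))
	guarded := cb(callManager)

	for i := 0; i < 10; i++ {
		_, err := guarded(context.Background(), nil)
		fmt.Println(err)
	}
}
```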

@mijicd
Contributor Author

mijicd commented Feb 14, 2018

@drasko I want the service to fail fast when the database is not available on startup. Disconnections occurring at runtime will cause errors, resulting in 5xx responses from the manager service. Technically, we could wrap the DB interaction in a CB as well, but it might be non-trivial to implement.
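A tiny sketch of the fail-fast startup behavior described above; the DSN and names are placeholders:

```go
// startup.go -- sketch of failing fast when the database is unreachable at
// startup; the DSN and names are placeholders.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://mainflux@db/manager?sslmode=disable")
	if err != nil {
		log.Fatalf("invalid DB configuration: %s", err)
	}
	// No retries or backoff: if the database cannot be reached, exit loudly
	// and let the orchestrator (compose restart policy, Kubernetes) restart us.
	if err := db.Ping(); err != nil {
		log.Fatalf("cannot connect to database: %s", err)
	}
	log.Println("connected, starting service...")
}
```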

@drasko
Contributor

drasko commented Feb 14, 2018

OK - so:

  • No backoff; on startup, die loudly if a dependent service is not up
  • Upon intermittent disconnections, 5xx will be returned and the activated CB will handle this case

Sounds good to me.

@mijicd
Contributor Author

mijicd commented Feb 14, 2018

Once again, the only place where the CB is used is in the interaction with the manager service, with the exception being the MQTT adapter.

mteodor pushed a commit to mteodor/mainflux that referenced this issue Jun 9, 2021
* Add experimental flag to environments

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Add experimental environment file

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Add default environment

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Remove trailing whitespace

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Change build command

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>
arvindh123 pushed a commit to arvindh123/supermq that referenced this issue Dec 17, 2023
* fix(e2e): Add admin role and permission

This commit adds a new constant and modifies the `createUser` function. The function now creates a user with an admin role and a domain with admin permissions.

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>

* chore(ci): add e2e testing on CI

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>

* Remove unnecessary time.Sleep calls and optimize code execution

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>

---------

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>