
Provide stable local environment #135

Closed
mijicd opened this issue Dec 25, 2017 · 11 comments

@mijicd
Contributor

mijicd commented Dec 25, 2017

Due to multiple reported issues with the docker-compose approach, which uses a bash script to spawn containers, it is a top priority to provide a stable environment for local development and demonstrations. Such a system could be distributed either as Vagrant box(es) or as a Minikube deployment.

mijicd added this to the 1.0.0-rc.1 milestone Dec 25, 2017
@chombium
Collaborator

Hi @mijicd,

I also experienced the same problem with the docker-compose approach, and I have started testing something that should fix it. I will create a pull request tonight.

Running Mainflux on Minikube would also be nice; I am thinking of starting to experiment with Minikube once I finish the fix for the docker-compose approach.

Best Regards,
Jovan

@chombium
Collaborator

chombium commented Feb 6, 2018

Hi,

I have also started reading and testing how we can improve the stability of the development environment.
The problem with docker-compose is that we need to ensure the proper start-up order of the services according to their dependencies.

Possible solutions:

  • Docker
    The usual docker-compose way is to define dependencies between the containers and add commands that check whether (i.e., wait until) the dependencies are ready. That is usually done with depends_on, typically combined with a script that checks whether some TCP port of the other container is open, or even calls a service inside the container, to make sure the needed service is actually running. The other way of doing this is to use healthchecks.
    The problem is that the condition form of depends_on is deprecated and has not been carried over to the docker-compose v3 file format, and depends_on is ignored altogether if someone uses "docker stack deploy" instead of "docker-compose".
    It seems that some people have used depends_on with docker-compose v3 [1, 2].
    I'll check this out. A minimal compose healthcheck sketch is shown after this list.

  • Kubernetes
    If we plan to use Minikube for our dev environment, Kubernetes has two probes for checking container health: the liveness and readiness probes. The first one shows whether the container is alive, and the second shows whether the container is ready to accept traffic. We can either define a check command or provide HTTP or TCP endpoints against which the health of the container will be checked (a probe sketch follows below).
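A minimal sketch of the healthcheck-plus-wait approach described above. The service names, image tags, ports, and binary path are placeholders for illustration, not the actual Mainflux compose file:

```yaml
# docker-compose.yml -- illustrative sketch only; names, ports, and paths are placeholders.
version: "3"

services:
  db:
    image: postgres:10-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  manager:
    image: mainflux/manager:latest   # placeholder tag
    depends_on:
      - db                           # ordering only; no readiness guarantee in v3
    # Wait until the DB port is reachable before starting the service.
    # This requires a shell and nc inside the image (see the Alpine suggestion below).
    command: sh -c "until nc -z db 5432; do echo waiting for db; sleep 1; done; exec /manager"
```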

In both cases we need to define commands or API endpoints with which we can check the health status of the containers.
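For the Kubernetes option, a probe definition could look roughly like the sketch below; the image, port, and the /version path are assumptions for illustration, not the actual Mainflux manifests:

```yaml
# deployment.yaml -- illustrative sketch; image, port, and endpoint path are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: manager
  template:
    metadata:
      labels:
        app: manager
    spec:
      containers:
        - name: manager
          image: mainflux/manager:latest   # placeholder
          ports:
            - containerPort: 8180          # placeholder port
          livenessProbe:                   # is the process alive?
            httpGet:
              path: /version
              port: 8180
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:                  # is it ready to accept traffic?
            httpGet:
              path: /version
              port: 8180
            periodSeconds: 5
```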

I think the best way to solve this issue is to adjust/refactor the Mainflux services so that they can survive failures of their dependencies and advertise their health status accordingly. For example, if the manager loses the connection to the database, it should be able to reconnect when the database is available again and update its health status (no DB connection = unhealthy, DB connection = healthy). A sketch of this idea follows.
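A rough Go sketch of the reconnect-and-advertise idea; the names, port, and Postgres DSN here are hypothetical, not taken from the Mainflux codebase:

```go
// health.go -- illustrative sketch of "reconnect and update health status";
// names, port, and DSN are hypothetical, not from the Mainflux codebase.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
	"time"

	_ "github.com/lib/pq" // Postgres driver, as an example dependency
)

var healthy int32 // 1 = DB reachable, 0 = DB unreachable

// watchDB pings the database periodically and flips the health flag,
// instead of letting a lost connection take the whole service down.
func watchDB(db *sql.DB) {
	for range time.Tick(5 * time.Second) {
		if err := db.Ping(); err != nil {
			atomic.StoreInt32(&healthy, 0)
			log.Printf("db unreachable: %s", err)
			continue
		}
		atomic.StoreInt32(&healthy, 1)
	}
}

func main() {
	db, err := sql.Open("postgres", "postgres://mainflux@db/manager?sslmode=disable")
	if err != nil {
		log.Fatal(err) // bad DSN: fail fast
	}
	go watchDB(db)

	// Health endpoint that other services (or probes) can query.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&healthy) == 1 {
			fmt.Fprintln(w, `{"status":"healthy"}`)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
		fmt.Fprintln(w, `{"status":"unhealthy"}`)
	})
	log.Fatal(http.ListenAndServe(":8180", nil))
}
```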

  • Action points
  1. Revise (re-define) the exact dependencies of the containers.
    What should happen with the data if a connection to another service has been lost? Should the service continue to accept new messages?
  2. Define, for each service separately, what should happen when a connection to a service it depends on fails. Should the messages be queued, or maybe dropped?
  3. Adjust the mainflux* container images' Dockerfiles so that we can add extra tools. At the moment all the images are built from the "scratch" meta image,
    which means the container has only the mainflux* app inside. I would suggest that we rebuild the Mainflux container images and take Alpine as a base image. This way we can add a few more tools like netcat (nc) and curl to perform the health checks by checking ports and calling URLs (see the Dockerfile sketch after this list).
  4. Restructure the dependencies and add the checks.
  5. Add some extra data to the health endpoints so that the other services can check the status of the services they depend on.
  6. external_links and links are also not recommended for use in docker-compose v3. We should use networks instead.
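For action point 3, an Alpine-based image could look roughly like this; the build paths, binary name, and port are placeholders, not the actual Mainflux build:

```dockerfile
# Dockerfile -- illustrative sketch for action point 3; paths, binary name,
# and port are placeholders, not the actual Mainflux build.
FROM golang:1.9-alpine AS builder
WORKDIR /go/src/github.com/mainflux/mainflux
COPY . .
RUN CGO_ENABLED=0 go build -o /exe ./cmd/manager

# Alpine instead of scratch, so that healthcheck tools are available in the container.
FROM alpine:3.7
RUN apk add --no-cache netcat-openbsd curl
COPY --from=builder /exe /manager
HEALTHCHECK CMD curl -f http://localhost:8180/version || exit 1
ENTRYPOINT ["/manager"]
```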

@drasko @mijicd What do you think about this? Should we fix the dependencies and the start-up order of the containers for version 1.0 and adjust the healthchecks later, or should we do it properly now?

@drasko: we can join forces and work together on this ;)

Best Regards,
Jovan

@mijicd
Contributor Author

mijicd commented Feb 11, 2018

Superseded by #159. The changes in the underlying infrastructure will eliminate all of the issues that we discussed above.

mijicd closed this as completed Feb 11, 2018
@drasko
Contributor

drasko commented Feb 11, 2018

@chombium health checks already exist in Mainflux: https://github.com/mainflux/mainflux/issues/152. These are endpoints of this type: https://github.com/mainflux/mainflux/blob/master/manager/api/transport.go#L121

Besides that, there are logs and especially Prometheus metrics that can be used for monitoring.
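As an illustration of the metrics point, a minimal way to expose Prometheus metrics from a Go service might look like the sketch below; the metric name and port are made up, and this is not the actual Mainflux instrumentation:

```go
// metrics.go -- minimal sketch of exposing Prometheus metrics; illustrative
// only, not the actual Mainflux instrumentation.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestCount is a hypothetical counter a service could increment per request.
var requestCount = promauto.NewCounter(prometheus.CounterOpts{
	Name: "manager_requests_total",
	Help: "Total number of requests handled by the manager service.",
})

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestCount.Inc()
		w.Write([]byte("ok"))
	})
	// Prometheus scrapes this endpoint for monitoring.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8180", nil))
}
```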

However, let's try to simplify the architecture as proposed by @mijicd in #159 and see later if something else is needed.

All help is more than welcome!

@chombium
Collaborator

chombium commented Feb 14, 2018

@drasko the proposal by @mijicd looks really good. I also really like the fact that the messages will not be persisted and that users will have the option to use a database (persistence layer) of their choice.

I think the only thing left would be adjusting the services so that they can survive outages of the services they depend on. I will take a closer look and write up what should eventually be changed.

Best Regards,
Jovan

@drasko
Contributor

drasko commented Feb 14, 2018

@chombium for the persistence, I was thinking that we publish our untested connectors (for example, we have MongoDB ones here: https://github.com/MainfluxLabs) and put them in a separate community-maintained repo/organization. We do not have the resources (nor a particular need) to maintain all of these, but the community might be interested. In any case, we have starting points for Cassandra (C*), MongoDB, and InfluxDB.

Regarding fault tolerance of the services, I implemented the first version using exponential backoff with a timeout, but @mijicd explicitly insisted that we remove it and apply the circuit-breaker pattern instead. I will let him explain more, but I am willing to examine how we can make the system more fault tolerant (sometimes I regret that we did not write all of this in Erlang :)).

@mijicd
Contributor Author

mijicd commented Feb 14, 2018

@drasko Please note that the circuit breaker is not a replacement for exponential backoff. Exponential backoff was present and used only on startup, and only toward the database. The circuit breaker is introduced between the manager service and any other service using its client.

@chombium Currently there is a dependency between the adapter(s) and the manager service. All the services using the manager's client library will automatically use the circuit breaker. Unfortunately, this is not yet implemented in the MQTT adapter, since that adapter is written in JavaScript and therefore does not use the manager's Go client. It will be resolved in the (near) future.

@drasko
Contributor

drasko commented Feb 14, 2018

@mijicd is there a particular reason to remove the exponential backoff? Also, on disconnection from the DB (or other services), should we apply the backoff (i.e., should we try to recover) or die?

@chombium the circuit breaker comes with go-kit (one of its nice features) and can be easily enabled for all our Go services. Our plan is to rewrite the MQTT broker in Go as well and have a complete Go solution.
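A minimal sketch of enabling go-kit's circuit breaker middleware around an endpoint; the endpoint, name, and settings here are illustrative, not the actual manager client code:

```go
// breaker.go -- minimal sketch of go-kit's circuit breaker middleware;
// illustrative only, not the actual Mainflux manager client.
package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/go-kit/kit/circuitbreaker"
	"github.com/go-kit/kit/endpoint"
	"github.com/sony/gobreaker"
)

func main() {
	// A hypothetical endpoint that talks to the manager service and currently fails.
	var callManager endpoint.Endpoint = func(ctx context.Context, request interface{}) (interface{}, error) {
		return nil, errors.New("manager unreachable")
	}

	// Wrap the endpoint in a circuit breaker: after enough consecutive failures
	// the breaker opens and further calls fail fast without hitting the network.
	cb := circuitbreaker.Gobreaker(gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name: "manager",
	}))
	guarded := cb(callManager)

	for i := 0; i < 10; i++ {
		_, err := guarded(context.Background(), nil)
		fmt.Println(err)
	}
}
```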

@mijicd
Contributor Author

mijicd commented Feb 14, 2018

@drasko I want the service to fail fast when the database is not available on startup. Disconnections occurring at runtime will cause errors, resulting in 5xx responses from the manager service. Technically, we could wrap the DB interaction in a CB as well, but it might be non-trivial to implement.
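A tiny sketch of the fail-fast startup behavior described above; the DSN and names are placeholders:

```go
// startup.go -- sketch of failing fast when the database is unreachable at
// startup; the DSN and names are placeholders.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://mainflux@db/manager?sslmode=disable")
	if err != nil {
		log.Fatalf("invalid DB configuration: %s", err)
	}
	// No retries or backoff: if the database cannot be reached, exit loudly
	// and let the orchestrator (compose restart policy, Kubernetes) restart us.
	if err := db.Ping(); err != nil {
		log.Fatalf("cannot connect to database: %s", err)
	}
	log.Println("connected, starting service...")
}
```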

@drasko
Contributor

drasko commented Feb 14, 2018

OK - so:

  • No backoff; on startup, die loudly if a dependent service is not up
  • Upon intermittent disconnections, 5xx will be returned and the activated CB will handle this case

Sounds good to me.

@mijicd
Contributor Author

mijicd commented Feb 14, 2018

Once again, the only place where the CB is used is in the interaction with the manager service, with the exception being the MQTT adapter.

mteodor pushed a commit to mteodor/mainflux that referenced this issue Jun 9, 2021
* Add experimental flag to environments

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Add experimental environment file

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Add default environment

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Remove trailing whitespace

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>

* Change build command

Signed-off-by: Darko Draskovic <darko.draskovic@gmail.com>
arvindh123 pushed a commit to arvindh123/supermq that referenced this issue Dec 17, 2023
* fix(e2e): Add admin role and permission

This commit adds a new constant and modifies the `createUser` function. The function now creates a user with an admin role and a domain with admin permissions.

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>

* chore(ci): add e2e testing on CI

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>

* Remove unnecessary time.Sleep calls and optimize code execution

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>

---------

Signed-off-by: Rodney Osodo <28790446+rodneyosodo@users.noreply.github.com>