Provide stable local environment #135
Comments
Hi @mijicd, I also experienced the same problem with the docker-compose approach, and I started testing something that should fix it. I will create a pull request tonight. Running Mainflux on Minikube would also be nice; I was thinking of experimenting with Minikube once I finish the fix for the docker-compose approach. Best Regards,
Hi, I also started reading and testing how we can improve the stability of the development environment. Possible solutions:
In both cases we need to define commands or API endpoints with which we can check the health status of the containers. I think the best way to solve this issue is to adjust/refactor the Mainflux services so that they can survive failures of their dependencies and advertise their health status accordingly. For example, if the manager loses the connection to the database, it should be able to reconnect when the database is available again and update its health status (no DB connection = unhealthy, DB connection = healthy), as sketched below.
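To make the idea concrete, here is a minimal sketch of such a health endpoint — not Mainflux's actual code; the route, port, and DSN are assumptions. Because database/sql's connection pool re-dials on every Ping, the service recovers automatically once the database comes back:

```go
package main

import (
	"database/sql"
	"log"
	"net/http"

	_ "github.com/lib/pq"
)

// healthHandler reports health based on database reachability. db.Ping
// opens a fresh connection if none is alive, so once the database comes
// back the next probe flips the status from unhealthy to healthy.
func healthHandler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := db.Ping(); err != nil {
			http.Error(w, "unhealthy: no DB connection", http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("healthy"))
	}
}

func main() {
	// sql.Open only validates the DSN; no connection is made yet.
	db, err := sql.Open("postgres", "postgres://mainflux@localhost/manager?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/health", healthHandler(db))
	log.Fatal(http.ListenAndServe(":8180", nil))
}
```

A docker-compose healthcheck could then probe this endpoint and gate the startup order of dependent containers.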
@drasko @mijicd What do you think about this? Should we fix the dependencies and the startup order of the containers for version 1.0 and adjust the healthchecks later, or are we going to do it properly now? @drasko: we can join forces and work together on this ;) Best Regards,
Deprecated by #159. The changes in the underlying infrastructure will resolve all of the issues discussed above.
@chombium health checks already exist in Mainflux: https://github.com/mainflux/mainflux/issues/152. These are endpoints of this type: https://github.com/mainflux/mainflux/blob/master/manager/api/transport.go#L121. Besides that, there are logs and, especially, Prometheus metrics that can be used for monitoring. However, let's try to simplify the architecture as proposed by @mijicd in #159 and see later if something else is needed. All help is more than welcome!
@drasko the proposal by @mijicd looks really good. I also really like the fact that messages will not be persisted and that users will be able to plug in a database (persistence layer) of their choice. I think the only thing left would be adjusting the services so that they can survive outages of the services they depend on. I will take a closer look and write up what should eventually be changed. Best Regards,
@chombium for the persistence, I was thinking that we publish our untested connectors (for example here we have […]). Regarding fault tolerance of the services, I implemented the first version using exponential backoff with a timeout, but @mijicd explicitly insisted that we remove this and apply the circuit-breaker pattern instead. I will let him explain more, but I am willing to examine how we can make the system more fault tolerant (sometimes I regret that we did not write all of this in Erlang :)). A sketch of the backoff approach follows below.
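For reference, the backoff-with-timeout approach mentioned above could look roughly like this sketch (the initial interval, growth factor, and overall deadline are illustrative, not the values from the removed code):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq"
)

// connectWithBackoff retries the initial DB connection, doubling the wait
// between attempts, and gives up once the total deadline would be exceeded.
func connectWithBackoff(dsn string, deadline time.Duration) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	start := time.Now()
	for wait := time.Second; ; wait *= 2 {
		if err = db.Ping(); err == nil {
			return db, nil
		}
		if time.Since(start)+wait > deadline {
			return nil, fmt.Errorf("giving up after %v: %v", time.Since(start), err)
		}
		time.Sleep(wait)
	}
}

func main() {
	db, err := connectWithBackoff("postgres://mainflux@localhost/manager?sslmode=disable", time.Minute)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```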
@drasko Please note that the circuit breaker is not a replacement for exponential backoff. Exponential backoff was present and used only on startup, and only toward the database. The circuit breaker is introduced between the manager service and any other service utilizing its client. @chombium Currently there is a dependency between the adapter(s) and the manager service. All the services using the manager's client library will automatically use the circuit breaker. Unfortunately, this is not yet implemented in the MQTT adapter, since that adapter is written in JavaScript and therefore does not use the manager's Go client. It will be resolved in the (near) future.
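As a standalone illustration of the pattern — not the actual manager client code; the endpoint URL and thresholds are made up — a circuit breaker such as sony/gobreaker wraps each call to the manager and fails fast while the breaker is open:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/sony/gobreaker"
)

func main() {
	// Trip the breaker after 3 consecutive failures; after 10 seconds in
	// the open state, let a single probe request through (half-open).
	cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    "manager-client",
		Timeout: 10 * time.Second,
		ReadyToTrip: func(c gobreaker.Counts) bool {
			return c.ConsecutiveFailures >= 3
		},
	})

	identity, err := cb.Execute(func() (interface{}, error) {
		// Hypothetical call: ask the manager service to verify an API key.
		resp, err := http.Get("http://localhost:8180/access-grant")
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 500 {
			return nil, fmt.Errorf("manager unavailable: %d", resp.StatusCode)
		}
		body, err := io.ReadAll(resp.Body)
		return string(body), err
	})
	if err != nil {
		// Once open, Execute fails immediately with ErrOpenState instead
		// of blocking every request on a dead dependency.
		fmt.Println("request rejected:", err)
		return
	}
	fmt.Println("identity:", identity)
}
```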
@mijicd is there a particular reason to remove this exponential backoff? Also, on disconnection from the DB (or other services), should we apply the backoff (i.e., should we try to recover) or die? @chombium circuit breaker comes with […]
@drasko I want the service to fail fast when the database is not available at startup. Disconnections occurring at runtime will cause errors, resulting in 5xx responses from the manager service. Technically, we could wrap the DB interaction in a CB as well, but it might be non-trivial to implement.
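In contrast to the backoff sketch above, fail-fast startup is just an immediate exit on an unreachable database, leaving restarts to the orchestrator (again a sketch; the DSN is hypothetical):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://mainflux@localhost/manager?sslmode=disable")
	if err != nil {
		log.Fatalf("invalid DSN: %s", err)
	}
	// No retries: if the database is down, exit immediately and let
	// docker-compose (or any supervisor) restart the container.
	if err := db.Ping(); err != nil {
		log.Fatalf("failed to connect to database: %s", err)
	}
	defer db.Close()
}
```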
OK - so: […]
Sounds good to me.
Once again, the only place where the CB is used is in the interaction with the manager service, with the exception being the MQTT adapter.
Due to multiple reported issues with the docker-compose approach using a bash script to spawn containers, it is a top priority to provide a stable environment for local development and demonstrations. Such a system can be distributed either as Vagrant box(es) or as a Minikube deployment.