A more resilient circuit breaker strategy #533

Closed
5 tasks done
mageddo opened this issue Aug 1, 2024 · 0 comments
Labels
enhancement A little behavior change which so small that can't be considered a feature feature Definition of a feature to be implemented


mageddo commented Aug 1, 2024

Summary & Motivation

Currently the circuit breaker strategy is set as below:

CircuitBreaker
  .builder()
  .failureThreshold(3)
  .failureThresholdCapacity(10)
  .successThreshold(5)
  .testDelay(Duration.ofSeconds(20))
  .build();

This configuration is inflexible. Remote servers that are completely offline are retried every 20 seconds, and
stable servers that hit 3 timeouts are removed from the pool for 20 seconds. Neither is ideal: some requests become
slower than they need to be, and some requests go unresolved for a while because of a temporary, isolated
denial of service on one remote server.
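
To make the problem concrete, here is a minimal sketch of a count-based threshold. It assumes that `failureThreshold(3)` with `failureThresholdCapacity(10)` means "open when 3 of the last 10 executions failed"; the class `StaticThresholdSketch` and its method are hypothetical and are not DPS or failsafe code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not the DPS or failsafe implementation.
class StaticThresholdSketch {
  private final int failureThreshold;
  private final int capacity;
  // Sliding window of the most recent outcomes, true = failure.
  private final Deque<Boolean> window = new ArrayDeque<>();

  StaticThresholdSketch(int failureThreshold, int capacity) {
    this.failureThreshold = failureThreshold;
    this.capacity = capacity;
  }

  /** Records one outcome and returns true when the circuit should open. */
  boolean recordAndCheckOpen(boolean failed) {
    if (window.size() == capacity) {
      window.removeFirst(); // drop the oldest outcome
    }
    window.addLast(failed);
    long failures = window.stream().filter(f -> f).count();
    return failures >= failureThreshold;
  }

  public static void main(String[] args) {
    StaticThresholdSketch breaker = new StaticThresholdSketch(3, 10);
    // 7 successes followed by 3 timeouts: an otherwise stable server.
    boolean[] outcomes = {false, false, false, false, false, false, false, true, true, true};
    boolean open = false;
    for (boolean failed : outcomes) {
      open = breaker.recordAndCheckOpen(failed);
    }
    System.out.println("open after 3 timeouts in 10 calls: " + open);
  }
}
```

Under this assumption a server with a 70% success rate is still taken out of the pool, which is the behavior the proposal wants to soften.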

Description

  • Define a new circuit breaker strategy, named CANARY_RATE_THRESHOLD, that is more flexible and can be
    configured by the user
  • The current strategy will be named STATIC_THRESHOLD
  • When no config file exists, STATIC_THRESHOLD will be configured by default
  • When a config file with a circuit breaker config already exists, it will be parsed as STATIC_THRESHOLD
    even if no type is defined, keeping DPS compatible with previous versions.
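
The fallback rules in the list above could be sketched roughly as follows. This is an assumption-laden illustration: the class `StrategyResolverSketch`, the `resolve` method and the use of a plain `Map` for the parsed `circuitBreaker` object are hypothetical; only the strategy names and the `strategy` key come from this issue.

```java
import java.util.Map;

// Hypothetical illustration of the default and backward-compatibility rules.
class StrategyResolverSketch {
  enum Strategy { STATIC_THRESHOLD, CANARY_RATE_THRESHOLD }

  /**
   * Resolves the strategy from a parsed "circuitBreaker" object.
   * A null map stands for "no config file / no circuit breaker config".
   */
  static Strategy resolve(Map<String, ?> circuitBreakerConfig) {
    if (circuitBreakerConfig == null) {
      return Strategy.STATIC_THRESHOLD; // no config at all: default strategy
    }
    Object raw = circuitBreakerConfig.get("strategy");
    if (raw == null) {
      return Strategy.STATIC_THRESHOLD; // legacy config without a type
    }
    return Strategy.valueOf(raw.toString());
  }

  public static void main(String[] args) {
    System.out.println(resolve(null));
    System.out.println(resolve(Map.of("failureThreshold", 3)));
    System.out.println(resolve(Map.of("strategy", "CANARY_RATE_THRESHOLD")));
  }
}
```

An unknown strategy name makes `Strategy.valueOf` throw `IllegalArgumentException`; whether DPS should fail or fall back in that case is not stated in this issue.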

Strategy 2

Unit Test use cases

CircuitBreaker
  .builder()
  .failureRateThreshold(21f)
  .minimumNumberOfCalls(50)
  .permittedNumberOfCallsInHalfOpenState(10)
  .build();
....
"solverRemote" : {
    "circuitBreaker" : { 
      "strategy": "CANARY_RATE_THRESHOLD",
      "failureRateThreshold" : 21, // If the failure rate is equal to or greater than the threshold, the CircuitBreaker will transition to open. criteria: values greater than 0 and not greater than 100.
      "minimumNumberOfCalls" : 50, // Configures the minimum number of calls which are required (per sliding window period) before the CircuitBreaker can calculate the error rate.
      "permittedNumberOfCallsInHalfOpenState" : 10 // Configures the number of permitted calls when the CircuitBreaker is half open.
    }
  }
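
As a rough sketch of how the three settings above interact, the class below validates the threshold range given in the comment (greater than 0, not greater than 100) and decides whether a closed circuit should open. The class `CanaryRateConfigSketch` and the method `shouldOpen` are hypothetical names; the real evaluation is delegated to the circuit breaker library and may differ in detail.

```java
// Hypothetical illustration; field names mirror the JSON keys above.
class CanaryRateConfigSketch {
  final float failureRateThreshold;
  final int minimumNumberOfCalls;
  final int permittedNumberOfCallsInHalfOpenState;

  CanaryRateConfigSketch(float failureRateThreshold, int minimumNumberOfCalls,
      int permittedNumberOfCallsInHalfOpenState) {
    // Criteria from the config comment: greater than 0 and not greater than 100.
    if (failureRateThreshold <= 0 || failureRateThreshold > 100) {
      throw new IllegalArgumentException("failureRateThreshold must be in (0, 100]");
    }
    this.failureRateThreshold = failureRateThreshold;
    this.minimumNumberOfCalls = minimumNumberOfCalls;
    this.permittedNumberOfCallsInHalfOpenState = permittedNumberOfCallsInHalfOpenState;
  }

  /** True when enough calls were recorded and the failure rate reached the threshold. */
  boolean shouldOpen(int totalCalls, int failedCalls) {
    if (totalCalls < minimumNumberOfCalls) {
      return false; // not enough data to calculate the error rate yet
    }
    float failureRate = 100f * failedCalls / totalCalls;
    return failureRate >= failureRateThreshold;
  }

  public static void main(String[] args) {
    CanaryRateConfigSketch config = new CanaryRateConfigSketch(21f, 50, 10);
    System.out.println(config.shouldOpen(49, 49)); // below the minimum number of calls
    System.out.println(config.shouldOpen(50, 10)); // 20% failure rate
    System.out.println(config.shouldOpen(50, 11)); // 22% failure rate
  }
}
```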

Closed Circuit - Server Goes Down

In the circuit breaker, after 50 calls have been made, if 10 executions have an error rate of 20% or more, the circuit opens

Open Circuit - Server Comes Back

A worker that runs every 1 second checks the circuit with 1 to 3 calls; if one of them returns OK, the state changes
to half-open.

Half-Open Circuit - Healthy Server

In the circuit breaker, after 10 executions with less than 20% errors, the circuit changes its state to closed

Half-Open Circuit - Unhealthy Server

In the circuit breaker, after 10 executions with more than 20% errors, the circuit changes its state to open
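
The four use cases above can be read as a small state machine. The sketch below is one possible reading, not the implementation: the class and method names are hypothetical, the 20% threshold and the 50 and 10 call counts are taken from the use cases, the closed-state rule is simplified to "at least 50 calls with a failure rate at or above the threshold", and a rate of exactly 20% is assumed to count as unhealthy.

```java
// Hypothetical illustration of the transitions described in the use cases.
class CanaryStateMachineSketch {
  enum State { CLOSED, OPEN, HALF_OPEN }

  static final float FAILURE_RATE_THRESHOLD = 20f; // percent
  static final int MINIMUM_CALLS = 50;
  static final int HALF_OPEN_CALLS = 10;

  /** Next state after a batch of regular calls was evaluated. */
  static State afterCalls(State current, int calls, int failures) {
    float failureRate = calls == 0 ? 0f : 100f * failures / calls;
    switch (current) {
      case CLOSED:
        // Server goes down: enough calls and too many errors open the circuit.
        return calls >= MINIMUM_CALLS && failureRate >= FAILURE_RATE_THRESHOLD
            ? State.OPEN : State.CLOSED;
      case HALF_OPEN:
        if (calls < HALF_OPEN_CALLS) {
          return State.HALF_OPEN; // still collecting the permitted calls
        }
        // Healthy server closes the circuit, unhealthy server reopens it.
        return failureRate >= FAILURE_RATE_THRESHOLD ? State.OPEN : State.CLOSED;
      default:
        return current; // OPEN: only the health check worker changes the state
    }
  }

  /** Next state after the periodic worker probed an open circuit. */
  static State afterHealthCheck(State current, boolean anyProbeOk) {
    return current == State.OPEN && anyProbeOk ? State.HALF_OPEN : current;
  }

  public static void main(String[] args) {
    State state = State.CLOSED;
    state = afterCalls(state, 50, 10);     // server goes down
    System.out.println(state);
    state = afterHealthCheck(state, true); // server comes back
    System.out.println(state);
    state = afterCalls(state, 10, 1);      // healthy in half-open
    System.out.println(state);
  }
}
```

Keeping the transitions as pure functions makes each use case a one-line unit test, which fits the "Unit Test use cases" framing of this section.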

Tasks

  • Create an abstraction of the circuit breaker implementation. Keep the failsafe dependency, as resilience4j can't reproduce the STATIC_THRESHOLD use case; the circuit breaker solution needs to be abstracted so that the two libraries can be used at the same time
  • Create the CANARY_RATE_THRESHOLD strategy
  • Make it possible to expose the new circuit breaker strategy in the config file
  • Expose the new circuit breaker strategy config in the JSON file
  • Consider all circuits as open on app start; this prevents the app from failing resolutions right at startup because the first server in the remote servers list is offline

Alternatives

Strategy 1

  • A server with a closed circuit has to fail at least 50% of the time, with a minimum of 10
    attempts, within a 20-second period
  • A server with an open circuit will be tested every 3 seconds, with only one request sent to it;
    if 70% of the requests succeed, the circuit closes again

Risks and Assumptions

  • The new strategy will be created but not defined as default for now as it would be a breaking change.
@mageddo mageddo added the enhancement A little behavior change which so small that can't be considered a feature label Aug 1, 2024
@mageddo mageddo moved this to To Do in DNS Proxy Server Aug 28, 2024
mageddo added a commit that referenced this issue Sep 4, 2024
mageddo added a commit that referenced this issue Sep 4, 2024
* Fixing SolverRemote NPE #533

* creating test
mageddo added a commit that referenced this issue Sep 19, 2024
* Create FUNDING.yml (#512)

* #513 Refactoring  on ConfigJson Module (#514)

* testing no remote sovlers

* written failing test which found the bug

* test was wrong, feature is working

* extracting to specific class

* refactoring

* created tests

* Fixing `noRemoteServers` wasn't being respected at the JSON file #513 (#515)

* bug and test fixed

* fixing one more test

* release notes

* [Gradle Release Plugin] - new version commit:  '3.24.1-snapshot'.

* Mitigate arm release failure due inexistence of required debian package #517 (#518)

* try to download the deb from the two possible urls

* release notes

* [Gradle Release Plugin] - new version commit:  '3.24.2-snapshot'.

* release notes

* one more option

* Use Virtual Threads (#519)

* migrating thread pools to virtual threads

* using virtual thread executor when querying remote dns servers

* fixing test

* stress test related code

* configuring supervisor

* adjusting default dns

* created stress test

* created stress test

* creating docs of how to use stress tests

* more asserts

* creating collector structure

* skipping login page and set as admin

* configuring metrics

* fixing test conflict with running dps on machine

* configuring default dashboards

* changing filter time

* updating the docs

* adjusting the docs order

* linking doc

* clean code

* clean code

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.0-snapshot'.

* Ensure thread will have names at the logs, behavior changed since `3.25.0`  (#520)

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.1-snapshot'.

* upgrading logback

to be compatible with virtual threads at logging, see qos-ch/logback#737

* release notes

* Internal feature toggle to disable virtual threads (#521)

* toggle to swap between virtual and physical threads

* creating system property

* Migrating cache to caffeine (#523)

* testing cache ttl

* testing

* adjusting test

* release notes

* clean codfe

* clean code

* clean code

* [Gradle Release Plugin] - new version commit:  '3.25.2-snapshot'.

* Disable connection check at every query (#525)

* testing ping call

* creating toggle

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.3-snapshot'.

* testing ping on specified port

* testing ping api

* More Effective RemoteSolver Circuit Check #526 (#528)

* refactoring

* release notes, Log Remote Servers circuit states

* [Gradle Release Plugin] - new version commit:  '3.25.4-snapshot'.

* ignoring to half open and from half open to open transition

* refactoring

* refactoring

* comment

* add logs

* fixme notes

* comments

* adjusting fixme notes

* docs

* clean code

* Change the caching strategy to minimize the locks and prevent deadlocks #522 (#529)

* testing deadlock

* removing calculate code from lock statement to prevent deadlocks

* clean code

* use single threaded queue to performe cache clear to prevent deadlocks

* clear cache in background

* add asserts

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.5-snapshot'.

* change thread name

* change docs

* Refactoring SolverRemote module to implement circuit optimization #526 (#530)

* release notes

* refactoring and creating test for the use case

* refactoring

* clean code

* refactoring packages

* release notes

* refactoring

* creating temp test

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.6-snapshot'.

* SolverRemote: Exclude remote servers with open circuits. #526 (#531)

* release notes

* removing old behavior test and activating the new one

* refactoring and testing

* refactoring tests

* refactoring

* [Gradle Release Plugin] - new version commit:  '3.25.7-snapshot'.

* disabling feature

* release notes

* SolverRemote: Exclude remote servers with open circuits. #526 (#532)

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.8-snapshot'.

* implementing and adjusting test to the new behavior

* testing circuit status refresh

* refactoring names

* fixing scenario

* cd deps fix (#534)

* add missing package (#535)

* installing deps (#536)

* must differ types considering their generic types (#538)

* Creating flag to force dns server start even when in test mode #480 (#539)

* creating flag to forcec dns server start even when in test mode

* testing

* must build default json config with default values (#540)

* caching in test method, leading with usecase when a second thread is running the test (#541)

* Handling and logging fatal errors (#542)

* handling and logging fatal errors

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.9-snapshot'.

* Create `IntTests` suite, they are comp tests which can be run within native image #480 (#537)

* upgrading rest assured due groovy error with nativeTest task

Error encountered while parsing java.lang.invoke.MutableCallSite.setTarget(MutableCallSite.java:155)
Parsing context:
   at java.lang.invoke.SwitchPoint.invalidateAll(SwitchPoint.java:225)
   at org.codehaus.groovy.vmplugin.v8.IndyInterface.invalidateSwitchPoints(IndyInterface.java:186)
   at org.codehaus.groovy.vmplugin.v8.IndyInterface$$Lambda/0x00000007c2399648.updateConstantMetaClass(Unknown Source)

* ignoring rest assured error and leaving it fail at tests if the problematic code will be used anyway

* int test it's working

* clean code

* fixing bug

* configuring native test at the same source set as test

* enabling native image test

* adjusting reflection generation

* adjusts

* test is working

* adjusting conf path

* print test logs to console

* trying to get logs to check why test is failing

* finding 'stream closed' cause

* handling fatal errors

* testing fatal error handling

* delete hello world int test

* re-enabling restassured

* reverting restassured feature disabling

* clean code

* updating the docs about native image test

* refactoring

* refactoring

* refactoring

* refactoring

* explaining about the kind of tests

* [Gradle Release Plugin] - new version commit:  '3.25.9-snapshot'.

* fixing test

* release notes

* refactoring the docs

* unnecessary path

* clean code

* make test repeatable

* testing test class and refactoring

* refactoring class

* fixing test

* caching

* testing

* creating flags

* wasn't using the default config when file was empty

* make test repeatable

* hostnames list must be changable

* refactoring

* refactoring

* troubleshooting

* refactorings

* testing

* refactoring

* testing

* finally fixing bug which gets the wrong list

* leading with methods

* fixing field parsing

* reverting

* clean code

* Run the automated tests over the native image (#543)

* trying to exclude groovy from classpath

* enabling int tests at the ci

* enabling int tests

* caffeine native image reflection configs

* adjusting resources config

* adjusting resources include config

* add missing graal resources metadata

* clean code

* flag is now unnecessary

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.10-snapshot'.

* Feature Request: New Template proposal (#547)

* new template proposal

* specifying what is required and what is not

* adjusting labels and creating general content

* unify summary and motivation

* empty content

* Agnostic interface to support multiple circuit breaker strategies config (#550)

* setup resilience4j

* specifying circuit breaker strategy name

* creating agnostic interface to support multiple circuit breaker strategy types

* fixme

* [Gradle Release Plugin] - new version commit:  '3.25.11-snapshot'.

* release notes

* Creating an abstraction of circuit breaker implementation (#551)

* setup resilience4j

* specifying circuit breaker strategy name

* creating agnostic interface to support multiple circuit breaker strategy types

* fixme

* sppliting circuit breaker factory

* refactoring is done

* test is passing

* test is passing

* test is passing

* refactoring package

* [Gradle Release Plugin] - new version commit:  '3.25.11-snapshot'.

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.12-snapshot'.

* release notes

* fixme note

* fixme note

* Unifying circuit breaker abstractions  (#553)

* setup resilience4j

* specifying circuit breaker strategy name

* creating agnostic interface to support multiple circuit breaker strategy types

* fixme

* sppliting circuit breaker factory

* refactoring is done

* test is passing

* test is passing

* test is passing

* refactoring package

* [Gradle Release Plugin] - new version commit:  '3.25.11-snapshot'.

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.12-snapshot'.

* release notes

* fixme note

* fixme note

* refactoring to support multiple delegates

* refactoring name

* refactoring

* implementing non resilient strategy

* refactoring and test

* removing unnecessary test

* refactoring and fixing test

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.13-snapshot'.

* Fixing SolverRemote NPE #533 (#556)

* Fixing SolverRemote NPE #533

* creating test

* Specify config file path by env and fixing arm release (#557)

* Implementing env config file path option

* fixing compiling errors

* fixing test

* fixing test

* testing

* updating the docs

* [Gradle Release Plugin] - new version commit:  '3.26.0-snapshot'.

* release notes

* fixing arm build mirror and ajusting test

* new mirrors

* formatting

* refactoring

* clean code

* Wait module (#558)

* module to wait things to happen while checking on them

* testing the module

* fixing arm release (#559)

* found previously used arm deb build (#561)

* CommandLine module upgrades (#560)

* command lines module upgrades

* new tests

* missing dep

* missing dep

* release notes

* [Gradle Release Plugin] - new version commit:  '3.27.0-snapshot'.

* build optimizations (#562)

* build optmizations

* reduce int test time

* reduce int test time

* change level to b

* Specifying config source (#563)

* specifying config source

* adjusting tests

* updating test config files

* release notes

* Int test for all solvers (#564)

* release notes

* [Gradle Release Plugin] - new version commit:  '3.29.0-snapshot'.

* Creating support for testing DPS really like a integration test, crating int test for solver remote happy pah

* clean code

* clean code

* configuring templates

* configuring templates

* option to get answer ip

* comptest task wont run int test

* creating task to run all tests

* adjusting ci steps

* adjusting ci

* Eager module (#565)

* eager module classes

* eager module classes

* tests

* new json utils method

* release notes

* [Gradle Release Plugin] - new version commit:  '3.30.0-snapshot'.

* Improvements (#566)

* setup resilience4j

* specifying circuit breaker strategy name

* creating agnostic interface to support multiple circuit breaker strategy types

* fixme

* sppliting circuit breaker factory

* refactoring is done

* test is passing

* test is passing

* test is passing

* refactoring package

* [Gradle Release Plugin] - new version commit:  '3.25.11-snapshot'.

* release notes

* [Gradle Release Plugin] - new version commit:  '3.25.12-snapshot'.

* release notes

* fixme note

* fixme note

* refactoring to support multiple delegates

* refactoring name

* refactoring

* implementing non resilient strategy

* refactoring and test

* removing unnecessary test

* refactoring and fixing test

* creating a comp test to check integratio between modules

* new test detecting bug

* fixing bug and release notes

* [Gradle Release Plugin] - new version commit:  '3.25.14-snapshot'.

* comp test wont be able to be mocked, creating a int test instead

* creating dps binary executable finder

* configuring sandbox

* creating sandbox and tests

* fixing test

* setup execution

* creating a dummy signal based healthcheck

* creating a dummy signal based healthcheck

* creating a dummy signal based healthcheck

* registering and testing eager beans

* fixing test

* creating a lot of features related to watch process execution

* new way to excute command line  and specify stream handler

* killing processes after test execution

* fixing test

* fixing test

* fixing test

* fixing test

* generating all jar dep for int test automatically

* Fixing int test binary executor

* unnecessary method

* refactoring test to use json config file only

* all working

* adjusting test

* fixing test

* fixing bugs and creating tests

* refactoring inner class to upper

* reduce native image optimization for int test to reduce build time

* fixing test

* unnecessary

* clean code

* deleteing unnecessary file

* release notes

* [Gradle Release Plugin] - new version commit:  '3.29.0-snapshot'.

* Creating support for testing DPS really like a integration test, crating int test for solver remote happy pah

* clean code

* clean code

* configuring templates

* configuring templates

* option to get answer ip

* clean code

* deleting healthcheck feature

* using pair of right library

* unnecessary code (#567)

* Canary Rate Strategy Implementation (#570)

* creating empty strategy

* creating tests to validate reslience4j behavior

* more tests

* new test

* new tests and features

* notes

* refactoring and testing

* creating healthchecker

* refactoring

* implementing

* adjusting test

* clean code

* refactoring

* release notes

* [Gradle Release Plugin] - new version commit:  '3.30.1-snapshot'.

* Bump send and express in /app

Bumps [send](https://github.com/pillarjs/send) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `send` from 0.18.0 to 0.19.0
- [Release notes](https://github.com/pillarjs/send/releases)
- [Changelog](https://github.com/pillarjs/send/blob/master/HISTORY.md)
- [Commits](pillarjs/send@0.18.0...0.19.0)

Updates `express` from 4.18.2 to 4.21.0
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.0/History.md)
- [Commits](expressjs/express@4.18.2...4.21.0)

---
updated-dependencies:
- dependency-name: send
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Elvis Souza <edigitalb@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@mageddo mageddo added the feature Definition of a feature to be implemented label Nov 6, 2024
@mageddo mageddo closed this as completed Nov 6, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in DNS Proxy Server Nov 6, 2024