Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG, WORKER_WATCHER] RR stops responding when new worker fails to initialize #772

Closed
Flygrounder opened this issue Aug 26, 2021 · 1 comment · Fixed by #771
Closed

[BUG, WORKER_WATCHER] RR stops responding when new worker fails to initialize #772

Flygrounder opened this issue Aug 26, 2021 · 1 comment · Fixed by #771
Assignees
Labels
B-bug Bug: bug, exception F-need-verification Y-Release blocker Priority: Release blocker
Milestone

Comments

@Flygrounder
Copy link

Scenario:
I start server with file break missing. After workers initialize I create it, wait until any worker reaches TTL and RR fails to initialize a new one. Then I remove the file.

Worker:

<?php

use Spiral\RoadRunner;
use Nyholm\Psr7;

include "vendor/autoload.php";

$worker = RoadRunner\Worker::create();
$psrFactory = new Psr7\Factory\Psr17Factory();

$worker = new RoadRunner\Http\PSR7Worker($worker, $psrFactory, $psrFactory, $psrFactory);

if (file_exists('break')) {
	throw new Exception('oops');
}

while ($req = $worker->waitRequest()) {
    try {
        $rsp = new Psr7\Response();
        $rsp->getBody()->write('Hello world!');

        $worker->respond($rsp);
    } catch (\Throwable $e) {
        $worker->getWorker()->error((string)$e);
    }
}

I expected to see this happen: RR periodically tries to create a new worker and after file was deleted it will recover and continue responding to requests.

Instead, this happened: It does not recover and never responds to requests.

The version of RR used: 2.3.2

My .rr.yaml configuration is:

server:
  command: "php psr-worker.php"

http:
  address: 0.0.0.0:8080
  pool:
    num_workers: 2
    supervisor:
      ttl: 20s

Logs

2021-08-26T12:15:57.185+0300	DEBUG	rpc         	rpc/plugin.go:85	Started RPC service	{"address": "tcp://127.0.0.1:6001", "plugins": ["informer", "resetter"]}
2021-08-26T12:15:57.206+0300	DEBUG	server      	server/plugin.go:231	worker constructed	{"pid": 48204}
2021-08-26T12:15:57.224+0300	DEBUG	server      	server/plugin.go:231	worker constructed	{"pid": 48205}
2021-08-26T12:16:17.225+0300	WARN	server      	server/plugin.go:225	worker TTL reached	{"pid": 48204}
github.com/spiral/roadrunner/v2/plugins/server.(*Plugin).collectPoolEvents
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/plugins/server/plugin.go:225
github.com/spiral/roadrunner/v2/pkg/events.(*HandlerImpl).Push
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/events/general.go:39
github.com/spiral/roadrunner/v2/pkg/pool.(*supervised).control
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/pool/supervisor_pool.go:151
github.com/spiral/roadrunner/v2/pkg/pool.(*supervised).Start.func1
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/pool/supervisor_pool.go:96
2021-08-26T12:16:17.225+0300	WARN	server      	server/plugin.go:225	worker TTL reached	{"pid": 48205}
github.com/spiral/roadrunner/v2/plugins/server.(*Plugin).collectPoolEvents
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/plugins/server/plugin.go:225
github.com/spiral/roadrunner/v2/pkg/events.(*HandlerImpl).Push
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/events/general.go:39
github.com/spiral/roadrunner/v2/pkg/pool.(*supervised).control
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/pool/supervisor_pool.go:151
github.com/spiral/roadrunner/v2/pkg/pool.(*supervised).Start.func1
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/pool/supervisor_pool.go:96
2021-08-26T12:16:17.249+0300	INFO	server      	server/plugin.go:265	[php debug] 
Fatal error: Uncaught Exception: oops in /home/flygrounder/Projects/roadrunner-freeze/psr-worker.php:14
Stack trace:
#0 {main}
  thrown in /home/flygrounder/Projects/roadrunner-freeze/psr-worker.php on line 14
 []
2021-08-26T12:16:17.250+0300	ERROR	server      	server/plugin.go:221	pool error	{"error": "worker_watcher_wait: WorkerAllocate:\n\tworker_watcher_allocate_new:\n\tfactory_spawn_worker_with_timeout: fetch_pid: CRC mismatch; signal: killed; process_wait: signal: killed"}
github.com/spiral/roadrunner/v2/plugins/server.(*Plugin).collectPoolEvents
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/plugins/server/plugin.go:221
github.com/spiral/roadrunner/v2/pkg/events.(*HandlerImpl).Push
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/events/general.go:39
github.com/spiral/roadrunner/v2/pkg/worker_watcher.(*workerWatcher).wait
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/worker_watcher/worker_watcher.go:252
github.com/spiral/roadrunner/v2/pkg/worker_watcher.(*workerWatcher).Watch.func1
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/worker_watcher/worker_watcher.go:63
2021-08-26T12:17:54.678+0300	WARN	server      	server/plugin.go:219	no free workers in pool{"error": "static_pool_exec: NoFreeWorkers:\n\tworker_watcher_get_free_worker:\n\tcontext deadline exceeded"}
github.com/spiral/roadrunner/v2/plugins/server.(*Plugin).collectPoolEvents
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/plugins/server/plugin.go:219
github.com/spiral/roadrunner/v2/pkg/events.(*HandlerImpl).Push
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/events/general.go:39
github.com/spiral/roadrunner/v2/pkg/pool.(*StaticPool).takeWorker
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/pool/static_pool.go:232
github.com/spiral/roadrunner/v2/pkg/pool.(*StaticPool).Exec
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/pool/static_pool.go:145
github.com/spiral/roadrunner/v2/pkg/pool.(*supervised).Exec
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/pool/supervisor_pool.go:53
github.com/spiral/roadrunner/v2/pkg/worker_handler.(*Handler).ServeHTTP
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/pkg/worker_handler/handler.go:145
github.com/spiral/roadrunner/v2/plugins/http.(*Plugin).ServeHTTP
	/home/flygrounder/go/pkg/mod/github.com/spiral/roadrunner/v2@v2.4.0-beta.1/plugins/http/plugin.go:283
net/http.serverHandler.ServeHTTP
	/usr/local/go/src/net/http/server.go:2878
net/http.(*conn).serve
	/usr/local/go/src/net/http/server.go:1929

@Flygrounder Flygrounder added B-bug Bug: bug, exception F-need-verification labels Aug 26, 2021
@rustatian rustatian added this to the 2.4.0 milestone Aug 26, 2021
@rustatian rustatian added the Y-Release blocker Priority: Release blocker label Aug 26, 2021
@rustatian rustatian changed the title [BUG] Daemon stops responding when new worker fails to initialize [BUG, WORKER_WATCHER] RR stops responding when new worker fails to initialize Aug 26, 2021
@rustatian
Copy link
Member

rustatian commented Aug 26, 2021

Thanks for the report @Flygrounder . Will be fixed in the next beta and finally in the 2.4.0 release the next week.

rustatian added a commit that referenced this issue Aug 31, 2021
…db`, #772: fix(`worker_watcher`): bug with failed worker while TTL-ed

#770: feat(`driver,jobs`): local persistent driver based on the `boltdb`, #772: fix(`worker_watcher`): bug with failed worker while TTL-ed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-bug Bug: bug, exception F-need-verification Y-Release blocker Priority: Release blocker
Projects
None yet
2 participants