Use Nodejs clustering to take advantage of multi-core hardware #68626
Comments
Pinging @elastic/kibana-platform (Team:Platform) |
Is this under Platform or Operations' scope? |
This is Platform. Operations is continuing to focus more on Build & CI rather than server performance or runtime. |
So what exactly do we want here? Would we just spawn multiple forks that are exact copies of the master process and let the node cluster module handle the load balancing? An example coming to mind would be telemetry: would each fork send the telemetry data on its own, or should the forks send their data to the master using IPC, so it could be aggregated and sent as a single request from the master? |
I don't think we need anything fancy, just letting node cluster load balance across several processes. Then we need to make sure everything keeps working the way it should, e.g. that multiple processes logging to a single file keeps working. For plugins, the only kind of work I can think of that might cause problems is anything that relies on being the only process doing it. We could expose an API for that. |
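For reference, a minimal sketch of what "letting node cluster load balance across several processes" looks like with Node's built-in cluster module (a standalone example, not Kibana code; the port and worker count are arbitrary assumptions):

```ts
import cluster from 'cluster';
import http from 'http';
import os from 'os';

if (cluster.isMaster) {
  // Fork one worker per CPU core; the cluster module distributes incoming
  // connections across the workers (round-robin on most platforms).
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited, forking a replacement`);
    cluster.fork();
  });
} else {
  // Each worker runs its own HTTP server; they all share the same port.
  http
    .createServer((req, res) => {
      res.end(`handled by pid ${process.pid}\n`);
    })
    .listen(3000);
}
```

Kibana would of course start its full server in the worker branch instead of a bare HTTP server.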
This potentially complicates the concurrency controls that Reporting takes advantage of via esqueue currently and task-manager in the future. When a report is run, it spawns a headless browser that opens Kibana and takes a screenshot. This headless browser consumes quite a bit of memory and CPU from the server running Kibana. As such, Reporting has limits in place that only allow a single instance of the browser per Kibana process to be run at a time. When there are multiple Kibana processes running on the same machine, this becomes more complicated. Allowing Reporting to only run on the "master" seems reasonable enough. Are we specifying |
In the 7.12 release, the team is going to investigate the basic architecture and possible breaking changes. |
We need to take into account logging here (especially rotating logs #56291)
|
Summary of the preliminary investigation (see the POC PR: #93380)

Architecture summary

In 'classic' mode, the Kibana server is started in the main NodeJS process. In clustering mode, the main NodeJS process only acts as a coordinator: it forks the worker processes, and each worker runs its own Kibana server.

Cross-worker communication

We will be adding a new clustering service in core. For some of our changes, the workers will need to communicate with each other (and potentially with the coordinator). The base API for such communication will be exposed from the clustering service's setup contract:

export interface ClusteringServiceSetup {
// [...]
broadcast: (type: string, payload?: ClusterMessagePayload, options?: BroadcastOptions) => void;
addMessageHandler: (type: string, handler: MessageHandler) => MessageHandlerUnsubscribeFn;
}

Note: to reduce cluster/non-cluster mode divergence, in non-clustering mode these APIs would just be no-ops. This avoids forcing (most) code to check which mode Kibana is running in before calling them.

Note: we could eventually use an Observable pattern instead of a handler-based one for addMessageHandler.

Executing code on a single worker

In some scenarios, we would like some part of the code to be executed from a single process. The SO migration is a good example: we don't need each worker to try to perform the migration; we'd prefer to have one of them perform (or attempt) it while the others wait for it to complete. Due to the architecture, we can't have the coordinator do it, as it doesn't run an actual Kibana server.

There are various ways to address such a use case. What seems to be the best compromise right now is the concept of a 'main worker': the coordinator would arbitrarily elect one worker as the 'main' one.

Note: in non-clustering mode, isMainWorker() would always return true.

export interface ClusteringServiceSetup {
// [...]
isMainWorker: () => boolean;
}

To take the example of the SO migration, the migration code could then look something like:

runMigration() {
if(clustering.isMainWorker()) {
this.runMigrationsInternal().then((result) => {
applyMigrationState(result);
clustering.broadcast('migration-complete', { payload: result }, { persist: true }); // persist: true will deliver the message even to handlers that subscribe after it was sent
})
}
else {
const unsubscribe = clustering.addMessageHandler('migration-complete', ({ payload: result }) => {
applyMigrationState(result);
});
}
}

Sharing state between workers

Not sure if we will really need that, or if IPC broadcast will be sufficient. If we do need shared state, we will probably have to use syscall libraries to share buffers, such as mmap-io, and expose a higher-level API for that from core.

Performance testing

Tests were performed against a local development machine with an 8-core CPU, comparing three setups: non-cluster mode, cluster mode with 2 workers, and cluster mode with 4 workers.
There is currently no easy way to test the performance improvements this could provide on Cloud. On Cloud, Kibana runs in a containerised environment using CPU CFS quota and CPU shares. If we want to investigate the perf improvement on Cloud further, our only option currently is to set up a similar-ish environment locally (which wasn't done during the initial investigation).

Technical impacts in Core

Handling multi-process logs

We need to decide how we will handle multiple processes outputting logs. In a 2-worker cluster, the workers' logs are interleaved and, most importantly, there is no way to tell which process a given log entry is coming from. We will need to address that. Identified options currently are:
1. Have each worker write its logs to a distinct file. We could do that by automatically adding a process-dependent suffix to the appropriate appenders' configuration. Note that this doesn't solve the problem for the console appender, which probably makes it a no-go.
2. Add the process name to the log records, and add a new conversion to the pattern layout to display it, such as %process for example. The default pattern could then evolve to include it (ideally, only when clustering is enabled). This will require some changes in the logging system implementation, as the BaseLogger currently has no logic to enhance the log record with context information before forwarding it to its appenders, but it looks like the correct solution.

The rolling-file appender

The rolling process of the rolling-file appender is problematic in clustering mode, as multiple workers would be writing to, and trying to roll, the same file. Identified options are:
1. Use a broadcast-message-based mutex mechanism: the appenders could acquire a 'lock' to roll a specific file, and release the lock once the rotation is done.
2. Have the coordinator perform the rotation instead. When a rolling is required, the appender would send a message to the coordinator, which would perform the rolling and notify the workers once the process is complete. Note that this option is even more complicated than the previous one, as it forces us to move the rolling implementation outside of the appender, without any significant upside.
3. Go further and change the way the logging system works in clustering mode by having the coordinator centralize the logging system: the workers' logger implementations would just send messages to the coordinator. While this may be the more correct design, the main downside is that the logging implementation would be totally different in cluster and non-cluster mode, and it is overall way more work than the other options.

None of the options is trivial, but I feel like option 1) is still the most pragmatic one. Option 3) is probably a better design, but represents more work. It would probably be worth it if all our appenders were impacted, but as only the rolling-file one is, I feel like going with 1) would be OK.

The status API

In clustering mode, the workers will all have an individual status: one could have a connectivity issue with ES while the other ones are green, and hitting the status endpoint currently only returns the status of the worker that happens to serve the request. We will need to add a centralized status service in the coordinator, and we may also want to have the status API expose each individual worker's status.

PID file

Without changes, each worker is going to try to write and read the same PID file. This also breaks the whole PID file usage, as the PID stored in the file will be an arbitrary worker's PID instead of the coordinator (main process) PID. In clustering mode, we will need to have the coordinator handle the PID file logic, and disable PID file handling in the workers' environment service.

SavedObjects migration

In the current state, all workers are going to try to perform the migration. Ideally, we would have only one process perform the migration, and the other ones just wait for a ready signal. We can't easily have the coordinator do it, so we would probably have to leverage the 'main worker' concept to do so. The SO migration v2 is supposed to be resilient to concurrent attempts though, as we already support multi-instance Kibana, so this can probably be considered an improvement (cc @rudolf).

Open questions / things to solve

Memory consumption

In cluster mode, node options such as max-old-space-size apply to each worker process individually, so the overall memory consumption of a Kibana instance will increase accordingly.

Workers error handling

Usually when using the cluster module, the coordinating process is expected to handle workers that terminate unexpectedly (e.g. by forking a replacement). We need to decide how Kibana should behave when a worker crashes.

Data folder

The data folder (path.data) is currently shared by all the workers, while some code may assume it has exclusive access to it. One easy solution would be, when clustering is enabled, to create a sub-folder under the data folder for each worker (a small sketch of what this could look like follows at the end of this section).

instanceUUID

The same instance UUID will be used by all the workers. Is that going to be a problem? Note that this could be a breaking change for anything assuming the instance UUID identifies a single Kibana process rather than a single Kibana instance.

The Dev CLI

In development mode, we are already spawning child processes from the main process: the Kibana server running in the main process actually just kickstarts core to get the configuration and services required to run the dev CLI. Even if not technically blocking, it would greatly simplify parts of the workers and coordinator logic if we were able to finally extract the dev CLI so that it no longer depends on core nor instantiates a 'temporary' server to access the Legacy service. Note that extracting and refactoring the dev CLI is going to be required anyway to remove the last bits of legacy, as it currently relies on the legacy configuration to run (and is running from the legacy service).
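Regarding the data folder point above, here is a small sketch of how a per-worker sub-folder could be derived when clustering is enabled. This is hypothetical illustration code, not part of the POC; the helper name and folder naming scheme are assumptions:

```ts
import { join } from 'path';
import cluster from 'cluster';

// Hypothetical helper: when clustering is enabled, give each worker its own
// sub-folder under the configured data folder, so that workers never write
// to the same files.
function resolveDataPath(configuredDataPath: string, clusteringEnabled: boolean): string {
  if (clusteringEnabled && cluster.worker) {
    return join(configuredDataPath, `worker-${cluster.worker.id}`);
  }
  return configuredDataPath;
}
```

One caveat of this naive scheme: cluster worker ids are re-assigned when a worker is re-forked, so the sub-folder would not be stable across worker restarts.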
Technical impact on plugins

Identifying things that may break

- Is there, for example, some part of the code that accesses and writes files in the data folder (or anywhere else) and assumes it is the sole process writing to that file?
- Are there, for example, schedulers that use the instanceUUID as a single process id, as opposed to a single Kibana instance id? Are there situations where having the same instance UUID for all the workers is going to be a problem?
- Is there any part of the code that needs to be executed only once per instance in multi-worker mode, such as initialization code or starting schedulers?

Identified required changes

Probably not exhaustive for now (plugin owners, we need your help here!):
- Reporting: we will probably want to restrict to a single headless browser per Kibana instance (see #68626 (comment)). For that, we will have to change the logic in createQueueFactory to only have the 'main' worker poll for reporting tasks.
- Telemetry: sending the data to the remote telemetry server should probably only be performed from a single worker (see FetcherTask). Note that sending the data multiple times doesn't seem to have any real consequences, so this should be considered an improvement rather than a blocker.
- Task manager: do we want the task scheduler to run on a single worker? (see TaskClaiming, TaskScheduling, TaskPollingLifecycle)
- Alerting: do we need the alerting task runner to be restricted to a single worker as well?

Summary of the user-facing breaking changes
- If we want the status API to return each individual worker's status, we will need to change the output of the status API in clustering mode.
- Depending on our decision regarding the
- If we decide to have each worker output its logs to a distinct file, we would have to change the logging configuration to add a prefix/suffix to each log file name, which would be a breaking change. Note that this is probably not the option we'll choose.
- We may need to have each worker use a distinct data folder. Should this be considered a breaking change? |
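To make the "no-ops in non-clustering mode" behaviour described in the write-up above more concrete, here is a minimal sketch of what a non-cluster implementation of the proposed contract could look like (hypothetical code based on the ClusteringServiceSetup interface shown earlier, not taken from the POC):

```ts
// Assumes the ClusteringServiceSetup interface (and its related types)
// declared earlier in this comment.
const nonClusterSetup: ClusteringServiceSetup = {
  // No other workers exist, so broadcasting is a no-op.
  broadcast: () => {},
  // No messages will ever arrive; return a no-op unsubscribe function.
  addMessageHandler: () => () => {},
  // The single process is always the 'main worker', so guarded code
  // (migrations, schedulers, ...) behaves exactly as it does today.
  isMainWorker: () => true,
};
```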
@pgayvallet Thank you for this awesome write up! For telemetry we have two main areas affected by enabling clustering:
Fixing the points above is an optimization rather than a blocker. We already handle receiving multiple reports and storing the data multiple times. |
I'll add to @Bamieh's comment another use case for telemetry: rollups (I don't know if it was considered in point 2.) :) |
Regarding routing, I'd like to add another use case: from time to time, we reconsider adding support for websockets in Kibana. When an API uses websockets and is served by multiple worker processes, sticky sessions are needed so that all requests from a given client end up on the same worker. This changes the way the HTTP server is bootstrapped: the coordinator opens the port and adds a piece of logic to pick the worker that should handle the incoming request:

/** Coordinator */
const workers = [...]
// Open up the server and send sockets to child. Use pauseOnConnect to prevent
// the sockets from being read before they are sent to the child process.
const server = require('net').createServer({ pauseOnConnect: true });
server.on('connection', (socket) => {
const worker = findStickySessionWorker();
worker.send('socket', socket);
});
server.listen(1337);
/** Worker */
process.on('message', (m, socket) => {
if (m === 'socket') {
if (socket) {
// Check that the client socket exists.
// It is possible for the socket to be closed between the time it is
// sent and the time it is received in the child process.
socket.end(`Request handled with ${process.argv[2]} priority`);
}
}
});

This could be a solution to the |
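findStickySessionWorker() is left undefined in the snippet above. A common way to implement it (a hypothetical sketch, not part of the proposal) is to hash the client's IP address so that a given client is always routed to the same worker:

```ts
import type { Socket } from 'net';
import type { ChildProcess } from 'child_process';

// Naive sticky-session routing: derive a stable index from the client IP.
// `workers` would be the child processes forked by the coordinator.
function findStickySessionWorker(socket: Socket, workers: ChildProcess[]): ChildProcess {
  const ip = socket.remoteAddress ?? '';
  let hash = 0;
  for (const char of ip) {
    hash = (hash * 31 + char.charCodeAt(0)) | 0;
  }
  return workers[Math.abs(hash) % workers.length];
}
```

The coordinator snippet above calls it without arguments; in practice it would close over the socket and the workers array. IP hashing also breaks down behind proxies that collapse client IPs, which is part of what makes sticky sessions tricky.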
I'm not sure alerting itself is affected, but presumably task manager is, and so alerting (and other things) are indirectly affected via task manager. I'll just focus on task manager here.

Currently task manager does "claims" for jobs to run based on the server uuid. I think this would still work with workers - each task manager in a worker would be doing "claims" for the same server uuid, which I think is basically the same as increasing the number of task workers for that Kibana instance.

Probably the "real" story for this should be a little more involved. We probably only want one worker doing the task claims, and then when it gets tasks to run, it sends them to a worker to run. But this seems like we'd need some separate management of those worker-based tasks, since we want to distribute the load. That implies having some introspection on when workers die, and when new workers start up to replace them. Or is that not going to happen? Do we start n workers at startup, and if any "die", we terminate Kibana? |
It seems everything registering tasks with the task manager is affected too, as the tasks are going to be registered multiple times?
Yea, mid/long term, this is the kind of architecture we'd like to go to. Shorter term, just executing the tasks on a single worker is probably good enough
This is still open for discussion, see the |
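As a concrete illustration of the short-term approach mentioned above (executing the tasks on a single worker), the task-polling startup could be gated on the proposed isMainWorker() API. This is a hypothetical sketch reusing names from the investigation summary, not actual task manager code:

```ts
interface ClusteringAccess {
  isMainWorker: () => boolean;
}

interface PollingLifecycle {
  start: () => void;
}

// In non-clustering mode isMainWorker() is always true, so behaviour is unchanged.
function startTaskPollingIfMainWorker(clustering: ClusteringAccess, polling: PollingLifecycle) {
  if (!clustering.isMainWorker()) {
    // Other workers skip claiming/running tasks; only the elected 'main'
    // worker polls for work.
    return;
  }
  polling.start();
}
```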
For cases where tasks are scheduled at plugin startup, we have an API to ensure the task only gets scheduled once (one task document created) -
We actually have an issue open to allow configuring task manager with zero task workers, basically for diagnostic reasons, because lots of stuff will stop working if no tasks ever run. Perhaps we could use this to run task manager "normally" in the "main" process, but not run it in the workers by setting their task workers to zero. Perhaps the "main" process would end up being reserved for one-off things that can't/shouldn't run in worker processes. |
The 'main' process is the coordinator. It doesn't load plugins, nor even start an actual Kibana server, so this would not be possible. This is why the 'main worker' concept was introduced. We could eventually change our plugin and core services APIs to allow loading distinct code on the coordinator and the workers, but this seems overcomplicated and hard to 'backport' to non-cluster mode, compared to the 'main worker' trick which is simply always true in non-cluster mode. |
The way it works in Cloud is that you get access to all cores BUT you have a CFS quota that limits how much of that CPU you can actually use. So I think you will get at least some benefit? If you want to benchmark to understand it better, you can just run the Kibana docker image with a similar CPU quota (e.g. via docker's --cpus option) |
Closing as not planned for now, since our main reason for pursuing this in the near term was to isolate background tasks, which we instead achieved by introducing |
Kibana uses a single Node process/thread to serve HTTP traffic. This means Kibana cannot take advantage of multi-core hardware, which makes it expensive to scale out Kibana.
This should be an optional way to run Kibana, since users running Kibana inside docker containers might prefer to use their container orchestration to run one container (with a single Kibana process) per host CPU.
Link to RFC: #94057
POC PR: #93380
Phase 0: Investigation
Phase 1: Initial implementation
- process field #104031
- [Breaking][core.metrics] Remove process field from ops metrics & /stats API #104124
- [8.0] Kibana GET /api/stats breaking changes beats#27569
- processes metrics from kibana 8.0 #113149
- process and processes metrics across 7.x and 8.x indices #117614
- /stats and /status APIs

Phase 2: Beta
Phase 3: GA