
feat(reports): sidecar container report generation #779

Merged
24 commits merged into cryostatio:main on Jan 13, 2022

Conversation

andrewazores
Member

@andrewazores andrewazores commented Dec 7, 2021

This abstracts report generation to reduce the coupling of the report caches and report handlers to the SubprocessReportGenerator, and then introduces an alternate report generator implementation that performs HTTP calls out to a separate sidecar container (https://github.com/cryostatio/cryostat-reports). The user selects which report generation strategy is used by setting the environment variable CRYOSTAT_REPORT_GENERATOR. If this env var is unset or empty then Cryostat will default to forking a subprocess to generate the report as usual. If it is set then it is expected to be the URL to a cryostat-reports instance (or a loadbalanced service in front of some replicas), and Cryostat will delegate report generation off to the sidecar(s).
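For illustration, here is a minimal, self-contained sketch of the selection behaviour described above. The interface and class names are illustrative stand-ins, not the real Cryostat types (the PR itself only names SubprocessReportGenerator):

```java
import java.net.URI;

// Sketch only: mirrors the described behaviour of CRYOSTAT_REPORT_GENERATOR,
// with hypothetical types standing in for the actual Cryostat classes.
public class ReportGeneratorSelectionSketch {

    interface ReportGenerator {}

    static class SubprocessGenerator implements ReportGenerator {}

    static class SidecarGenerator implements ReportGenerator {
        final URI reportsUri;
        SidecarGenerator(URI reportsUri) { this.reportsUri = reportsUri; }
    }

    static ReportGenerator select() {
        String reportsUrl = System.getenv("CRYOSTAT_REPORT_GENERATOR");
        if (reportsUrl == null || reportsUrl.isBlank()) {
            // env var unset or empty: default to forking a subprocess
            return new SubprocessGenerator();
        }
        // env var set: treat it as the URL of a cryostat-reports instance
        // (or a load-balanced service) and delegate report generation to it
        return new SidecarGenerator(URI.create(reportsUrl));
    }

    public static void main(String[] args) {
        System.out.println(select().getClass().getSimpleName());
    }
}
```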

@andrewazores andrewazores added the feat New feature or request label Dec 7, 2021

@andrewazores andrewazores marked this pull request as ready for review December 8, 2021 16:25
@andrewazores andrewazores marked this pull request as draft December 8, 2021 16:26
@andrewazores andrewazores force-pushed the sidecar-reports branch 2 times, most recently from a5802d9 to aabfdcd on December 8, 2021 17:04
@andrewazores andrewazores marked this pull request as ready for review December 8, 2021 17:14
@hareetd
Contributor

hareetd commented Dec 9, 2021

A general question before I start reviewing the actual code changes: if I'm understanding this PR correctly, by delegating report generation to a sidecar container used strictly for that purpose, there is more memory available for report generation (compared to a subprocess inside the Cryostat instance container which must also handle every other function of Cryostat), meaning OOM errors should not occur?

Edit: And this process of requesting report generation and receiving the generated report is done strictly through HTTP calls between the Cryostat container and the sidecar container?

@andrewazores
Member Author

andrewazores commented Dec 9, 2021

> A general question before I start reviewing the actual code changes: if I'm understanding this PR correctly, by delegating report generation to a sidecar container used strictly for that purpose, there is more memory available for report generation (compared to the Cryostat instance container which must also handle every other function of Cryostat), meaning OOM errors should not occur?

There isn't necessarily more memory available, since that depends on the container limits set by the container runtime platform.

I explained a bit of the resource requirements rationale in this comment: https://github.com/cryostatio/cryostat/pull/779#discussion_r766053568

The main problem is that report generation is going to be a relatively less-frequently used feature, but it requires by far the most resources. This leads to a classic resource provisioning problem where an end user would need to provision the Cryostat container with enough resources to be able to meet peak resource demands (during report generation), but the majority of the time those resources are not needed and end up being wasted. In a cloud environment, overprovisioned resources are a very direct waste of money at the end of the day.

Splitting out report generation into a sidecar does mean that report generation should never cause Cryostat itself to OOM (well, it's possible, but the memory usage on a reports request is now just whatever is required for the HTTP request plus enough to hold the HTML document in memory, basically - some number of KB). The cryostat-reports sidecar could still OOM, but this is isolated away from the main Cryostat container and should just turn up a 500 response.

By splitting report generation into a separate container, it becomes much easier to provision resources appropriately. The main Cryostat container only needs enough resources to run its webserver and perform various relatively lightweight operations over HTTP and JMX, so its footprint can be small and therefore running it and keeping it deployed can be cheap. Users who only occasionally use automated reports can provision just a single container for it and only allocate it a small amount of resources, letting it fail if those resources are insufficient - or just continue using subprocess generation if reports are really low priority for them. The more you care about reports and the more often you want to see them, the more resources you can provision to the report container and/or the more replicas of the container you can spin up.

In the future I hope to have cryostat-reports built and runnable as a Quarkus native image as well, which would further reduce the resource footprint and also open the door to running it serverless, meaning that the cryostat-reports container would actually be started up on-demand when a reports generation request is made, and stopped after the response has been completed. This should make Cryostat as a whole just about the smallest and cheapest to run that it can be.

> Edit: And this process of requesting report generation and receiving the generated report is done strictly through HTTP calls between the Cryostat container and the sidecar container?

Correct, the sidecar container has exactly one HTTP endpoint that expects a JFR binary to be POSTed to it and it responds with the automated analysis HTML document. So when you make a request to Cryostat for a report, then behind the scenes it actually delegates out to cryostat-reports with another HTTP request, and when it gets a response it wraps that up and sends you back another HTTP response.
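For illustration, a minimal client-side sketch of that delegation, assuming the sidecar accepts the JFR binary directly as the request body at /report (the /report path appears in the error log later in this thread; the exact request format expected by cryostat-reports, e.g. whether it uses a multipart form upload, may differ):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Sketch only: posts a JFR recording to a cryostat-reports sidecar and reads
// back the automated analysis HTML document. The sidecar address and request
// format are assumptions, not the confirmed cryostat-reports API.
public class SidecarReportClientSketch {
    public static void main(String[] args) throws Exception {
        URI reportsUri = URI.create("http://cryostat-reports:8080/report"); // assumed address
        Path recording = Path.of("recording.jfr");

        HttpRequest request = HttpRequest.newBuilder(reportsUri)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(recording))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() == 200) {
            System.out.println(response.body()); // the analysis HTML document
        } else {
            // a failure in the sidecar (e.g. OOM) surfaces here as an error status,
            // isolated from the main Cryostat container
            System.err.println("Report generation failed: HTTP " + response.statusCode());
        }
    }
}
```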

hareetd
hareetd previously approved these changes Dec 10, 2021
Contributor

@hareetd hareetd left a comment

Tested it out, looks good.

@andrewazores
Member Author

It's a fairly large change, not just in line count/patch size but in functionality being moved around, so I'll wait until @jan-law gets another chance to take a look and @ebaron gets some eyes on this before merging.

@jan-law
Contributor

jan-law commented Dec 13, 2021

After pulling your latest changes, I got a dependency error looking for the 2.4.0 cryostat-core jar.
Could not resolve dependencies for project io.cryostat:cryostat:jar:2.1.0-SNAPSHOT: Failure to find io.cryostat:cryostat-core:jar:2.4.0

When I looked at cryostat-core, I saw the versioning changed from 2.3.1 to 2.5.0-SNAPSHOT. Your branch builds fine when I change <io.cryostat.core.version>2.4.0</io.cryostat.core.version> to 2.5.0-SNAPSHOT. Where does the 2.4.0 -core version come from?

@andrewazores
Member Author

andrewazores commented Dec 13, 2021

> After pulling your latest changes, I got a dependency error looking for the 2.4.0 cryostat-core jar. Could not resolve dependencies for project io.cryostat:cryostat:jar:2.1.0-SNAPSHOT: Failure to find io.cryostat:cryostat-core:jar:2.4.0
>
> When I looked at cryostat-core, I saw the versioning changed from 2.3.1 to 2.5.0-SNAPSHOT. Your branch builds fine when I change <io.cryostat.core.version>2.4.0</io.cryostat.core.version> to 2.5.0-SNAPSHOT. Where does the 2.4.0 -core version come from?

You'll have to check out and build the upstream v2 branch: https://github.com/cryostatio/cryostat-core/blob/v2/pom.xml#L8

That goes for anything after #731

@ebaron
Member

ebaron commented Dec 16, 2021

I have a quick/dirty setup of this in OpenShift, but hit a reflection problem with the reports container:

2021-12-16 22:18:19,814 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-0) HTTP Request to /report failed, error id: 64f46a06-c1e9-4822-932e-c8cd734f5498-1: java.lang.RuntimeException: java.lang.InstantiationException: Type `org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes.JfrOldObject` can not be instantiated reflectively as it does not have a no-parameter constructor or the no-parameter constructor has not been added explicitly to the native image.
	at org.openjdk.jmc.flightrecorder.internal.parser.v1.ValueReaders$ReflectiveReader.read(ValueReaders.java:526)
	at org.openjdk.jmc.flightrecorder.internal.parser.v1.TypeManager$TypeEntry.readConstant(TypeManager.java:294)
	at org.openjdk.jmc.flightrecorder.internal.parser.v1.TypeManager.readConstants(TypeManager.java:412)
	at org.openjdk.jmc.flightrecorder.internal.parser.v1.ChunkLoaderV1.readConstantPoolEvent(ChunkLoaderV1.java:110)
	at org.openjdk.jmc.flightrecorder.internal.parser.v1.ChunkLoaderV1.call(ChunkLoaderV1.java:77)
	at org.openjdk.jmc.flightrecorder.internal.parser.v1.ChunkLoaderV1.call(ChunkLoaderV1.java:47)
	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(Thread.java:829)
	at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:567)
	at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192)
Caused by: java.lang.InstantiationException: Type `org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes.JfrOldObject` can not be instantiated reflectively as it does not have a no-parameter constructor or the no-parameter constructor has not been added explicitly to the native image.
	at java.lang.Class.newInstance(DynamicHub.java:911)
	at org.openjdk.jmc.flightrecorder.internal.parser.v1.ValueReaders$ReflectiveReader.read(ValueReaders.java:516)
	... 13 more

Should I be testing it in JVM mode for now?
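Reflective-instantiation failures like this in a native image are commonly addressed by registering the affected classes for reflection at build time. A minimal Quarkus-style sketch follows; whether this particular JMC class can simply be registered, or needs a different fix in cryostat-reports, is an assumption rather than something established in this thread:

```java
import io.quarkus.runtime.annotations.RegisterForReflection;

// Sketch of the usual Quarkus approach for native-image reflection errors:
// register the third-party class that is instantiated reflectively so that
// GraalVM retains its no-parameter constructor. Treating this as sufficient
// for the JMC parser types used by cryostat-reports is an assumption.
@RegisterForReflection(
        classNames = {
            "org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrOldObject"
        })
public class JmcReflectionConfig {
}
```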

extract subprocess report generation variable

extract platform configuration variables

extract max WS connections variable

extract CORS origin variable

extract DISABLE_SSL variable

extract webserver configuration variables

use Variables

extract JMX connection config variables

finish extracting variables

fix missed import

fix extracted variables in tests
request ordering is handled deeper in the stack by the SubprocessReportGenerator synchronizing lock, or is handled by the configured sidecar report generator setup and its potential load balancer
allows Cryostat to inspect the report generator for profiling and analysis
@andrewazores
Member Author

Rebased, just one very minor conflict in MessagingModule since I moved the MAX_CONNECTIONS env var out of there, but Janelle added the PRUNER constant.

Contributor

@jan-law jan-law left a comment

Changes look good to me after the rebase.

The todo bot should detect and create issues for any todo comments, right?

@andrewazores
Member Author

The todo bot is not active anymore, actually. I can't remember why anymore.

@jan-law
Contributor

jan-law commented Jan 13, 2022

No worries. It would be nice, although not necessary, if the todos were tracked as issues somehow. None of the todos are high priority anyway.

Contributor

@hareetd hareetd left a comment

Looks good!

@andrewazores andrewazores merged commit 8347e65 into cryostatio:main Jan 13, 2022
@andrewazores andrewazores deleted the sidecar-reports branch January 13, 2022 21:52
Labels
feat New feature or request
Development

Successfully merging this pull request may close these issues.

Apply available memory-based heuristic when forking report subprocess
4 participants