test(scorecard): scorecard tests for recording management #698

Merged: 39 commits merged on Mar 5, 2024


tthvo (Member) commented Dec 9, 2023

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PRs, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your PR branch on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits: git commit -S -m "YOUR_COMMIT_MESSAGE"

Fixes: #504

Description of the change:

  • Added a scorecard test for managing recordings with Cryostat.
    • This covers creating, getting, archiving, generating a report for, stopping, and deleting a recording on Cryostat itself.
    • The recording options are hard-coded: the recording is continuous and uses the ALL template.
    • A JMX credential must be created before performing any recording operations.
  • Added logic to wait for the application URL to become reachable, to handle intermittent 503 responses when deploying on OpenShift (CRC).
  • Updated CI to fix the flaky nginx ingress controller.
    • As noted in [Request] Consider using minikube for CI #701, the controller drops requests because its worker processes fail.
    • A quick fix is to patch the ConfigMap holding the controller configuration to lower the number of worker processes. A value of 1 seems to work well.
  • Updated CI to add an entry to /etc/hosts: ${kind-ip-address} testing.cryostat. This makes testing.cryostat resolve to the kind container's internal IP within its bridge network, a neat workaround that avoids patching the scorecard config file to send requests to the ingress controller service directly.
  • Finally, updated the scorecard config to split the built-in and custom tests into 2 stages. This lets the built-in tests run in parallel while the custom ones run sequentially.

Notes

API requests can sometimes fail with status code 500 (The client has been closed). This happens most often when generating a report:

failed to generate report for the recording: API request failed with status code 500: 
org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not clone the scorecard_test_rec recording caused by 
IOException: The client has been closed.

Motivation for the change:

See #504

How to manually test:

If using OpenShift (e.g. CRC), no further steps are needed.
If using Kubernetes (e.g. minikube, kind), obtain the IP address of the cluster and add an entry to /etc/hosts: ${ip-address} testing.cryostat

  • For minikube: minikube ip
  • For kind: docker inspect <kind-container-name> -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
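The `docker inspect -f` format string for kind is a Go text/template. The sketch below renders that same template against a hypothetical, heavily trimmed stand-in for the inspect output, to show how the /etc/hosts line is derived. `Container`, `NetworkSettings`, `Network`, `hostsEntry`, and the sample IP are illustrative assumptions, not Docker's actual types.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Minimal stand-ins for the fields of `docker inspect` output that the
// template touches; the rest of Docker's inspect schema is omitted.
type Network struct {
	IPAddress string
}

type NetworkSettings struct {
	Networks map[string]Network
}

type Container struct {
	NetworkSettings NetworkSettings
}

// kindIPTemplate is the format string passed to `docker inspect -f`.
const kindIPTemplate = "{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}"

// hostsEntry renders the template against c and returns the line to add to
// /etc/hosts so that testing.cryostat resolves to the node's IP.
func hostsEntry(c Container) (string, error) {
	tmpl, err := template.New("ip").Parse(kindIPTemplate)
	if err != nil {
		return "", err
	}
	var ip strings.Builder
	if err := tmpl.Execute(&ip, c); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s testing.cryostat", ip.String()), nil
}
```

Because `range` concatenates every network's IP with no separator, this template only yields a usable address when the container is attached to a single network.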

Then, build the images, push them to a registry, and run the tests.

export PLATFORMS=linux/amd64
export IMAGE_NAMESPACE=quay.io/<namespace>
make scorecard-build bundle bundle-build && podman push quay.io/<namespace>/cryostat-operator-bundle:2.5.0-dev && make test-scorecard

Sample run

Below is a successful run. Retries might still be needed, since failures can occur as described in the Notes section above.

https://github.com/tthvo/cryostat-operator/actions/runs/7316493026/job/19931285551

tthvo force-pushed the recording-scorecard branch 3 times, most recently from d765626 to d912ace on December 24, 2023 17:28
tthvo requested review from ebaron and a team on December 24, 2023 17:55
tthvo marked this pull request as ready for review on December 24, 2023 17:55
tthvo (Member, Author) commented Dec 24, 2023

Expected output when the test runs successfully.

--------------------------------------------------------------------------------
Image:      quay.io/thvo/cryostat-operator-scorecard:2.5.0-20231224193927
Entrypoint: [cryostat-scorecard-tests cryostat-recording]
Labels:
	"test":"cryostat-recording"
	"suite":"cryostat"
Results:
	Name: cryostat-recording
	State: pass

	Log:
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is available
		application is ready at https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing
		found a target: {ConnectUrl:service:jmx:rmi:///jndi/rmi://10-217-0-141.cryostat-operator-scorecard.pod:9091/jmxrmi Alias:cryostat-recording-5d4b67d9c9-cqnfp}
		created stored credential with match expression: target.alias=="cryostat-recording-5d4b67d9c9-cqnfp"
		created a recording: &{DownloadURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/recordings/scorecard_test_rec ReportURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/reports/scorecard_test_rec Id:1 Name:scorecard_test_rec StartTime:1703447359940 State:RUNNING Duration:0 Continuous:true ToDisk:true MaxSize:0 MaxAge:0}
		current list of recordings: [{DownloadURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/recordings/scorecard_test_rec ReportURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/reports/scorecard_test_rec Id:1 Name:scorecard_test_rec StartTime:1703447359940 State:RUNNING Duration:0 Continuous:true ToDisk:true MaxSize:0 MaxAge:0}]
		archived the recording scorecard_test_rec at: cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr
		current list of archives: [{Name:cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr DownloadUrl:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr ReportUrl:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/beta/reports/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr Metadata:{Labels:map[template.name:ALL template.type:TARGET]} Size:4406873}]
		generated report for the recording scorecard_test_rec: map[Allocations.class:map[evaluation:map[explanation:Frequently allocated types are good places to start when trying to reduce garbage collections. Look at where the most common types are being allocated to see if many instances are created along the same call path. Try to reduce the number of instances created by invoking the most commonly taken paths less. suggestions:[] summary:The most allocated type is likely ''byte[]'', most commonly allocated by: org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@b73f018f] name:Allocated Classes score:12.405895119368047 topic:heap] Allocations.thread:map[evaluation:map[explanation:Many allocations performed by the same thread might indicate a problem in a multi-threaded program. Look at the stack traces for the thread with the highest allocation rate. See if the allocation rate can be brought down, or balanced among the active threads. suggestions:[] summary:The most allocations were likely done by thread ''vert.x-worker-thread-12'' at: org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@102c1cf5,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@2bba42b1,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@35be4ede,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@f5d26b03,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@871fff19,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@d9595ce6] name:Threads Allocating score:8.34649245842964 topic:java_application] ApplicationHalts:map[evaluation:map[explanation:The highest ratio of application halts to execution time was 0.319 % during 12/24/2023, 7:49:19.000 PM – 7:50:19 PM. 24.3 % of the halts were for reasons other than GC. The halts ratio for the entire recording was 0.616 %. 24.3 % of the total halts were for reasons other than GC. 
suggestions:[] summary:Application efficiency was not highly affected by halts.] name:Application Halts score:1.5954072750000001 topic:java_application] BufferLost:map[evaluation:map[suggestions:[] summary:No Flight Recorder buffers were lost during the recording.] name:Lost Flight Recorder Buffers score:0 topic:recording] BytecodeVerification:map[evaluation:map[suggestions:[] summary:The application ran with bytecode verification enabled.] name:Bytecode Verification score:0 topic:jvm_information] ClassLeak:map[evaluation:map[suggestions:[]] name:Class Leak score:-1 topic:classloading] ClassLoading:map[evaluation:map[suggestions:[] summary:No significant time was spent loading new classes during this recording.] name:Class Loading Pressure score:0 topic:classloading] CodeCache:map[evaluation:map[suggestions:[]] name:Code Cache score:-1 topic:code_cache] CompareCpu:map[evaluation:map[explanation:The application performance can be affected when the machine is under heavy load and there are other processes that use CPU or other resources on the same computer. To profile representatively or get higher throughput, shut down other resource intensive processes running on the machine. suggestions:[] summary:An average CPU load of 15 % was caused by other processes for during 12/24/2023, 7:49:19.000 PM – 7:49:50 PM.] name:Competing CPU Ratio Usage score:6.225391176180512 topic:processes] CompressedOops:map[evaluation:map[suggestions:[] summary:The settings for Compressed Oops were OK.] name:Compressed Oops score:0 topic:gc_configuration] ContextSwitch:map[evaluation:map[suggestions:[] summary:The program did not context switch excessively during the recording.] name:Context Switches score:1 topic:lock_instances] DMSIncident:map[evaluation:map[suggestions:[]] name:DMS Incidents score:-1 topic:DMS] DebugNonSafepoints:map[evaluation:map[suggestions:[] summary:DebugNonSafepoints was implicitly enabled in the JVM version used to create this recording.] 
name:DebugNonSafepoints score:0 topic:jvm_information] DiscouragedVmOptions:map[evaluation:map[suggestions:[] summary:No problems were found with the VM options.] name:Discouraged VM Options score:0 topic:jvm_information] DumpReason:map[evaluation:map[suggestions:[]] name:Exceptional Dump Reason score:-1 topic:recording] DuplicateFlags:map[evaluation:map[suggestions:[] summary:There were no duplicate JVM flags on the command line.] name:Duplicated Flags score:0 topic:jvm_information] Errors:map[evaluation:map[explanation:3 errors were thrown in total. The most common error was ''java.lang.NoSuchMethodError'', which was thrown 3 times. Investigate the thrown errors to see if they can be avoided. Errors indicate that something went wrong with the code execution and should never be used for flow control. suggestions:[] summary:The program generated an average of 3 errors per minute during 12/24/2023, 7:49:20.000 PM – 7:50:20 PM.] name:Thrown Errors score:2.5 topic:exceptions] Exceptions:map[evaluation:map[explanation:Throwing exceptions is more expensive than normal code execution, which means that they should only be used for exceptional situations. Investigate the thrown exceptions to see if any of them can be avoided with a non-exceptional control flow. suggestions:[] summary:The program generated 7.21 exceptions per second during 12/24/2023, 7:49:20.000 PM – 7:49:50 PM.] name:Thrown Exceptions score:0.036032756557046075 topic:exceptions] Fatal Errors:map[evaluation:map[suggestions:[]] name:Fatal Errors score:-1 topic:jvm_information] FewSampledThreads:map[evaluation:map[suggestions:[]] name:Parallel Threads score:-1 topic:java_application] FileRead:map[evaluation:map[suggestions:[] summary:No long file read pauses were found in this recording (the longest was 6.508 ms).] 
name:File Read Peak Duration score:0.08134705 topic:file_io] FileWrite:map[evaluation:map[suggestions:[] summary:No long file write pauses were found in this recording (the longest was 333.179 μs).] name:File Write Peak Duration score:0 topic:file_io] FlightRecordingSupport:map[evaluation:map[suggestions:[] summary:The JVM version used for this recording has full Flight Recorder support.] name:Flight Recording Support score:0 topic:jvm_information] FullGc:map[evaluation:map[explanation:At least one Full, Stop-The-World Garbage Collection occurred during this recording. For the CMS and G1 collectors, Full GC events are a strong negative performance indicator. Tunable GC parameters can be used to allow the collector to operate in concurrent mode, avoiding Stop-The-World pauses and increasing GC and application performance. suggestions:[] summary:Full GC detected.] name:G1/CMS Full Collection score:75 topic:garbage_collection] GarbageCollectionInfoRule:map[evaluation:map[suggestions:[]] name:Garbage Collection Info score:0 topic:garbage_collection] GcFreedRatio:map[evaluation:map[suggestions:[] summary:Only 8 heap summary events were found, this rule requires at least 10 events to be able to calculate a relevant result. This likely means that only a few garbage collections occurred during the recording. Having few garbage collections is generally a good sign.] name:GC Freed Ratio score:0 topic:heap] GcLocker:map[evaluation:map[suggestions:[] summary:No GCs were affected by the GC Locker.] name:GCs Caused by GC Locker score:0 topic:garbage_collection] GcOptions:map[evaluation:map[suggestions:[] summary:No problems were found with the GC configuration.] name:GC Setup score:0 topic:jvm_information] GcPauseRatio:map[evaluation:map[explanation:The highest ratio between garbage collection pauses and execution time was 0.242 % during 12/24/2023, 7:49:19.000 PM – 7:50:19 PM. The garbage collection pause ratio of the entire recording was 0.466 %. 
solution:Pause times may be reduced by increasing the heap size or by trying to reduce allocation. suggestions:[] summary:Application efficiency was not highly affected by GC pauses.] name:GC Pauses score:1.207738575 topic:garbage_collection] GcStall:map[evaluation:map[suggestions:[] summary:No indications that the garbage collector could not keep up with the workload were detected.] name:GC Stall score:0 topic:garbage_collection] HeapContent:map[evaluation:map[explanation:If the heap usage needs to be reduced, then this would be a good place to start. suggestions:[] summary:Most of the heap was used by only a few classes.] name:Heap Content score:89.91250273799291 topic:heap] HeapDump:map[evaluation:map[suggestions:[]] name:Heap Dump score:-1 topic:heap] HeapInspectionGc:map[evaluation:map[explanation:Performing heap inspection garbage collections may be a problem since they usually take a lot of time. suggestions:[] summary:The JVM performed 4 heap inspection garbage collections.] name:GCs Caused by Heap Inspection score:59.77379177936154 topic:garbage_collection] HighGc:map[evaluation:map[explanation:The time spent performing garbage collection may be reduced by increasing the heap size or by trying to reduce allocation.
		To improve rule accuracy and/or get more details for further investigation, it is recommended to enable the following event types: . suggestions:[] summary:The JVM was paused for 100 % during 12/24/2023, 7:49:20.006.000 PM – .053] name:GC Pressure score:10.273532100008776 topic:heap] HighJvmCpu:map[evaluation:map[explanation:The sampling period for the 'CPU Load' events was set to Every Chunk, which is too high for CPU load related rules to work. suggestions:[] summary:This recording has a high sampling period for 'CPU Load' events.] name:High JVM CPU Load score:25 topic:java_application] IncreasingLiveSet:map[evaluation:map[explanation:Perform a dump with the 'Trace Paths to GC Roots' option enabled to enable a more detailed analysis of the potential memory leak. suggestions:[] summary:The live set on the heap seems to increase with a speed of about 11.6 KiB per second during the recording.There is no particular class that seems to be leaking more than any other.] name:Heap Live Set Trend score:0.849074074074074 topic:memoryleak] IncreasingMetaSpaceLiveSet:map[evaluation:map[suggestions:[] summary:The class data does not seem to increase during the recording.] name:Metaspace Live Set Trend score:4.102465604874872 topic:garbage_collection] JavaBlocking:map[evaluation:map[explanation:The following regular expression was used to exclude threads from this rule: ''(.*weblogic\.socket\.Muxer.*)'' suggestions:[] summary:No excessive problems with lock contention found.] name:Java Blocking score:0.07353422487380579 topic:lock_instances] JfrPeriodicEventsFix:map[evaluation:map[suggestions:[] summary:The version of Java you are running is not affected by a performance issue related to periodic events.] name:JFR Periodic Events Fix score:0 topic:jvm_information] LongGcPause:map[evaluation:map[explanation: suggestions:[] summary:The longest GC pause was 47.079 ms.] 
name:GC Pause Peak Duration score:1.422363001557028 topic:garbage_collection] LowOnPhysicalMemory:map[evaluation:map[suggestions:[] summary:The system did not run low on physical memory during this recording.] name:Free Physical Memory score:0 topic:heap] ManagementAgent:map[evaluation:map[solution:See the [Java Monitoring and Management Guide](https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html) for more information about how to configure the management agent. suggestions:[]] name:Discouraged Management Agent Settings score:-1 topic:jvm_information] ManyRunningProcesses:map[evaluation:map[explanation:At 12/24/23, 7:49:50.446 PM, a total of 1 other processes were running on the host machine that this Flight Recording was made on. solution:If this is a server environment, it may be good to only run other critical processes on that machine. suggestions:[] summary:1 processes were running while this Flight Recording was made.] name:Competing Processes score:0.20309488837692125 topic:processes] MetaspaceOom:map[evaluation:map[suggestions:[]] name:Metaspace Out of Memory score:-1 topic:garbage_collection] MethodProfiling:map[evaluation:map[suggestions:[]] name:Method Profiling score:-1 topic:method_profiling] Options:map[evaluation:map[suggestions:[] summary:No undocumented, deprecated or non-recommended option flags were detected.] name:Command Line Options Check score:0 topic:jvm_information] OverAggressiveRecordingSetting:map[evaluation:map[explanation:Event types without threshold can lead to quite a lot of events being generated, possibly translating to higher overhead. If this was not intended, please check the settings in the template for future recordings. 
suggestions:[] summary:These following event types had no threshold: 'Java Monitor Blocked', 'Java Thread Park'] name:Discouraged Recording Settings score:25 topic:recording] PasswordsInArguments:map[evaluation:map[suggestions:[] summary:The recording does not seem to contain passwords in the application arguments.] name:Passwords in Java Arguments score:0 topic:jvm_information] PasswordsInEnvironment:map[evaluation:map[explanation:The following suspicious environment variables were found in this recording: CRYOSTAT_JDBC_PASSWORD, CRYOSTAT_JMX_CREDENTIALS_DB_PASSWORD. The following regular expression was used to exclude strings from this rule: ''(passworld|passwise)''. solution:If you wish to keep having passwords in your environment variables, but want to be able to share recordings without also sharing the passwords, please disable the ''Initial Environment Variable'' event. suggestions:[] summary:The environment variables in the recording may contain passwords.] name:Passwords in Environment Variables score:75 topic:environment_variables] PasswordsInSystemProperties:map[evaluation:map[explanation:The following suspicious system properties were found in this recording: javax.net.ssl.keyStorePassword,javax.net.ssl.trustStorePassword,com.sun.management.jmxremote.password.file. The following regular expression was used to exclude strings from this rule: ''(passworld|passwise)''. solution:If you wish to keep having passwords in your system properties, but want to be able to share recordings without also sharing the passwords, please disable the ''Initial System Property'' event. suggestions:[] summary:The system properties in the recording may contain passwords.] name:Passwords in System Properties score:75 topic:system_properties] PrimitiveToObjectConversion:map[evaluation:map[explanation:
		The most common object type that primitives are converted into is ''java.lang.Long'', which causes 42.3 KiB to be allocated. The most common call site is ''void sun.rmi.server.UnicastServerRef.dispatch(java.rmi.Remote, java.rmi.server.RemoteCall):323''.
		Conversion from primitives to the corresponding object types can either be done explicitly, or be caused by autoboxing. If a considerable amount of the total allocation is caused by such conversions, consider changing the application source code to avoid this behavior. Look at the allocation stack traces to see which parts of the code to change. This rule finds the calls to the valueOf method for any of the eight object types that have primitive counterparts. suggestions:[] summary:0.0733 % of the total allocation (56.4 MiB) is caused by conversion from primitive types to object types. The most common object type that primitives are converted into is ''java.lang.Long''.] name:Primitive To Object Conversion score:0.09158578477668697 topic:heap] ProcessStarted:map[evaluation:map[suggestions:[]] name:Process Started score:-1 topic:processes] SocketRead:map[evaluation:map[explanation:The longest recorded socket read took 26.735 s to read 5 B from the host at 10.217.4.1. Average time of recorded IO: 32.741 ms. Total time of recorded IO: 1 min 29 s. Total time of recorded IO for the host 10.217.4.1: 34.611 s. Note that there are some socket read patterns with high duration reads that we consider to be normal and are therefore excluded. Such patterns include JMX RMI communication and MQ series. suggestions:[] summary:There are long socket read pauses in this recording (the longest is 26.735 s).] name:Socket Read Peak Duration score:75 topic:socket_io] SocketWrite:map[evaluation:map[explanation:Note that there are some socket write patterns with high duration writes that we consider to be normal and are therefore excluded. Such patterns include JMX RMI communication. suggestions:[] summary:No long socket write pauses were found in this recording (the longest was 5.328 ms).] name:Socket Write Peak Duration score:0.4843197272727273 topic:socket_io] StackdepthSetting:map[evaluation:map[explanation:The Flight Recorder is configured with a maximum captured stack depth of 64. 
1.01 % of all traces were larger than this option, and were therefore truncated. If more detailed traces are required, increase the ''-XX:FlightRecorderOptions=stackdepth=<value>'' value.
		Events of the following types have truncated stack traces: org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@37122a58,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@1bb29df1,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@44a7f5dd,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@69e48b16,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@4f823c98,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@40d06d8c,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@3b226156,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@7653e11f suggestions:[] summary:Some stack traces were truncated in this recording.] name:Stackdepth Setting score:25 topic:jvm_information] StringDeduplication:map[evaluation:map[explanation:String deduplication is enabled using the JVM flag '-XX:+UseStringDeduplication'. This flag can be used together with the G1 garbage collector in JDK 8u20 or later, or with the Shenandoah garbage collector.
		To validate if this gives a performance improvement for your application, create flight recordings both with and without string deduplication. For the run with string deduplication enabled, also enable statistics with '-XX:+PrintStringDeduplicationStatistics' for JDK 8 or '-Xlog:stringdedup*=debug' for JDK 9. Check if the heap live set decrease in the recording with string deduplication enabled is larger than the size of the string deduplication metadata table. The size of the metadata table is printed in the statistics output as 'Table/Memory Usage: XX MB'
		You can read more about string deduplication in the java options documentation or in [JEP 192](https://openjdk.java.net/jeps/192). suggestions:[] summary:Approximately 1,544 % of the live set consists of the internal array type of strings (''byte[]'' for this JDK version).
		The heap is around 1.26 % full. There is likely no big benefit from enabling string deduplication.] name:String Deduplication score:8.844070278634204 topic:heap] SystemGc:map[evaluation:map[suggestions:[] summary:No garbage collections were caused by System.gc().] name:GCs Caused by System.gc() score:0 topic:garbage_collection] TlabAllocationRatio:map[evaluation:map[solution:Allocating objects outside of Thread Local Allocation Buffers (TLABs) is more expensive than allocating inside TLABs. This may be acceptable if the individual allocations are intended to be larger than a reasonable TLAB. It may be possible to avoid this by decreasing the size of the individual allocations. There are some TLAB related JVM flags that you can experiment with, but it is usually better to let the JVM manage TLAB sizes automatically. suggestions:[] summary:The program allocated 11.8 % of the memory outside of TLABs.] name:TLAB Allocation Ratio score:15.934741099319458 topic:tlab] VMOperations:map[evaluation:map[suggestions:[] summary:No excessively long VM operations were found in this recording (the longest was 47.116 ms).] name:VMOperation Peak Duration score:1.17788785 topic:vm_operations] biasedLockingRevocation:map[evaluation:map[suggestions:[]] name:Biased Locking Revocation score:-1 topic:biased_locking] biasedLockingRevocationPause:map[evaluation:map[suggestions:[] summary:No revocation of biased locks found.] name:Biased Locking Revocation Pauses score:0 topic:vm_operations]]
		stopped the recording: scorecard_test_rec
		deleted the recording: scorecard_test_rec
		current list of recordings: []
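The escaped target in the URLs logged above (service:jmx:rmi:%2F%2F%2Fjndi...) matches what Go's `url.PathEscape` produces for a single path segment: `/` becomes `%2F` while `:` is left as-is. A sketch, assuming the client builds API paths this way; `targetPath`, `recordingPath`, and `reportPath` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/url"
)

// targetPath returns the v1 target prefix, escaping the JMX connect URL as a
// single path segment (url.PathEscape encodes "/" as %2F but leaves ":").
func targetPath(connectURL string) string {
	return "/api/v1/targets/" + url.PathEscape(connectURL)
}

// recordingPath mirrors the path shape of DownloadURL in the log above.
func recordingPath(connectURL, name string) string {
	return fmt.Sprintf("%s/recordings/%s", targetPath(connectURL), url.PathEscape(name))
}

// reportPath mirrors the path shape of ReportURL in the log above.
func reportPath(connectURL, name string) string {
	return fmt.Sprintf("%s/reports/%s", targetPath(connectURL), url.PathEscape(name))
}
```

Prefixing the Cryostat base URL (for example https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443) gives the full DownloadURL and ReportURL values printed for scorecard_test_rec.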

ebaron (Member) left a comment

Hey @tthvo, really great work! Thank you!

I'll give this a more thorough look shortly. We'll probably need to add some kind of mitigation or identify the cause of the intermittent 500 errors (any ideas here @andrewazores?). This could cause our product builds to fail, for example.

andrewazores (Member) commented Jan 10, 2024

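I haven't seen that particular 500 [0] before, but it's saying that the JMX connection was unexpectedly closed. We have seen other similar ones in the past, ex.:

* [FlightRecorderException: Could not create a recording! cryostat#563](https://github.com/cryostatio/cryostat/issues/563)
* [HTTP 500: org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not retrieve the attribute Recordings! cryostat#775](https://github.com/cryostatio/cryostat/issues/775)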

This could be a bug in how we treat the JMX connection and it gets expired and evicted from the connection cache prematurely, or else it could be some other network hiccup that causes the connection to drop, or perhaps there is something else that can cause the target JVM to drop JMX connections. It could even be a bug somewhere in the JMC library that we use in -core for creating the JMX connections.

[0]:

failed to generate report for the recording: API request failed with status code 500: 
org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not clone the scorecard_test_rec recording caused by 
IOException: The client has been closed.

ebaron (Member) commented Jan 10, 2024

> I haven't seen that particular 500 [0] before, but it's saying that the JMX connection was unexpectedly closed. We have seen other similar ones in the past, ex.:
>
> * [FlightRecorderException: Could not create a recording! cryostat#563](https://github.com/cryostatio/cryostat/issues/563)
> * [HTTP 500: org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not retrieve the attribute Recordings! cryostat#775](https://github.com/cryostatio/cryostat/issues/775)
>
> This could be a bug in how we treat the JMX connection and it gets expired and evicted from the connection cache prematurely, or else it could be some other network hiccup that causes the connection to drop, or perhaps there is something else that can cause the target JVM to drop JMX connections. It could even be a bug somewhere in the JMC library that we use in -core for creating the JMX connections.
>
> [0]:
>
> failed to generate report for the recording: API request failed with status code 500: 
> org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not clone the scorecard_test_rec recording caused by 
> IOException: The client has been closed.

I was able to reproduce this in an OpenShift cluster. Here are the logs from the Cryostat pod when it occurred. Looks kind of like the cache eviction you described, but I'm not certain.

Jan 10, 2024 10:11:29 PM io.cryostat.core.log.Logger info
INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi
Jan 10, 2024 10:11:29 PM io.cryostat.core.log.Logger info
INFO: Removing cached connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi: EXPIRED
Jan 10, 2024 10:11:29 PM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi closed
Jan 10, 2024 10:11:29 PM io.cryostat.core.log.Logger info
INFO: Removing cached connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi: EXPLICIT
Jan 10, 2024 10:11:29 PM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi closed
Jan 10, 2024 10:11:30 PM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
INFO: 10.128.0.6 - - [Wed, 10 Jan 2024 22:11:30 GMT] 1230ms "PATCH /api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-128-0-64.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/recordings/scorecard_test_rec HTTP/1.1" 200 75 bytes "-" "Go-http-client/1.1"
Jan 10, 2024 10:11:30 PM io.cryostat.core.log.Logger info
INFO: Outgoing WS message: {"meta":{"category":"ActiveRecordingSaved","type":{"type":"application","subType":"json"}},"message":{"jvmId":"4qiEWOM9ieFlmuNzpVtCWtBAYprAc5X4MGX5OpTqsVY=","recording":{"downloadUrl":"https://cryostat-recording-cryostat-operator-scorecard.apps.example.com:443/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-128-0-64.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/cryostat-recording-75d949695f-j4q85_scorecard_test_rec_20240110T221129Z.jfr","name":"cryostat-recording-75d949695f-j4q85_scorecard_test_rec_20240110T221129Z.jfr","reportUrl":"https://cryostat-recording-cryostat-operator-scorecard.apps.example.com:443/api/beta/reports/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-128-0-64.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/cryostat-recording-75d949695f-j4q85_scorecard_test_rec_20240110T221129Z.jfr","metadata":{"labels":{"template.name":"ALL","template.type":"TARGET"}},"size":4890056,"archivedTime":1704924690552},"target":"service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi"}}
Jan 10, 2024 10:11:30 PM io.cryostat.core.log.Logger info
INFO: GraphQL query: 
			query ArchivedRecordingsForTarget($connectUrl: String) {
				archivedRecordings(filter: { sourceTarget: $connectUrl }) {
					data {
						name
						downloadUrl
						reportUrl
						metadata {
						labels
						}
						size
					}
				}
			}
		
Jan 10, 2024 10:11:30 PM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
INFO: 10.128.0.6 - - [Wed, 10 Jan 2024 22:11:30 GMT] 149ms "POST /api/v2.2/graphql HTTP/1.1" 200 856 bytes "-" "Go-http-client/1.1"
Jan 10, 2024 10:11:30 PM io.cryostat.core.log.Logger info
INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi
Jan 10, 2024 10:11:30 PM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi closed
Jan 10, 2024 10:11:30 PM io.cryostat.core.log.Logger info
INFO: Removing cached connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi: EXPLICIT
Jan 10, 2024 10:11:30 PM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10-128-0-64.cryostat-operator-scorecard.pod:9091/jmxrmi closed
Jan 10, 2024 10:11:31 PM io.cryostat.core.log.Logger error
SEVERE: HTTP 500: org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not clone the scorecard_test_rec recording 
io.vertx.ext.web.handler.HttpException: Internal Server Error
Caused by: java.util.concurrent.ExecutionException: org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not clone the scorecard_test_rec recording 
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
	at io.cryostat.net.web.http.api.v1.TargetReportGetHandler.handleAuthenticated(TargetReportGetHandler.java:119)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:81)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:51)
	at io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48)
	at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)
	at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not clone the scorecard_test_rec recording 
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceV2.clone(FlightRecorderServiceV2.java:432)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceV2.openStream(FlightRecorderServiceV2.java:279)
	at io.cryostat.core.net.JmxFlightRecorderService.openStream(JmxFlightRecorderService.java:149)
	at io.cryostat.net.reports.AbstractReportGeneratorService.copyRecordingToFile(AbstractReportGeneratorService.java:90)
	at io.cryostat.net.reports.AbstractReportGeneratorService.lambda$getRecordingFromLiveTarget$1(AbstractReportGeneratorService.java:76)
	at io.cryostat.net.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:151)
	at io.cryostat.net.reports.AbstractReportGeneratorService.getRecordingFromLiveTarget(AbstractReportGeneratorService.java:73)
	at io.cryostat.net.reports.AbstractReportGeneratorService.exec(AbstractReportGeneratorService.java:55)
	at io.cryostat.net.reports.ActiveRecordingReportCache.getReport(ActiveRecordingReportCache.java:126)
	at io.cryostat.net.reports.ActiveRecordingReportCache.getReport(ActiveRecordingReportCache.java:113)
	at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$3(LocalLoadingCache.java:183)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2685)
	at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2683)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2666)
	at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:112)
	at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:58)
	at io.cryostat.net.reports.ActiveRecordingReportCache.get(ActiveRecordingReportCache.java:87)
	at io.cryostat.net.reports.ReportService.get(ReportService.java:48)
	at io.cryostat.net.web.http.api.v1.TargetReportGetHandler.handleAuthenticated(TargetReportGetHandler.java:115)
	... 11 more
Caused by: java.io.IOException: The client has been closed.
	at java.management/com.sun.jmx.remote.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:99)
	at java.management/com.sun.jmx.remote.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:59)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector$RMIClientCommunicatorAdmin.gotIOException(RMIConnector.java:1497)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1027)
	at org.openjdk.jmc.rjmx.internal.MCMBeanServerConnection.invoke(MCMBeanServerConnection.java:259)
	at org.openjdk.jmc.rjmx.ConnectionToolkit.invokeOperation(ConnectionToolkit.java:197)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderCommunicationHelperV2.invokeJfrOperation(FlightRecorderCommunicationHelperV2.java:91)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderCommunicationHelperV2.invokeOperation(FlightRecorderCommunicationHelperV2.java:82)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceV2.clone(FlightRecorderServiceV2.java:429)
	... 30 more

@ebaron
Member

ebaron commented Jan 15, 2024

@andrewazores and I tried to get to the bottom of this. So far we've discovered that the issue doesn't seem to occur when using a custom target of service:jmx:rmi:///jndi/rmi://localhost:0/jmxrmi rather than relying on the built-in discovery.

@tthvo
Member Author

tthvo commented Jan 15, 2024

So far we've discovered that the issue doesn't seem to occur when using a custom target of service:jmx:rmi:///jndi/rmi://localhost:0/jmxrmi

Ah, that's interesting! I suppose another scorecard test with the same recording workflow, but acting on a custom target or a target from a different realm (e.g. jdp, custom, k8s), would be useful for catching these issues?

@ebaron
Member

ebaron commented Jan 15, 2024

So far we've discovered that the issue doesn't seem to occur when using a custom target of service:jmx:rmi:///jndi/rmi://localhost:0/jmxrmi

Ah, that's interesting! I suppose another scorecard test with the same recording workflow, but acting on a custom target or a target from a different realm (e.g. jdp, custom, k8s), would be useful for catching these issues?

A custom target test would be good. Although testing the built-in discovery is important too. That said, if we don't identify a cause for the problem we're seeing then we may need to settle for using only a custom target until we do find the cause.

@tthvo
Member Author

tthvo commented Feb 29, 2024

Hey @ebaron @andrewazores, the tests now run more reliably for me too. @Ming also added a check for EOF errors and logging of response headers. Is there anything else needed in the meantime? I will see if I can help figure out the bugs above...hopefully :D

@tthvo
Member Author

tthvo commented Mar 1, 2024

@andrewazores
Member

I haven't had a chance to exercise the new changes. Has anyone done it and seen that the EOFs have occurred, been handled with a retry, and then succeeded? i.e. verified that this specific "fix" actually works?

@tthvo
Member Author

tthvo commented Mar 5, 2024

I haven't had a chance to exercise the new changes. Has anyone done it and seen that the EOFs have occurred, been handled with a retry, and then succeeded? i.e. verified that this specific "fix" actually works?

It took me a while to hit the EOF error. The error seems to be caught, but it looks like the request body has already been read and closed by the first attempt, so the retry fails. I will have a closer look...

Image:      quay.io/thvo/cryostat-operator-scorecard:2.5.0-20240304234157
Entrypoint: [cryostat-scorecard-tests cryostat-recording]
Labels:
	"suite":"cryostat"
	"test":"cryostat-recording"
Results:
	Name: cryostat-recording
	State: fail

	Errors:
		failed to archive the recording: Patch "https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/recordings/scorecard_test_rec": http: ContentLength=4 with Body length 0
	Log:
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		...output-omitted...
		Connectio error: Patch "https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/recordings/scorecard_test_rec": EOF
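The `ContentLength=4 with Body length 0` error is consistent with the retry reusing the same `*http.Request`. The first attempt already drained its body reader, while `ContentLength` still declares 4 bytes. That would match a 4-byte `SAVE` body for the archive PATCH, which is an assumption here. The small Go sketch below uses a placeholder URL to show the drained reader, and how `Request.GetBody` can supply a fresh copy. `http.NewRequest` populates `GetBody` automatically for `*strings.Reader` bodies.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// bodyOnRetry simulates a failed first attempt that consumed the request body,
// then reports what a retry of the same *http.Request would see.
func bodyOnRetry() (declared int64, leftover int, rewound string, err error) {
	// Placeholder URL; the 4-byte "SAVE" body is an assumption.
	req, err := http.NewRequest(http.MethodPatch,
		"https://cryostat.example/api/v1/targets/t/recordings/scorecard_test_rec",
		strings.NewReader("SAVE"))
	if err != nil {
		return 0, 0, "", err
	}

	// First attempt: the transport reads the whole body, then the connection drops.
	if _, err := io.ReadAll(req.Body); err != nil {
		return 0, 0, "", err
	}

	// Retrying the same request: ContentLength still says 4, but nothing is left to read.
	rest, err := io.ReadAll(req.Body)
	if err != nil {
		return 0, 0, "", err
	}

	// GetBody returns a new reader over the original content.
	fresh, err := req.GetBody()
	if err != nil {
		return 0, 0, "", err
	}
	defer fresh.Close()
	again, err := io.ReadAll(fresh)
	if err != nil {
		return 0, 0, "", err
	}
	return req.ContentLength, len(rest), string(again), nil
}

func main() {
	declared, leftover, rewound, err := bodyOnRetry()
	if err != nil {
		panic(err)
	}
	fmt.Printf("ContentLength=%d leftover=%d rewound=%q\n", declared, leftover, rewound)
	// ContentLength=4 leftover=0 rewound="SAVE"
}
```

A retry loop could reset `req.Body` from `GetBody` before resending. Building a brand-new request per attempt, as the `create new http requests when retrying` commit does, avoids the shared state entirely.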

@tthvo
Member Author

tthvo commented Mar 5, 2024

With the latest commit, the test should work properly now. EOF is now correctly caught and retried.

Full log for recording test: https://gist.github.com/tthvo/42ad8f29e2aa2c5731ce436f513601c8
Patch (for logging EOF): https://gist.github.com/tthvo/b5093d56e01909f143cfe15c9b076a9e

Image:      quay.io/thvo/cryostat-operator-scorecard:2.5.0-20240305013338
Entrypoint: [cryostat-scorecard-tests cryostat-recording]
Labels:
	"test":"cryostat-recording"
	"suite":"cryostat"
Results:
	Name: cryostat-recording
	State: pass

	Log:
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found
		deployment cryostat-recording is not yet found

		...output-omitted...

		deployment cryostat-recording is not yet available
		deployment cryostat-recording is not yet available
		deployment cryostat-recording is available
		application is available at https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing
		application is ready at https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing
		created a custom target: &{ConnectUrl:service:jmx:rmi:///jndi/rmi://localhost:0/jmxrmi Alias:customTarget}
		created stored credential with match expression: target.alias=="customTarget"
		created a recording: &{DownloadURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/recordings/scorecard_test_rec ReportURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/reports/scorecard_test_rec Id:1 Name:scorecard_test_rec StartTime:1709603635188 State:RUNNING Duration:0 Continuous:true ToDisk:true MaxSize:0 MaxAge:0}
		current list of recordings: [{DownloadURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/recordings/scorecard_test_rec ReportURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/reports/scorecard_test_rec Id:1 Name:scorecard_test_rec StartTime:1709603635188 State:RUNNING Duration:0 Continuous:true ToDisk:true MaxSize:0 MaxAge:0}]

		Connection error: Patch "https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/recordings/scorecard_test_rec": EOF

		archived the recording scorecard_test_rec at: customTarget_scorecard_test_rec_20240305T015426Z.jfr
		current list of archives: [{Name:customTarget_scorecard_test_rec_20240305T015426Z.jfr DownloadUrl:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/customTarget_scorecard_test_rec_20240305T015426Z.jfr ReportUrl:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/beta/reports/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Flocalhost:0%2Fjmxrmi/customTarget_scorecard_test_rec_20240305T015426Z.jfr Metadata:{Labels:map[template.name:ALL template.type:TARGET]} Size:2967065}]
		generated report for the recording scorecard_test_rec: map[Allocations.class:map[evaluation:map[explanation:Frequently allocated types are good places to start when trying to reduce garbage collections. 
		...output-omitted...
		stopped the recording: scorecard_test_rec
		deleted the recording: scorecard_test_rec
		current list of recordings: []


serviceaccount "cryostat-scorecard" deleted
role.rbac.authorization.k8s.io "cryostat-scorecard" deleted
clusterrole.rbac.authorization.k8s.io "cryostat-scorecard" deleted
rolebinding.rbac.authorization.k8s.io "cryostat-scorecard" deleted
clusterrolebinding.rbac.authorization.k8s.io "cryostat-scorecard" deleted
namespace "cryostat-operator-scorecard" deleted

Member

@ebaron ebaron left a comment

The EOF handling seems to be working nicely! I did 30 runs and got the following:

  • Hit the EOF once, which retried and was successful.
  • 1 timeout, which affected other tests in the suite. This could be flaky infrastructure, so I'm not going to worry about it.
  • 3 failures due to 503 errors when first trying to create target. I don't see anything wrong with the ready detection logic, so perhaps this is the container crashing and we're not seeing it. I'll file a new issue to increase the logging (especially on failure).

Excellent work and thank you for all your patience!

@ebaron ebaron merged commit cfcbfc7 into cryostatio:main Mar 5, 2024
5 checks passed
@tthvo
Member Author

tthvo commented Mar 5, 2024

^^ I am so glad I could help with this :D I'm a bit caught off guard that the 503 still occurs though... I will try to see if I can reproduce it...

@ebaron
Member

ebaron commented Mar 6, 2024

@Mergifyio backport cryostat3

Contributor

mergify bot commented Mar 6, 2024

backport cryostat3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 6, 2024
* test(scorecard): scorecard tests for recording management

Signed-off-by: Thuan Vo <thuan.votann@gmail.com>

* fixup(scorecard): fix cr cleanup func

* test(scorecard): registry recording test to suite

* chore(scorecard): reorganize client def

* chore(scorecard): clean up common setup func

* chore(bundle): regenerate bundle with scorecard tag

* chore(bundle): correct image tag in bundle

* fix(bundle): add missing scorecard test config patch

* feat(scorecard): scaffold cryostat API client

* chore(scorecard): clean up API client

* test(scorecard): implement recording scorecard test

* fixup(scorecard): correctly add scorecard test via hack templates

* fix(client): ignore unverified tls certs and base64 oauth token

* chore(bundle): split cryostat tests to separate stage

* fix(scorecard): extend default transport instead of overwriting

* chore(scorecard): refactor client to support multi-part

* fixup(client): fix request verb

* fix(client): fix recording create form format

* fix(scorecard): create stored credentials for target JVM

* fix(scorecard): fix 502 status error

* chore(scorecard): simplify client def

* chore(scorecard): fetch recordings to ensure action is correctly performed

* test(scorecard): test generating report for a recording

* chore(scorecard): clean up

* test(scorecard): list archives in tests

* ci(scorecard): reconfigure ingress for kind

* ci(k8s): correct cluster name

* test(scorecard): use role instead of clusterrole for oauth rules

* test(scorecard): parse health response for additional checks

* chore(scorecard): add missing newline in logs

* chore(scorecard): check status code before parsing body in health check

* test(scorecard): add custom target discovery to recording scorecard test

* add EOF wait and resp headers

* add resp headers

* chore(client): configure all clients to send safe requests

* fix(clients): add missing content-type header

* fix(scorecard): add missing test name in help message

* chore(client): create new http requests when retrying

* chore(bundle): update scorecard image tags

---------

Signed-off-by: Thuan Vo <thuan.votann@gmail.com>
Co-authored-by: Ming Yu Wang <90855268+mwangggg@users.noreply.github.com>
Co-authored-by: Ming Wang <miwan@redhat.com>
(cherry picked from commit cfcbfc7)

# Conflicts:
#	bundle/manifests/cryostat-operator.clusterserviceversion.yaml
ebaron added a commit that referenced this pull request Mar 6, 2024
) (#752)

* test(scorecard): scorecard tests for recording management (#698)

* Fix conflicts

---------

Co-authored-by: Thuan Vo <thuan.votann@gmail.com>
Co-authored-by: Elliott Baron <ebaron@redhat.com>
@tthvo tthvo deleted the recording-scorecard branch March 6, 2024 21:40
andrewazores added a commit that referenced this pull request Apr 23, 2024
* feat(discovery): options to configure discovery port names and numbers (backport #715) (#725)

* feat(discovery): options to configure discovery port names and numbers (#715)

Signed-off-by: Thuan Vo <thuan.votann@gmail.com>
(cherry picked from commit a552021)

* resolve conflict

---------

Co-authored-by: Thuan Vo <thuan.votann@gmail.com>
Co-authored-by: Andrew Azores <aazores@redhat.com>

* Deploy cryostat 3.0

* Remove extraneous file

* test adjustments

* feat(discovery): options to configure discovery port names and numbers (#715)

Signed-off-by: Thuan Vo <thuan.votann@gmail.com>

* Fix typo in environment variable breaking reconciler test, fix missing SecurityContext

* Fix conflict with cluster cryostat removal

* ci(gh): add comment when /build_test is finished (#745)

* add scorecard test/suite selection (#746)

* test(scorecard): scorecard tests for recording management (#698)

* test(scorecard): scorecard test for Cryostat CR configuration changes (#739)

* CR config scorecard

* reformat

* reviews

* add kubectl license

* test(scorecard): scorecard test for report generator  (#753)

* deploy reports sidecar

* report scorecard test

* update

* rebase fix

* query health

* fix(build-ci): fix scorecard image tag returned as null (#760)

Signed-off-by: Thuan Vo <thuan.votann@gmail.com>
Co-authored-by: Elliott Baron <ebaron@redhat.com>

* test(scorecard): add container logs to scorecard results (#758)

* test(scorecard): add container logs to scorecard results

* build(bundle): regenerate bundle with new scorecard tags

* chore(scorecard): refactor to remove duplicate codes

* add permission to publish comment when ci fails (#769)

Co-authored-by: Elliott Baron <ebaron@redhat.com>

* Update NewCoreContainer and associated tests

* build(go): update Golang to 1.21 (#777)

* test(scorecard): logWorkloadEvent for cryostat-recording errors (#759)

* logWorkLoadEvent for cryostat-recording errors

* reviews

* tr.LogChannel

---------

Co-authored-by: Elliott Baron <ebaron@redhat.com>

* test(scorecard): fix rebasing skipped commit (#780)

* Merge pull request #8 from ebaron/scorecard-methods

test(scorecard): use methods for more easily passing data

* update bundle image

* Review fixes

* generate storage key, create expected Secret

* fixup! generate storage key, create expected Secret

* database secret handling corrections

* combine database connection password and encryption key into one secret

* correct storage secret key/access key

* update datasource port number to not conflict with storage

* precreate eventtemplates bucket

* remove storage volume parameter overrides

* use HTTP for Cryostat probe even when TLS is enabled - TLS will be done via auth proxy later

* correct environment variable names for proxy awareness

* Fix remaining merge conflict

* Fix makefile

* config cleanup and test fixup

---------

Signed-off-by: Thuan Vo <thuan.votann@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Thuan Vo <thuan.votann@gmail.com>
Co-authored-by: Andrew Azores <aazores@redhat.com>
Co-authored-by: Ming Yu Wang <90855268+mwangggg@users.noreply.github.com>
Co-authored-by: Ming Wang <miwan@redhat.com>
Co-authored-by: Elliott Baron <ebaron@redhat.com>
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add a Scorecard test that manages recordings
4 participants