Change telemetry devices to rely on jvm.config instead of ES_JAVA_OPTS #711

Merged
merged 8 commits into from
Jun 24, 2019

@ebadyano ebadyano commented Jun 13, 2019

Move telemetry devices that currently rely on ES_JAVA_OPTS from the
launcher to the provisioner and instead persist the necessary
information in config/jvm.options

Relates #697
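
As a rough illustration of what persisting telemetry options in config/jvm.options could look like, the sketch below appends one JVM option per line to an existing file. The helper name `append_jvm_options` and the marker comment are hypothetical and not part of Rally; Rally may well render the file from a template instead.

```python
import os
import tempfile


def append_jvm_options(jvm_options_path, java_opts):
    """Append each JVM option on its own line (hypothetical helper, not Rally's API)."""
    with open(jvm_options_path, "a", encoding="utf-8") as f:
        # jvm.options treats lines starting with "#" as comments
        f.write("\n# options added for telemetry devices\n")
        for opt in java_opts:
            f.write(opt + "\n")


def demo():
    # write a tiny jvm.options file, append telemetry options, and read it back
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jvm.options")
        with open(path, "w", encoding="utf-8") as f:
            f.write("-Xms1g\n-Xmx1g\n")
        append_jvm_options(path, ["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints"])
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
```

Because the options live in the file, they apply on every start of the node without any environment variable being set by the launcher.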

@ebadyano
Contributor Author

Depends on elastic/rally-teams#28

Member

@danielmitterdorfer danielmitterdorfer left a comment

This already looks promising. :) I left a few comments.

@@ -38,11 +38,22 @@ def local_provisioner(cfg, car, plugins, cluster_settings, all_node_ips, target_
node_root_dir = "%s/%s" % (target_root, node_name)

_, java_home = java_resolver.java_home(car, cfg)

node_telemetry_dir = "%s/telemetry" % node_root_dir
Member

We should replace this with os.path.join(node_root_dir, "telemetry").
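
A minimal sketch of the suggested change, with a hypothetical helper name (`node_telemetry_dir` as a function is an assumption; in the diff it is a local variable):

```python
import os


def node_telemetry_dir(node_root_dir):
    # os.path.join uses the platform's separator and does not double it
    # when node_root_dir already ends with one
    return os.path.join(node_root_dir, "telemetry")
```

Compared with `"%s/telemetry" % node_root_dir`, this avoids hard-coding `/` and a doubled separator when the root path carries a trailing one.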

@@ -165,6 +177,9 @@ def prepare(self, binary):
# determine after installation because some variables will depend on the install directory
target_root_path = self.es_installer.es_home_path
provisioner_vars = self._provisioner_variables()

# add java options for telemetry devices
provisioner_vars.update({"additional_java_settings" : self.telemetry.instrument_candidate_env(self.es_installer.car, self.es_installer.node_name)})
Member

We could move this to prepare()? Also, a simpler option would be to write it like:

provisioner_vars["additional_java_settings"] = self.telemetry.instrument_candidate_env(self.es_installer.car, self.es_installer.node_name)

Contributor Author

Sorry, I'm not sure what you mean; this snippet is already in prepare().

Contributor Author

Should I move it to _provisioner_variables?

Member

Sorry, I meant self._provisioner_variables(), yes. :)
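
One way the agreed-upon move could look, as a sketch only. `ProvisionerSketch` and its constructor arguments are hypothetical stand-ins; only the `_provisioner_variables` name and the `additional_java_settings` key come from this conversation.

```python
class ProvisionerSketch:
    """Hypothetical stand-in for the provisioner discussed above."""

    def __init__(self, telemetry_java_opts, base_vars=None):
        self.telemetry_java_opts = telemetry_java_opts
        self.base_vars = dict(base_vars or {})

    def _provisioner_variables(self):
        provisioner_vars = dict(self.base_vars)
        # only set the variable when at least one telemetry device contributed options
        if self.telemetry_java_opts:
            provisioner_vars["additional_java_settings"] = self.telemetry_java_opts
        return provisioner_vars
```

Keeping the assignment inside `_provisioner_variables()` means `prepare()` receives a complete set of variables from a single place.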

@@ -47,16 +47,12 @@ def __init__(self, enabled_devices=None, devices=None):
self.devices = devices

def instrument_candidate_env(self, car, candidate_id):
Member

I wonder whether we should rename this method because we now provide only Java options.

@@ -47,16 +47,12 @@ def __init__(self, enabled_devices=None, devices=None):
self.devices = devices

def instrument_candidate_env(self, car, candidate_id):
- opts = {}
+ opts = []
for device in self.devices:
if self._enabled(device):
additional_opts = device.instrument_env(car, candidate_id)
Member

Similarly we should rename instrument_env of the individual telemetry devices.

@@ -190,7 +186,7 @@ def instrument_env(self, car, candidate_id):
java_opts = self.java_opts(log_file)

self.logger.info("jfr: Adding JVM arguments: [%s].", java_opts)
- return {"ES_JAVA_OPTS": java_opts}
+ return java_opts.split()
Member

I think we just need to return java_opts here? A Python list does not provide a split method?

Contributor Author

@danielmitterdorfer For the jfr version of this method, I kept java_opts as a string and then split it into a list on whitespace; the code seemed simpler that way. But I can change it to use a list from the beginning. What do you think?

Contributor Author

java_opts = self.java_opts(log_file) for FlightRecorder still returns a string, and here I split it into a list. Do you think I should change java_opts to return a list instead, to make it clearer?

Member

Yes, I think we should always return lists.
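
A sketch of what "always return lists" implies for the aggregating method. The names `FixedOptionsDevice`, `instrument_java_opts`, and `collect_java_opts` are assumptions made for illustration; the conversation only agrees that the existing methods should be renamed and does not state the new names.

```python
class FixedOptionsDevice:
    """Hypothetical telemetry device that contributes a fixed list of JVM options."""

    def __init__(self, opts):
        self.opts = opts

    def instrument_java_opts(self, car, candidate_id):
        # every device returns a list, never a whitespace-separated string
        return list(self.opts)


def collect_java_opts(devices, car, candidate_id):
    opts = []
    for device in devices:
        additional_opts = device.instrument_java_opts(car, candidate_id)
        # extend() keeps the result flat: one entry per JVM option
        opts.extend(additional_opts)
    return opts
```

With a uniform return type the caller never needs `split()`, and options that contain spaces would not be broken apart by accident.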

- return {"ES_JAVA_OPTS": "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation "
- "-XX:LogFile=%s -XX:+PrintAssembly" % log_file}
+ return ["-XX:+UnlockDiagnosticVMOptions", "-XX:+TraceClassLoading", "-XX:+LogCompilation",
+ "-XX:LogFile=%s", "-XX:+PrintAssembly" % log_file ]
Member

I think this should be:

"-XX:LogFile={}".format(log_file), "-XX:+PrintAssembly"

- return {"ES_JAVA_OPTS": "-Xloggc:%s -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps "
- "-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime "
- "-XX:+PrintTenuringDistribution" % log_file}
+ return ["-Xloggc:%s", "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps", "-XX:+PrintGCTimeStamps",
Member

Similarly "-Xloggc:{}".format(log_file)
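
To make the problem behind these two comments concrete: once the single string is split into list elements, a trailing `% log_file` applies only to the last string literal. That literal has no placeholder, so Python raises a `TypeError`, and the element that does contain `%s` is never formatted. The function names below are hypothetical and exist only for this demonstration.

```python
def broken_opts(log_file):
    # "%" binds to the last literal only; it has no placeholder, so this raises TypeError
    return ["-XX:LogFile=%s", "-XX:+PrintAssembly" % log_file]


def fixed_opts(log_file):
    # format exactly the element that carries the placeholder
    return ["-XX:LogFile={}".format(log_file), "-XX:+PrintAssembly"]
```

Formatting each element individually, as suggested in the review, keeps the substitution next to the placeholder it belongs to.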

else:
# see https://docs.oracle.com/javase/9/tools/java.htm#JSWOR-GUID-BE93ABDC-999C-4CB5-A88B-1994AAAC74D5
- return {"ES_JAVA_OPTS": "-Xlog:gc*=info,safepoint=info,age*=trace:file=%s:utctime,uptimemillis,level,tags:filecount=0" % log_file}
+ return ["-Xlog:gc*=info,safepoint=info,age*=trace:file=%s:utctime,uptimemillis,level,tags:filecount=0" % log_file]
Member

For newer code we should use str.format() instead of %, unless it is a hot code path.
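
Putting the two GC logging branches together with `str.format()`, a sketch could look like the following. The flags are the ones shown in the diff; the standalone function name `gc_java_opts` and the exact version check (`java_major_version < 9`) are assumptions, since the condition itself is not visible in the hunk.

```python
def gc_java_opts(log_file, java_major_version):
    if java_major_version < 9:  # assumption: the unified -Xlog syntax is used from Java 9 on
        return ["-Xloggc:{}".format(log_file), "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps",
                "-XX:+PrintGCTimeStamps", "-XX:+PrintGCApplicationStoppedTime",
                "-XX:+PrintGCApplicationConcurrentTime", "-XX:+PrintTenuringDistribution"]
    else:
        # a single option: the whole unified logging configuration is one JVM argument
        return ["-Xlog:gc*=info,safepoint=info,age*=trace:file={}:utctime,uptimemillis,level,tags:filecount=0"
                .format(log_file)]
```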

@danielmitterdorfer danielmitterdorfer added :Benchmark Candidate Management Anything affecting how Rally sets up Elasticsearch :misc Changes that don't affect users directly: linter fixes, test improvements, etc. enhancement Improves the status quo labels Jun 14, 2019
@ebadyano ebadyano marked this pull request as ready for review June 18, 2019 14:21
ebadyano added a commit to elastic/rally-teams that referenced this pull request Jun 18, 2019
ebadyano added a commit to elastic/rally-teams that referenced this pull request Jun 18, 2019
ebadyano added a commit to elastic/rally-teams that referenced this pull request Jun 18, 2019
@ebadyano
Contributor Author

@danielmitterdorfer Should I also add support for telemetry java options in docker provisioner in this PR or a different PR?

@danielmitterdorfer
Member

IMHO this change should be consistent so you'd need to adapt the Docker provisioner as well. As I will start my vacation at the end of the day it would be best if @dliappis could pick this up as a reviewer.

@danielmitterdorfer
Member

> IMHO this change should be consistent so you'd need to adapt the Docker provisioner as well.

When I wrote this originally I was under the impression that we would already support this for Docker but we don't. Therefore there is no need to add this new functionality in this PR and we can defer this to a later point in time if needed.

@dliappis
Contributor

@ebadyano I will do a review as well and test this a bit, so let's not merge yet please until I've done that too.

Contributor

@dliappis dliappis left a comment

This looks good and worked on some scenarios that I tried. I left a few PEP-8 related styling comments.

self.es_installer.install(binary["elasticsearch"])
# we need to immediately delete it as plugins may copy their configuration during installation.
self.es_installer.delete_pre_bundled_configuration()

# determine after installation because some variables will depend on the install directory
target_root_path = self.es_installer.es_home_path
provisioner_vars = self._provisioner_variables()

Contributor

No need for extra space on this empty line

if java_opts:
provisioner_vars["additional_java_settings"] = java_opts


Contributor

Just 1 empty line is enough here.

- return {"ES_JAVA_OPTS": "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation "
- "-XX:LogFile=%s -XX:+PrintAssembly" % log_file}
+ return ["-XX:+UnlockDiagnosticVMOptions", "-XX:+TraceClassLoading", "-XX:+LogCompilation",
+ "-XX:LogFile={}".format(log_file), "-XX:+PrintAssembly"]
Contributor

This is over-indented, see https://lintlyci.github.io/Flake8Rules/rules/E127.html; it would be great if we could align it.

"-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime "
"-XX:+PrintTenuringDistribution" % log_file}
return ["-Xloggc:{}".format(log_file), "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps", "-XX:+PrintGCTimeStamps",
"-XX:+PrintGCApplicationStoppedTime", "-XX:+PrintGCApplicationConcurrentTime",
Contributor

Similar PEP-8 style comment here: https://lintlyci.github.io/Flake8Rules/rules/E127.html
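
For reference, E127 is the check "continuation line over-indented for visual indent". A sketch of a layout that satisfies it, using the JIT options from this diff (the function name `jit_java_opts` is hypothetical): the continuation line starts in the same column as the first element after the opening bracket.

```python
def jit_java_opts(log_file):
    # the second line is aligned with the first list element, directly after "["
    return ["-XX:+UnlockDiagnosticVMOptions", "-XX:+TraceClassLoading", "-XX:+LogCompilation",
            "-XX:LogFile={}".format(log_file), "-XX:+PrintAssembly"]
```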

"-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,"
"dumponexitpath=/var/log/test-recording.jfr -XX:StartFlightRecording=defaultrecording=true", java_opts)
"dumponexitpath=/var/log/test-recording.jfr", "-XX:StartFlightRecording=defaultrecording=true"], java_opts)
Contributor

Similar comment for indentation here: https://lintlyci.github.io/Flake8Rules/rules/E127.html

You can consider a syntax like the following as well to make it a bit cleaner, if you prefer:

       self.assertEqual(
            ["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints", "-XX:+UnlockCommercialFeatures", "-XX:+FlightRecorder",
             "-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,"
             "dumponexitpath=/var/log/test-recording.jfr", "-XX:StartFlightRecording=defaultrecording=true"], java_opts)

- self.assertEqual("-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints "
- "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr",
+ self.assertEqual(["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints",
+ "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename=/var/log/test-recording.jfr"],
Contributor

java_opts)

def test_sets_options_for_pre_java_9_custom_recording_template(self):
jfr = telemetry.FlightRecorder(telemetry_params={"recording-template": "profile"},
log_root="/var/log",
java_major_version=random.randint(0, 8))
java_opts = jfr.java_opts("/var/log/test-recording.jfr")
- self.assertEqual("-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+UnlockCommercialFeatures -XX:+FlightRecorder "
+ self.assertEqual(["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints", "-XX:+UnlockCommercialFeatures", "-XX:+FlightRecorder",
  "-XX:FlightRecorderOptions=disk=true,maxage=0s,maxsize=0,dumponexit=true,"
Contributor

"-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,"
"filename=/var/log/test-recording.jfr,settings=profile",
"filename=/var/log/test-recording.jfr,settings=profile"],
Contributor

Also this section needs to align with https://lintlyci.github.io/Flake8Rules/rules/E127.html

java_opts)

def test_sets_options_for_java_11_or_above_custom_recording_template(self):
jfr = telemetry.FlightRecorder(telemetry_params={"recording-template": "profile"},
log_root="/var/log",
java_major_version=random.randint(11, 999))
java_opts = jfr.java_opts("/var/log/test-recording.jfr")
- self.assertEqual("-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints "
+ self.assertEqual(["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints",
  "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,"
Contributor

Also this section needs to align with https://lintlyci.github.io/Flake8Rules/rules/E127.html
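
The expectations in these tests suggest roughly the following shape for the Java 11+ case once `java_opts` returns a list. This is a sketch reconstructed from the asserted values only; the standalone function `jfr_java_opts_java_11_plus` is hypothetical, and the actual method on `FlightRecorder` may be structured differently.

```python
def jfr_java_opts_java_11_plus(log_file, recording_template=None):
    # the recording settings form a single JVM argument, so they stay in one list element
    jfr_cmd = "-XX:StartFlightRecording=maxsize=0,maxage=0s,disk=true,dumponexit=true,filename={}".format(log_file)
    if recording_template:
        jfr_cmd += ",settings={}".format(recording_template)
    return ["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints", jfr_cmd]
```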

gc_java_opts = gc.java_opts("/var/log/defaults-node-0.gc.log")
self.assertEqual(7, len(gc_java_opts))
self.assertEqual(["-Xloggc:/var/log/defaults-node-0.gc.log", "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps", "-XX:+PrintGCTimeStamps",
"-XX:+PrintGCApplicationStoppedTime", "-XX:+PrintGCApplicationConcurrentTime", "-XX:+PrintTenuringDistribution"],
Contributor

Also this section needs to align with https://lintlyci.github.io/Flake8Rules/rules/E127.html

@ebadyano
Contributor Author

Thank you for the review, @dliappis. I updated the change according to your comments.

Contributor

@dliappis dliappis left a comment

LGTM, thank you.

@ebadyano ebadyano merged commit a9df624 into elastic:master Jun 24, 2019
@dliappis dliappis added this to the 1.3.0 milestone Jul 3, 2019
ebadyano added a commit to ebadyano/rally that referenced this pull request Jul 31, 2019
By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then
“dynamically” (i.e. without reprovisioning) start the node again with
telemetry attached.

Relates to elastic#697
Relates to elastic#711
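
The approach described in this commit message can be sketched as follows: the options are passed through the environment at launch time, so the provisioned files stay untouched. The helper `env_with_java_opts` is hypothetical and only illustrates the idea; it is not taken from the referenced commit.

```python
def env_with_java_opts(base_env, java_opts):
    """Return a copy of base_env with telemetry options appended to ES_JAVA_OPTS."""
    env = dict(base_env)  # do not mutate the caller's environment
    if java_opts:
        existing = env.get("ES_JAVA_OPTS", "")
        # ES_JAVA_OPTS is a single whitespace-separated string
        env["ES_JAVA_OPTS"] = " ".join([existing] + list(java_opts)).strip()
    return env
```

A launcher could pass the resulting dictionary as the environment of the node process, for example via the `env` parameter of `subprocess.Popen`.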
danielmitterdorfer pushed a commit to elastic/rally-teams that referenced this pull request Aug 13, 2019
danielmitterdorfer pushed a commit to elastic/rally-teams that referenced this pull request Aug 13, 2019
danielmitterdorfer pushed a commit to elastic/rally-teams that referenced this pull request Aug 13, 2019
danielmitterdorfer pushed a commit to elastic/rally-teams that referenced this pull request Aug 13, 2019
ebadyano added a commit that referenced this pull request Aug 20, 2019
By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then
“dynamically” (i.e. without reprovisioning) start the node again with
telemetry attached.

Relates to #697
Relates to #711
novosibman pushed a commit to novosibman/rally that referenced this pull request Oct 2, 2019