Refactor: drop redundant (jruby-complete.jar) dependency #13159

kares · 2021-08-23T16:15:29Z

ATM, LS startup depends on jruby-complete.jar (which includes the Ruby stdlib) from logstash-core/lib/jars.
On the other hand, LS has an expanded JRuby installation located at vendor/jruby.

Release notes

[rn:skip]

What does this PR do?

Changes how LS boots, this boot process is expected to be slightly faster, if noticeable at all.
More importantly it will allow LS to easily patch the standard library (and potentially fix some issues).

Why is it important/What is the impact to the user?

No impact for the user.

Author's Checklist

adjust (and test) Windows .bat scripts
~~restore custom.jruby.path~~
fix license report generation (by providing meta-data for vendor/jruby/lib/jruby.jar)

How to test this PR locally

The usual bin/logstash ... way.

Related issues

kares · 2021-08-30T18:07:03Z

bin/logstash.lib.sh

  for J in $(cd "${jar_directory}"; ls *.jar); do
    classpath=${classpath}${classpath:+:}${jar_directory}/${J}
  done
  echo "${classpath}"
 }

+# set up CLASSPATH as a side-effect, we start to java processes
+CLASSPATH="${JRUBY_HOME}/lib/jruby.jar"
+CLASSPATH="$(setup_classpath $CLASSPATH $LOGSTASH_JARS)"


we now set up the CP once (as it's used by 2 java invocations)

kares · 2021-08-30T18:08:05Z

ci/integration_tests.sh

@@ -9,7 +9,6 @@ export GRADLE_OPTS="-Xmx4g -Dorg.gradle.jvmargs=-Xmx4g -Dorg.gradle.daemon=false

 export SPEC_OPTS="--order rand --format documentation"
 export CI=true
-export OSS=true


unrelated - but this should not longer be needed

kares · 2021-08-30T18:08:30Z

lib/bootstrap/rspec.rb


 require "rspec/core"
 require "rspec"
 require 'ci/reporter/rake/rspec_loader'

-RSpec.world.reset # if multiple rspec runs occur in a single process, the RSpec "world" state needs to be reset.
+RSpec.clear_examples # if multiple rspec runs occur in a single process, the RSpec "world" state needs to be reset.


the official way to do a proper RSpec "reset"

kares · 2021-08-30T18:10:57Z

logstash-core/build.gradle

@@ -105,7 +105,7 @@ tasks.register("javaTests", Test) {
 tasks.register("rubyTests", Test) {
    inputs.files fileTree("${projectDir}/lib")
    inputs.files fileTree("${projectDir}/spec")
-    systemProperty 'logstash.core.root.dir', projectDir.absolutePath
+    systemProperty 'logstash.root.dir', projectDir.parent


a new convention for all RSpec Java invocations ...
also systemProperty 'logstash.core.root.dir' wasn't previously always pointing to logstash-core
(e.g. from x-pack/build.gradle)

kares · 2021-08-30T18:13:09Z

logstash-core/build.gradle

@@ -151,10 +151,9 @@ idea {
    }
 }

-def customJRubyDir = project.hasProperty("custom.jruby.path") ? project.property("custom.jruby.path") : ""


custom.jruby.path would need more work to get right,
should be unnecessary as we only need to replace vendor/jruby and no longer the complete.jar

I think that this was in place so that we could "try" a custom jruby build in local development (including one that was locally built).

If your solution satisfies this need in some other way, I think we can safely nix the custom.jruby.path property (but we may want to leave a breadcrumb pointing to the official way if that property is set).

IIRC, we also had some CI resources at one point that would occasionally take snapshot jruby builds and run our tests against them, but I do not know if/where that exists these days).

+++ currently we document how to run logstash with a custom ruby, and I've used this in the past to test jruby snapshots and PRs, I'd like for this to not rot and still provide docs and gradle tasks to somehow run with a custom jruby, even if the docs are: build jruby locally, copy the jar here, and the stdlib there.

custom.jruby.path would need more work to get right,

was wrong here - the custom.jruby.path bootstrap mechanism in the base build.gradle simply skips downloading JRuby and will (build and) copy a vendor/jruby from the locally build jruby artifact. there's no special need to select the proper jruby-complete.jar from the custom location since the logstash-core now relies on the vendor/jruby/lib/jruby.jar as an api dependency.

thus actually in case of a ./gradlew clean test -Pcustom.jruby.path=... everything operates as before

kares · 2021-08-30T18:14:47Z

logstash-core/src/test/java/org/logstash/AccessorsTest.java

@@ -32,7 +32,7 @@
 import static org.junit.Assert.assertNull;
 import static org.junit.Assert.assertTrue;

-public class AccessorsTest {
+public class AccessorsTest extends RubyTestBase {


these Java tests caused JRuby runtime initialization ...
-> we would lost control of booting a proper RubyUtil.RUBY global runtime

🤔

I get that we need a consistent hook to ensure exactly one global runtime.

But I'm not sure that a superclass is the best path. What happens when a new test that uses jruby bits fails to subclass this testing base? Do we end up with wildly-broken behaviour that is easy to trace down, or subtly-broken behaviour that continues to work unless the right edge-case is hit? Is there anything we can do to ensure that we get the former instead of the latter?

Would it be possible to have this behaviour powered by an annotation instead of subclassing?

Would it be possible to have this behaviour powered by an annotation instead of subclassing?

yes it would be possible - there's simply an existing convention for some of the IR "ruby-aware" tests
to use RubyEnvTestCase.java as a super-class, so here we're following the same ...

also considered the other option of having a JUnit style @BeforeClass boot but that still would depend on order or hooks since the RubyEnvTestCase.java is using one where it assumes RubyUtil.RUBY already booted.

But I'm not sure that a superclass is the best path. What happens when a new test that uses jruby bits fails to subclass this testing base? Do we end up with wildly-broken behaviour that is easy to trace down, or subtly-broken behaviour that continues to work unless the right edge-case is hit? Is there anything we can do to ensure that we get the former instead of the latter?

Adding/changing an existing Java test (which would be using org.jruby bits directly or indirectly and forgot to sub-class or annotate) would lead to an AssertionError when initializing the base class (or annotation) -> all classes that depend on the class than lead to a NoClassDefFoundError ... here's a sample CI output where I forgot to sub-class one case

whether it's an annotation or sub-class does not make much of a difference (in terms of the failure, I think) ...
if you feel it still provides readability to have an annotation instead I am happy to refactor 😉

kares · 2021-08-30T18:17:24Z

qa/integration/framework/fixture.rb

-  def output_exists?
-    File.exists?(@actual_output)
-  end
-


defunct - no longer used

kares · 2021-08-30T18:19:02Z

qa/integration/framework/fixture.rb

  def teardown
-    File.delete(@actual_output) if @settings.is_set?("actual_output") && output_exists?


the cleanup was moved to the spec -> as it had assumptions about current work-dir.
(an output { file { path => ... } } is considered LS_HOME relative - this is now explicit in the one spec using actual_output: ...)

yaauie

First pass: I have made a few comments.

In general, this looks sensible. I'll spend some time ruminating on it and come back later in the week for a full review.

yaauie · 2021-08-30T19:16:42Z

lib/bootstrap/rspec.rb

@@ -20,13 +20,18 @@
 require "logstash-core"
 require "logstash/environment"

-$LOAD_PATH.unshift(File.join(LogStash::Environment::LOGSTASH_CORE, "spec"))
+# Bundler + gemspec already setup $LOAD_PATH << '.../lib'
+# but since we load specs from 2 locations we need to hook up these:


3 locations? ${LOGSTASH_HOME}/x-pack/spec

kind of yes but the re-use in x-pack is "secondary" in the context here.
for core we load the runtime with CWD == LS_HOME (this is what IDEA does when a test is run from IDE)
for x-pack we leave the CWD as Gradle changes it (to CWD == LS_HOME/x-pack) thus spec/**/*_spec.rb will work.

again this could use a cleanup where the path setting would be orthogonal - done before loading the script - I simply did not want to spent too much time here but if it's a concern I am happy to do so...

yaauie · 2021-08-30T19:19:45Z

logstash-core/build.gradle

@@ -151,10 +151,9 @@ idea {
    }
 }

-def customJRubyDir = project.hasProperty("custom.jruby.path") ? project.property("custom.jruby.path") : ""


I think that this was in place so that we could "try" a custom jruby build in local development (including one that was locally built).

If your solution satisfies this need in some other way, I think we can safely nix the custom.jruby.path property (but we may want to leave a breadcrumb pointing to the official way if that property is set).

IIRC, we also had some CI resources at one point that would occasionally take snapshot jruby builds and run our tests against them, but I do not know if/where that exists these days).

yaauie · 2021-08-30T19:47:10Z

logstash-core/src/main/java/org/logstash/Logstash.java

+    /**
+     * Initialize a runtime configuration.
+     * @param lsHome the LOGSTASH_HOME
+     * @param args extra arguments (ARGV) to process
+     * @return a runtime configuration instance
+     */
+    public static RubyInstanceConfig initRubyConfig(final Path lsHome,
+                                                    final String... args) {
+        return initRubyConfigImpl(lsHome.toString(), safePath(lsHome, "vendor", "jruby"), args);
+    }
+
+    /**
+     * Initialize a runtime configuration.
+     * @param lsHome the LOGSTASH_HOME
+     * @param args extra arguments (ARGV) to process
+     * @return a runtime configuration instance
+     */
+    public static RubyInstanceConfig initRubyConfig(final Path lsHome,
+                                                    final String currentDir,
+                                                    final String... args) {
+        return initRubyConfigImpl(currentDir, safePath(lsHome, "vendor", "jruby"), args);
+    }
+
+    private static RubyInstanceConfig initRubyConfigImpl(final String currentDir,
+                                                     final String jrubyHome,
+                                                     final String[] args) {
+        final RubyInstanceConfig config = new RubyInstanceConfig();
+        if (currentDir != null) config.setCurrentDirectory(currentDir);
+        config.setJRubyHome(jrubyHome);
+        config.processArguments(args);
+        return config;
+    }
+


This section would read a bit more clearly to me if we consistently used Path (or even String) in our own interfaces, instead of intermixing Path and String to represent paths (unless there is something I am missing -- can jrubyHome be something other than a valid path?).

yes it can be smt else, in theory e.g. a classpath://... like entry.
however should be fine with a Path type (for now) on the public methods - good point, updated!

yaauie · 2021-08-30T19:55:12Z

logstash-core/src/test/java/org/logstash/AccessorsTest.java

@@ -32,7 +32,7 @@
 import static org.junit.Assert.assertNull;
 import static org.junit.Assert.assertTrue;

-public class AccessorsTest {
+public class AccessorsTest extends RubyTestBase {


🤔

I get that we need a consistent hook to ensure exactly one global runtime.

But I'm not sure that a superclass is the best path. What happens when a new test that uses jruby bits fails to subclass this testing base? Do we end up with wildly-broken behaviour that is easy to trace down, or subtly-broken behaviour that continues to work unless the right edge-case is hit? Is there anything we can do to ensure that we get the former instead of the latter?

Would it be possible to have this behaviour powered by an annotation instead of subclassing?

yaauie · 2021-08-30T19:59:26Z

logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterTest.java

@@ -200,6 +198,11 @@ public void testCloseFlush() throws Exception {
        }
    }

+    private void sleep(long millis) throws InterruptedException {
+        Thread.sleep(millis);
+        Thread.yield();


Can you elaborate on why Thread#yield is needed after coming back from a timed Thread#sleep?

crumbs from my CI fight - was getting failures here, seemed the sleep wasn't enough to trigger the other thread.
so I increased the sleep time a bit and added an explicit yield - which has the intent that we're expecting another thread to kick in (subjective preference on mine). if the yield seems confusing I am happy to revert.

yaauie · 2021-08-30T20:00:37Z

logstash-core/src/test/java/org/logstash/config/ir/RubyEnvTestCase.java

@@ -44,15 +42,14 @@ public static void before() {
    private static void ensureLoadpath() {
        final LoadService loader = RubyUtil.RUBY.getLoadService();
        if (loader.findFileForLoad("logstash/compiler").library == null) {
+            final String gems = LS_HOME.


it seems a variation of our safePath helper from elsewhere may come in handy here.

honestly - this whole piece could use a refactoring as it's a bit worrying.
would rather drop it (had issues with this running against jruby-head) - but decided it's out-of-scope here.
there isn't anything new here - the vendor/bundle/jruby wasn't validated before ...
what's new is that we no longer have to rely on a system property being set to resolve the path.

for now - due potential loading issues in tests

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

kares · 2022-01-20T11:43:37Z

finally, managed to do a Windows tests (using this branch) - running LS seems to be working as expected:

kares · 2022-01-20T11:44:50Z

unfortunately things got 🔴 again due changes upstream - this time the introduction of JavaVersionCheck

kares · 2022-01-20T15:39:19Z

@jsvd adressed concerns, tested under Windows and merged with main. please let me know if there's anything else.

jsvd

LGTM 🚢

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

- usage of `data.convertToString().split(context, delimiter, MINUS_ONE);` instead of `data.convertToString().split(delimiter, -1);` - avoid to extend BuffererdTokenir test cases from `org.logstash.RubyTestBase` which was introduced in elastic#13159 - JDK 8 compatibilities: - `Arrays.asList` vs `List.of` - `assertThrows` method from JUnit5 not available in JUnit4 so reimplemented in the test

…onsume lines in case of lines bigger then sizeLimit (#16577) * Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger then sizeLimit (#16482) Fixes the behaviour of the tokenizer to be able to work properly when buffer full conditions are met. Updates BufferedTokenizerExt so that can accumulate token fragments coming from different data segments. When a "buffer full" condition is matched, it record this state in a local field so that on next data segment it can consume all the token fragments till the next token delimiter. Updated the accumulation variable from RubyArray containing strings to a StringBuilder which contains the head token, plus the remaining token fragments are stored in the input array. Furthermore it translates the `buftok_spec` tests into JUnit tests. * Fixed compilation error due to libraries incompatibilities - usage of `data.convertToString().split(context, delimiter, MINUS_ONE);` instead of `data.convertToString().split(delimiter, -1);` - avoid to extend BuffererdTokenir test cases from `org.logstash.RubyTestBase` which was introduced in #13159 - JDK 8 compatibilities: - `Arrays.asList` vs `List.of` - `assertThrows` method from JUnit5 not available in JUnit4 so reimplemented in the test

roaksoax added the status:work-in-progress label Aug 23, 2021

kares force-pushed the drop-complete-jar branch 5 times, most recently from 69f9e41 to dd8eb7f Compare August 29, 2021 10:13

kares commented Aug 30, 2021

View reviewed changes

qa/integration/framework/fixture.rb

def output_exists?

File.exists?(@actual_output)

end

Copy link

Contributor Author

kares Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defunct - no longer used

kares commented Aug 30, 2021

View reviewed changes

kares marked this pull request as ready for review August 30, 2021 18:20

roaksoax added status:needs-review and removed status:work-in-progress labels Aug 30, 2021

yaauie reviewed Aug 30, 2021

View reviewed changes

kares requested a review from yaauie October 26, 2021 11:32

kares added 11 commits November 23, 2021 12:47

Refactor: drop jruby-complete.jar

bd2112d

setup CLASSPATH once - with (core) jruby.jar

d71905e

Chore: script cleanup

b6fd94b

RSpec setup for booting JRuby

38bc115

shared Ruby config initialization

8451fbb

Build: task depends on vendor/jruby/lib/jruby.jar

5996b0c

Build: missed dependencies for x-pack tests

647dc64

Test: avoid unit test depending on stdlib

1bd3ddc

for now - due potential loading issues in tests

make integration RSpecTests boot properly

983a8c3

detect and fix IDE/gradle w Ruby-aware unit tests

988ed14

consider system prop set with ./gradlew

61079e0

kares and others added 3 commits January 20, 2022 09:23

Merge branch 'main' into drop-complete-jar

3211346

Update logstash-core/src/test/java/org/logstash/RubyTestBase.java

b638b51

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

Update qa/integration/specs/kafka_input_spec.rb

5f7090e

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

kares added status:work-in-progress and removed status:needs-review labels Jan 20, 2022

roaksoax added status:needs-review and removed status:work-in-progress labels Jan 20, 2022

kares added 2 commits January 20, 2022 14:34

Test: fix syntax error introduced from review

da8817f

Fix: setup_classpath once (after version check merge)

fedddf7

kares requested a review from jsvd January 20, 2022 15:39

kares closed this Jan 28, 2022

roaksoax removed the status:needs-review label Jan 28, 2022

kares reopened this Feb 1, 2022

Merge branch 'main' into drop-complete-jar

aa43990

roaksoax added the status:needs-review label Feb 1, 2022

jsvd approved these changes Feb 1, 2022

View reviewed changes

roaksoax added status:approved and removed status:needs-review labels Feb 1, 2022

kares merged commit 3637a30 into elastic:main Feb 2, 2022

roaksoax removed the status:approved label Feb 2, 2022

kares mentioned this pull request Feb 17, 2022

Review LS startup to re-use expanded JRuby bits #12741

Closed

kaisecheng mentioned this pull request Aug 27, 2022

Temporary JRuby files in /tmp are not cleaned up in SuSE #11051

Closed

kaisecheng pushed a commit to kaisecheng/logstash that referenced this pull request Aug 27, 2022

Refactor: drop redundant (jruby-complete.jar) dependency (elastic#13159)

b19a16e

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

kaisecheng mentioned this pull request Sep 5, 2022

clean jruby tmp directory #14476

Closed

6 tasks

andsel mentioned this pull request Oct 17, 2024

Backport #16482 to 7.17: Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger then sizeLimit #16577

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: drop redundant (jruby-complete.jar) dependency #13159

Refactor: drop redundant (jruby-complete.jar) dependency #13159

kares commented Aug 23, 2021 •

edited

Loading

kares Aug 30, 2021

kares Aug 30, 2021

kares Aug 30, 2021

kares Aug 30, 2021

kares Aug 30, 2021

yaauie Aug 30, 2021

jsvd Dec 9, 2021

kares Jan 20, 2022

kares Aug 30, 2021

yaauie Aug 30, 2021

kares Aug 31, 2021

kares Aug 30, 2021

kares Aug 30, 2021 •

edited

Loading

yaauie left a comment

yaauie Aug 30, 2021

kares Aug 31, 2021 •

edited

Loading

yaauie Aug 30, 2021

yaauie Aug 30, 2021

kares Aug 31, 2021

yaauie Aug 30, 2021

yaauie Aug 30, 2021

kares Aug 31, 2021

yaauie Aug 30, 2021

kares Aug 31, 2021

kares commented Jan 20, 2022

kares commented Jan 20, 2022

kares commented Jan 20, 2022

jsvd left a comment

		def teardown
		File.delete(@actual_output) if @settings.is_set?("actual_output") && output_exists?

Refactor: drop redundant (jruby-complete.jar) dependency #13159

Refactor: drop redundant (jruby-complete.jar) dependency #13159

Conversation

kares commented Aug 23, 2021 • edited Loading

Release notes

What does this PR do?

Why is it important/What is the impact to the user?

Author's Checklist

How to test this PR locally

Related issues

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kares Aug 30, 2021 • edited Loading

Choose a reason for hiding this comment

yaauie left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kares Aug 31, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kares commented Jan 20, 2022

kares commented Jan 20, 2022

kares commented Jan 20, 2022

jsvd left a comment

Choose a reason for hiding this comment

kares commented Aug 23, 2021 •

edited

Loading

kares Aug 30, 2021 •

edited

Loading

kares Aug 31, 2021 •

edited

Loading