Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: drop redundant (jruby-complete.jar) dependency #13159

Merged
merged 66 commits into from
Feb 2, 2022

Conversation

kares
Copy link
Contributor

@kares kares commented Aug 23, 2021

ATM, LS startup depends on jruby-complete.jar (which includes the Ruby stdlib) from logstash-core/lib/jars.
On the other hand, LS has an expanded JRuby installation located at vendor/jruby.

Release notes

[rn:skip]

What does this PR do?

Changes how LS boots, this boot process is expected to be slightly faster, if noticeable at all.
More importantly it will allow LS to easily patch the standard library (and potentially fix some issues).

Why is it important/What is the impact to the user?

No impact for the user.

Author's Checklist

  • adjust (and test) Windows .bat scripts
  • restore custom.jruby.path
  • fix license report generation (by providing meta-data for vendor/jruby/lib/jruby.jar)

How to test this PR locally

The usual bin/logstash ... way.

Related issues

@kares kares force-pushed the drop-complete-jar branch 5 times, most recently from 69f9e41 to dd8eb7f Compare August 29, 2021 10:13
for J in $(cd "${jar_directory}"; ls *.jar); do
classpath=${classpath}${classpath:+:}${jar_directory}/${J}
done
echo "${classpath}"
}

# set up CLASSPATH as a side-effect, we start to java processes
CLASSPATH="${JRUBY_HOME}/lib/jruby.jar"
CLASSPATH="$(setup_classpath $CLASSPATH $LOGSTASH_JARS)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we now set up the CP once (as it's used by 2 java invocations)

@@ -9,7 +9,6 @@ export GRADLE_OPTS="-Xmx4g -Dorg.gradle.jvmargs=-Xmx4g -Dorg.gradle.daemon=false

export SPEC_OPTS="--order rand --format documentation"
export CI=true
export OSS=true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated - but this should not longer be needed


require "rspec/core"
require "rspec"
require 'ci/reporter/rake/rspec_loader'

RSpec.world.reset # if multiple rspec runs occur in a single process, the RSpec "world" state needs to be reset.
RSpec.clear_examples # if multiple rspec runs occur in a single process, the RSpec "world" state needs to be reset.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the official way to do a proper RSpec "reset"

@@ -105,7 +105,7 @@ tasks.register("javaTests", Test) {
tasks.register("rubyTests", Test) {
inputs.files fileTree("${projectDir}/lib")
inputs.files fileTree("${projectDir}/spec")
systemProperty 'logstash.core.root.dir', projectDir.absolutePath
systemProperty 'logstash.root.dir', projectDir.parent
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a new convention for all RSpec Java invocations ...
also systemProperty 'logstash.core.root.dir' wasn't previously always pointing to logstash-core
(e.g. from x-pack/build.gradle)

@@ -151,10 +151,9 @@ idea {
}
}

def customJRubyDir = project.hasProperty("custom.jruby.path") ? project.property("custom.jruby.path") : ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom.jruby.path would need more work to get right,
should be unnecessary as we only need to replace vendor/jruby and no longer the complete.jar

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this was in place so that we could "try" a custom jruby build in local development (including one that was locally built).

If your solution satisfies this need in some other way, I think we can safely nix the custom.jruby.path property (but we may want to leave a breadcrumb pointing to the official way if that property is set).


IIRC, we also had some CI resources at one point that would occasionally take snapshot jruby builds and run our tests against them, but I do not know if/where that exists these days).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+++ currently we document how to run logstash with a custom ruby, and I've used this in the past to test jruby snapshots and PRs, I'd like for this to not rot and still provide docs and gradle tasks to somehow run with a custom jruby, even if the docs are: build jruby locally, copy the jar here, and the stdlib there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom.jruby.path would need more work to get right,

was wrong here - the custom.jruby.path bootstrap mechanism in the base build.gradle simply skips downloading JRuby and will (build and) copy a vendor/jruby from the locally build jruby artifact. there's no special need to select the proper jruby-complete.jar from the custom location since the logstash-core now relies on the vendor/jruby/lib/jruby.jar as an api dependency.

thus actually in case of a ./gradlew clean test -Pcustom.jruby.path=... everything operates as before

@@ -32,7 +32,7 @@
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;

public class AccessorsTest {
public class AccessorsTest extends RubyTestBase {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these Java tests caused JRuby runtime initialization ...
-> we would lost control of booting a proper RubyUtil.RUBY global runtime

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

I get that we need a consistent hook to ensure exactly one global runtime.

But I'm not sure that a superclass is the best path. What happens when a new test that uses jruby bits fails to subclass this testing base? Do we end up with wildly-broken behaviour that is easy to trace down, or subtly-broken behaviour that continues to work unless the right edge-case is hit? Is there anything we can do to ensure that we get the former instead of the latter?

Would it be possible to have this behaviour powered by an annotation instead of subclassing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to have this behaviour powered by an annotation instead of subclassing?

yes it would be possible - there's simply an existing convention for some of the IR "ruby-aware" tests
to use RubyEnvTestCase.java as a super-class, so here we're following the same ...

also considered the other option of having a JUnit style @BeforeClass boot but that still would depend on order or hooks since the RubyEnvTestCase.java is using one where it assumes RubyUtil.RUBY already booted.

But I'm not sure that a superclass is the best path. What happens when a new test that uses jruby bits fails to subclass this testing base? Do we end up with wildly-broken behaviour that is easy to trace down, or subtly-broken behaviour that continues to work unless the right edge-case is hit? Is there anything we can do to ensure that we get the former instead of the latter?

Adding/changing an existing Java test (which would be using org.jruby bits directly or indirectly and forgot to sub-class or annotate) would lead to an AssertionError when initializing the base class (or annotation) -> all classes that depend on the class than lead to a NoClassDefFoundError ... here's a sample CI output where I forgot to sub-class one case

whether it's an annotation or sub-class does not make much of a difference (in terms of the failure, I think) ...
if you feel it still provides readability to have an annotation instead I am happy to refactor 😉

def output_exists?
File.exists?(@actual_output)
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defunct - no longer used

def teardown
File.delete(@actual_output) if @settings.is_set?("actual_output") && output_exists?
Copy link
Contributor Author

@kares kares Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cleanup was moved to the spec -> as it had assumptions about current work-dir.
(an output { file { path => ... } } is considered LS_HOME relative - this is now explicit in the one spec using actual_output: ...)

@kares kares marked this pull request as ready for review August 30, 2021 18:20
Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass: I have made a few comments.

In general, this looks sensible. I'll spend some time ruminating on it and come back later in the week for a full review.

@@ -20,13 +20,18 @@
require "logstash-core"
require "logstash/environment"

$LOAD_PATH.unshift(File.join(LogStash::Environment::LOGSTASH_CORE, "spec"))
# Bundler + gemspec already setup $LOAD_PATH << '.../lib'
# but since we load specs from 2 locations we need to hook up these:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 locations? ${LOGSTASH_HOME}/x-pack/spec

Copy link
Contributor Author

@kares kares Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of yes but the re-use in x-pack is "secondary" in the context here.
for core we load the runtime with CWD == LS_HOME (this is what IDEA does when a test is run from IDE)
for x-pack we leave the CWD as Gradle changes it (to CWD == LS_HOME/x-pack) thus spec/**/*_spec.rb will work.

again this could use a cleanup where the path setting would be orthogonal - done before loading the script - I simply did not want to spent too much time here but if it's a concern I am happy to do so...

@@ -151,10 +151,9 @@ idea {
}
}

def customJRubyDir = project.hasProperty("custom.jruby.path") ? project.property("custom.jruby.path") : ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this was in place so that we could "try" a custom jruby build in local development (including one that was locally built).

If your solution satisfies this need in some other way, I think we can safely nix the custom.jruby.path property (but we may want to leave a breadcrumb pointing to the official way if that property is set).


IIRC, we also had some CI resources at one point that would occasionally take snapshot jruby builds and run our tests against them, but I do not know if/where that exists these days).

Comment on lines 218 to 235
/**
* Initialize a runtime configuration.
* @param lsHome the LOGSTASH_HOME
* @param args extra arguments (ARGV) to process
* @return a runtime configuration instance
*/
public static RubyInstanceConfig initRubyConfig(final Path lsHome,
final String... args) {
return initRubyConfigImpl(lsHome.toString(), safePath(lsHome, "vendor", "jruby"), args);
}

/**
* Initialize a runtime configuration.
* @param lsHome the LOGSTASH_HOME
* @param args extra arguments (ARGV) to process
* @return a runtime configuration instance
*/
public static RubyInstanceConfig initRubyConfig(final Path lsHome,
final String currentDir,
final String... args) {
return initRubyConfigImpl(currentDir, safePath(lsHome, "vendor", "jruby"), args);
}

private static RubyInstanceConfig initRubyConfigImpl(final String currentDir,
final String jrubyHome,
final String[] args) {
final RubyInstanceConfig config = new RubyInstanceConfig();
if (currentDir != null) config.setCurrentDirectory(currentDir);
config.setJRubyHome(jrubyHome);
config.processArguments(args);
return config;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section would read a bit more clearly to me if we consistently used Path (or even String) in our own interfaces, instead of intermixing Path and String to represent paths (unless there is something I am missing -- can jrubyHome be something other than a valid path?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it can be smt else, in theory e.g. a classpath://... like entry.
however should be fine with a Path type (for now) on the public methods - good point, updated!

@@ -32,7 +32,7 @@
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;

public class AccessorsTest {
public class AccessorsTest extends RubyTestBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

I get that we need a consistent hook to ensure exactly one global runtime.

But I'm not sure that a superclass is the best path. What happens when a new test that uses jruby bits fails to subclass this testing base? Do we end up with wildly-broken behaviour that is easy to trace down, or subtly-broken behaviour that continues to work unless the right edge-case is hit? Is there anything we can do to ensure that we get the former instead of the latter?

Would it be possible to have this behaviour powered by an annotation instead of subclassing?

@@ -200,6 +198,11 @@ public void testCloseFlush() throws Exception {
}
}

private void sleep(long millis) throws InterruptedException {
Thread.sleep(millis);
Thread.yield();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why Thread#yield is needed after coming back from a timed Thread#sleep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crumbs from my CI fight - was getting failures here, seemed the sleep wasn't enough to trigger the other thread.
so I increased the sleep time a bit and added an explicit yield - which has the intent that we're expecting another thread to kick in (subjective preference on mine). if the yield seems confusing I am happy to revert.

@@ -44,15 +42,14 @@ public static void before() {
private static void ensureLoadpath() {
final LoadService loader = RubyUtil.RUBY.getLoadService();
if (loader.findFileForLoad("logstash/compiler").library == null) {
final String gems = LS_HOME.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems a variation of our safePath helper from elsewhere may come in handy here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly - this whole piece could use a refactoring as it's a bit worrying.
would rather drop it (had issues with this running against jruby-head) - but decided it's out-of-scope here.
there isn't anything new here - the vendor/bundle/jruby wasn't validated before ...
what's new is that we no longer have to rely on a system property being set to resolve the path.

@kares kares requested a review from yaauie October 26, 2021 11:32
kares and others added 3 commits January 20, 2022 09:23
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
@kares
Copy link
Contributor Author

kares commented Jan 20, 2022

finally, managed to do a Windows tests (using this branch) - running LS seems to be working as expected:

win-no-jruby-complete

@kares
Copy link
Contributor Author

kares commented Jan 20, 2022

unfortunately things got 🔴 again due changes upstream - this time the introduction of JavaVersionCheck

@kares
Copy link
Contributor Author

kares commented Jan 20, 2022

@jsvd adressed concerns, tested under Windows and merged with main. please let me know if there's anything else.

@kares kares requested a review from jsvd January 20, 2022 15:39
@kares kares closed this Jan 28, 2022
@kares kares reopened this Feb 1, 2022
Copy link
Member

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢 :shipit:

@kares kares merged commit 3637a30 into elastic:main Feb 2, 2022
kaisecheng pushed a commit to kaisecheng/logstash that referenced this pull request Aug 27, 2022
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
@kaisecheng kaisecheng mentioned this pull request Sep 5, 2022
6 tasks
andsel added a commit to andsel/logstash that referenced this pull request Oct 17, 2024
- usage of `data.convertToString().split(context, delimiter, MINUS_ONE);` instead of `data.convertToString().split(delimiter, -1);`
- avoid to extend BuffererdTokenir test cases from `org.logstash.RubyTestBase` which was introduced in elastic#13159
- JDK 8 compatibilities:
  - `Arrays.asList` vs `List.of`
  - `assertThrows` method from JUnit5 not available in JUnit4 so reimplemented in the test
edmocosta pushed a commit that referenced this pull request Oct 22, 2024
…onsume lines in case of lines bigger then sizeLimit (#16577)

* Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger then sizeLimit (#16482)

Fixes the behaviour of the tokenizer to be able to work properly when buffer full conditions are met.

Updates BufferedTokenizerExt so that can accumulate token fragments coming from different data segments. When a "buffer full" condition is matched, it record this state in a local field so that on next data segment it can consume all the token fragments till the next token delimiter.
Updated the accumulation variable from RubyArray containing strings to a StringBuilder which contains the head token, plus the remaining token fragments are stored in the input array.
Furthermore it translates the `buftok_spec` tests into JUnit tests.

* Fixed compilation error due to libraries incompatibilities

- usage of `data.convertToString().split(context, delimiter, MINUS_ONE);` instead of `data.convertToString().split(delimiter, -1);`
- avoid to extend BuffererdTokenir test cases from `org.logstash.RubyTestBase` which was introduced in #13159
- JDK 8 compatibilities:
  - `Arrays.asList` vs `List.of`
  - `assertThrows` method from JUnit5 not available in JUnit4 so reimplemented in the test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants