
Pin fileutils version to 1.7+ #16250

Merged
merged 2 commits into from
Jun 25, 2024
Conversation

mashhurs
Contributor

@mashhurs mashhurs commented Jun 21, 2024

Release notes

[rn:skip]

What does this PR do?

Pins the fileutils gem to version 1.7+ to pick up its updated file-removal logic.
Refer to the issue Logstash is facing:
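For illustration, a dependency pin of this shape keeps fileutils at 1.7 or newer (the exact file and phrasing in the Logstash repo may differ; the constraint below is a sketch of the idea, and RubyGems evaluates it via `Gem::Requirement`):

```ruby
# Illustrative Gemfile-style entry (Logstash's actual dependency file may differ):
#   gem "fileutils", ">= 1.7"

require "rubygems"

# How RubyGems resolves such a constraint:
req = Gem::Requirement.new(">= 1.7")

puts req.satisfied_by?(Gem::Version.new("1.7.2"))  # true  -> accepted
puts req.satisfied_by?(Gem::Version.new("1.6.0"))  # false -> rejected
```

A pessimistic constraint like `"~> 1.7"` would also work and additionally caps the major version below 2.0.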

Why is it important/What is the impact to the user?

Users on Windows who use the aws-integration plugin's S3 output feature are seeing many leftover temporary files that fileutils could not remove. With this change, temp dirs are removed properly.
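On Windows, a virus scanner can briefly hold a handle inside the temp directory, so a single `rmdir` can fail with `ENOTEMPTY` or `EACCES` even after the plugin has deleted its own files. A minimal sketch of a retry-based removal, to illustrate why retrying helps here (this is not the plugin's or fileutils' actual code; the method name and retry parameters are made up):

```ruby
require "fileutils"

# Retry directory-tree removal a few times; scanners usually release
# their handles within a fraction of a second. (Illustrative sketch only.)
def remove_dir_with_retries(path, attempts: 5, wait: 0.2)
  attempts.times do |i|
    begin
      FileUtils.remove_entry(path)   # raises on failure, unlike rm_rf
      return true
    rescue Errno::ENOTEMPTY, Errno::EACCES
      raise if i == attempts - 1     # give up after the last attempt
      sleep wait
    end
  end
end
```

Note that `FileUtils.remove_entry` raises on failure, whereas `rm_rf` silently swallows errors; a raising variant is what makes a retry loop like this possible.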

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

This behavior is very hard to test. To reproduce it, as I recall, I had to run Logstash for 2-3 days before hitting the following exception (see the details):

An error occurred in the `on_complete` uploader {:exception=>Errno::ENOTEMPTY, :message=>"Directory not empty - D:/Logstash/temp/718dd509-c0f5-4da5-b4b4-92608579e799", :path=>"D:\\Logstash\\temp/718dd509-c0f5-4da5-b4b4-92608579e799/2023/05/25/ls.s3.b21c1c38-1dc2-443e-8491-847968549a30.2023-05-25T09.42.part0.txt.gz", :backtrace=>["org/jruby/RubyDir.java:471:in `rmdir'", "C:/Program 
...

Exhaustive pipeline status: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/546
To locally test:

  • Setup an environment
    • Windows OS with antivirus
  • Clone the LS repo with the current change and build it. I built the artifacts (rake artifact:all) and copied the generated artifact to the Windows host.
    • Setup the pipeline which uploads files to S3
         input {
           generator {
                 id => "generator-id"
                 ecs_compatibility => disabled
                 count => 1000000
                 threads => 10
                 codec => json
                 lines => [
                     '{"fileset":{"module":"system","name":"asa"},"system":{"auth":{"timestamp":"May 17  05:17:00","ssh":{"source":{"ip":"54.160.29.55"}}}},"event":{"category":"cisco-category", "type":"cisco-type", "data":{"User-Name":"mashhur"}},"client":{"ar_net":"54.160.29.55", "ongisac_ip":"54.160.29.55", "ip":"54.160.29.55"}, "destination": {"ar_net":"54.160.29.55", "ongisac_ip":"54.160.29.55"}, "source": {"ar_net":"54.160.29.55", "ongisac_ip":"54.160.29.55"}, "url":{"ongisac_domain": "ip-172-31-4-132"}, "DstIP":"54.160.29.55","SrcIP":"54.160.29.55","orginalClientSrcIP":"54.160.29.55","ReferencedHost":"ip-172-31-4-132", "dns":{"question": {"ongisac_domain":"example.com/my-path?query=value"}}}'
                 ]
               }
         }
     output {
         s3 {
             access_key_id => "{yourAccessID}"
             secret_access_key => "{yourAccessKey}"
             region => "aws-region-n"
             bucket => "bucket-name"
             codec => "json_lines"
             canned_acl => "private"
             prefix => "test-%{+YYYY.MM.dd}"
             additional_settings => {
                 "force_path_style" => true
             }
             upload_queue_size => 10
             upload_workers_count => 10
             rotation_strategy => "size_and_time"
             size_file => 500 # small size for faster rotation, so uploads happen frequently
             time_file => 1
             temporary_directory => "/elastic/s3-output/temp"
             validate_credentials_on_root_bucket => false
         }
     }
      
  • Run Logstash multiple times
    Note that I have been running it for over two days so far without hitting the issue.
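After shutting Logstash down, you can check whether the temporary directory was cleaned up. A small Ruby check (the path below is the `temporary_directory` from the example config above; adjust it for your host, e.g. a `D:/` path on Windows):

```ruby
# Lists anything left under the S3 output's temporary directory.
# The path matches the temporary_directory setting in the example config.
temp_dir = "/elastic/s3-output/temp"
leftovers = Dir.glob(File.join(temp_dir, "*"))

if leftovers.empty?
  puts "OK: no leftover temp entries"
else
  puts "Leftover entries (#{leftovers.size}):"
  leftovers.each { |p| puts "  #{p}" }
end
```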

Related issues

Use cases

  • The aws-integration plugin's S3 output in use on Windows hosts, especially those running file-scanning tools such as antivirus software

Screenshots

Logs

PS C:\ls> .\bin\logstash -f .\config\s3.config.txt
"Using bundled JDK: C:\ls\jdk\bin\java.exe"
Sending Logstash logs to C:/ls/logs which is now configured via log4j2.properties
[2024-06-21T22:16:00,584][INFO ][logstash.runner          ] Log4j configuration path used is: C:\ls\config\log4j2.properties
[2024-06-21T22:16:00,597][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"8.15.0", "jruby.version"=>"jruby 9.4.7.0 (3.1.4) 2024-04-29 597ff08ac1 OpenJDK 64-Bit Server VM 21.0.3+9-LTS on 21.0.3+9-LTS +indy +jit [x86_64-mswin32]"}
[2024-06-21T22:16:00,597][INFO ][logstash.runner          ] JVM bootstrap flags: [-Xms1g, -Xmx1g, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djruby.compile.invokedynamic=true, -XX:+HeapDumpOnOutOfMemoryError, -Djava.security.egd=file:/dev/urandom, -Dlog4j2.isThreadContextMapInheritable=true, -Dlogstash.jackson.stream-read-constraints.max-string-length=200000000, -Dlogstash.jackson.stream-read-constraints.max-number-length=10000, -Djruby.regexp.interruptible=true, -Djdk.io.File.enableADS=true, --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED, --add-opens=java.base/java.security=ALL-UNNAMED, --add-opens=java.base/java.io=ALL-UNNAMED, --add-opens=java.base/java.nio.channels=ALL-UNNAMED, --add-opens=java.base/sun.nio.ch=ALL-UNNAMED, --add-opens=java.management/sun.management=ALL-UNNAMED, -Dio.netty.allocator.maxOrder=11]
[2024-06-21T22:16:00,597][INFO ][logstash.runner          ] Jackson default value override `logstash.jackson.stream-read-constraints.max-string-length` configured to `200000000`
[2024-06-21T22:16:00,597][INFO ][logstash.runner          ] Jackson default value override `logstash.jackson.stream-read-constraints.max-number-length` configured to `10000`
[2024-06-21T22:16:00,678][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2024-06-21T22:16:02,119][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
[2024-06-21T22:16:02,913][INFO ][org.reflections.Reflections] Reflections took 175 ms to scan 1 urls, producing 138 keys and 481 values
[2024-06-21T22:16:03,062][INFO ][logstash.codecs.json     ] ECS compatibility is enabled but `target` option was not specified. This may cause fields to be set at the top-level of the event where they are likely to clash with the Elastic Common Schema. It is recommended to set the `target` option to avoid potential schema conflicts (if your data is ECS compliant or non-conflicting, feel free to ignore this message)
[2024-06-21T22:16:06,613][INFO ][logstash.codecs.jsonlines] ECS compatibility is enabled but `target` option was not specified. This may cause fields to be set at the top-level of the event where they are likely to clash with the Elastic Common Schema. It is recommended to set the `target` option to avoid potential schema conflicts (if your data is ECS compliant or non-conflicting, feel free to ignore this message)
[2024-06-21T22:16:08,726][INFO ][logstash.javapipeline    ] Pipeline `main` is configured with `pipeline.ecs_compatibility: v8` setting. All plugins in this pipeline will default to `ecs_compatibility => v8` unless explicitly configured otherwise.
[2024-06-21T22:16:09,025][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["C:/ls/config/s3.config.txt"], :thread=>"#<Thread:0x4f976cae C:/ls/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}
[2024-06-21T22:16:10,164][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>1.14}
[2024-06-21T22:16:10,171][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2024-06-21T22:16:10,190][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2024-06-21T22:16:15,688][INFO ][logstash.javapipeline    ][main] Pipeline terminated {"pipeline.id"=>"main"}
[2024-06-21T22:16:15,848][INFO ][logstash.pipelinesregistry] Removed pipeline from registry successfully {:pipeline_id=>:main}
[2024-06-21T22:16:15,860][INFO ][logstash.runner          ] Logstash shut down.
PS C:\ls>


Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@elasticmachine
Collaborator

💚 Build Succeeded

History

cc @mashhurs

@mashhurs mashhurs added the bug label Jun 21, 2024
@mashhurs mashhurs marked this pull request as ready for review June 21, 2024 23:30
@yaauie yaauie merged commit e6682c9 into elastic:main Jun 25, 2024
6 checks passed
@yaauie
Member

yaauie commented Jun 25, 2024

@logstashmachine backport 8.14

@yaauie yaauie mentioned this pull request Jun 25, 2024
andsel pushed a commit to andsel/logstash that referenced this pull request Jul 12, 2024
* Pin fileutils version to 1.7+

* Add fileutils license notice.
Development

Successfully merging this pull request may close these issues.

S3-output doesn't remove temporary dir.