
Refactor: improve plugin logging xp #13038

Closed · wants to merge 24 commits

Conversation

@kares (Contributor) commented Jul 1, 2021

Release notes

What does this PR do?

Improves the logging experience for the user (as a minor side effect, LS should be faster at generating plugin ids).

  • generated ids won't be recycled -> on a restart, new ids are generated even if the configuration did not change;
    previously, the id generation mechanism always fell back to walking the AST and hashing the tree to generate an id
    (EDIT: stricken. We rely on recycled ids.)
  • generated plugin ids will be shorter (16 characters instead of 36 hex chars) - logs should be easier to read (EDIT: stricken. Loosely a breaking change, given the above.)
  • loosened logger(msg) contract allows logging any object that responds to to_s, not just strings - just like Log4j2 loggers do (see the sketch after this list)
  • an internal PluginLogger gets introduced so there is always a plugin.id logging context with plugin.logger ...
  • a (single) explicit generatePluginId implementation;
    the same logic is now used in tests -> tested plugins will no longer have the name_... prefix
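
A minimal sketch of what the loosened contract allows (illustrative only, not code from this PR; assumes it runs inside a plugin instance where logger is available):

  # Illustrative only: the message argument just needs to respond to #to_s.
  logger.info "plugin registered"     # a String, as before
  logger.warn :resource_unavailable   # a Symbol
  logger.debug 42                     # any object with a reasonable #to_s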

Other changes:

  • fixed logger.trace? (previously delegated to logger.isDebugEnabled())
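
A minimal sketch of the kind of predicate fix described above (class and method names are assumed for illustration; this is not Logstash's actual implementation):

# Sketch only - names are assumed, not Logstash's actual code.
# The bug class: the Ruby trace? predicate delegated to the wrong Log4j2 check.
class LoggerWrapper
  def initialize(log4j_logger)
    @logger = log4j_logger
  end

  def debug?
    @logger.is_debug_enabled
  end

  def trace?
    @logger.is_trace_enabled   # previously this incorrectly called is_debug_enabled
  end
end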

Breaking changes:

  • ~ very low impact: logging from a plugin's initialize before calling super no longer works
class LogStash::Inputs::SomePlugin < ...

  def initialize(params)
    logger.info "this would no longer work - previously it did, but only when not using @logger.info ..."
    super(params)
    # logging after super works fine
  end
...
  • potential plugin test failures ... mocking such as the following would need updating (see the hypothetical update below):
  expect(LogStash::Codecs::Multiline).to receive(:logger).and_return(Mlc::MultilineLogTracer.new).at_least(:once)
  • or the class-level def class.logger(plugin = self) could be kept functional in a backwards-compatible way
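
A hypothetical way such a spec could be updated, stubbing the logger on instances instead of the class (the exact shape depends on the final API):

  # Hypothetical update - exact shape depends on the final API.
  allow_any_instance_of(LogStash::Codecs::Multiline)
    .to receive(:logger)
    .and_return(Mlc::MultilineLogTracer.new)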

Why is it important/What is the impact to the user?

Logs are Logstash's primary way of communicating with the user; always having a plugin id logged greatly improves the user's (debugging) experience.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • TODO: do we need to worry about always generating a random id for a codec,
    like we did before this got refactored?

How to test this PR locally

Related issues

Logs

Sample (artificial) logging session (just to demonstrate plugins always log with their id as context):

[2021-07-27T14:32:34,501][INFO ][logstash.codecs.plain    ][955460acfaa1d9f2] REGISTER
[2021-07-27T14:32:34,643][INFO ][logstash.inputs.file     ][0138d563e04582e3] INITIALIZE {"path"=>"/var/log/auth.log", "id"=>"0138d563e04582e3"}
[2021-07-27T14:32:34,746][INFO ][logstash.filters.ruby    ][f569244cde563334] INITIALIZE
[2021-07-27T14:32:37,211][INFO ][logstash.filters.ruby    ][main][f569244cde563334] REGISTER
[2021-07-27T14:32:37,355][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>16, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>2000, "pipeline.sources"=>["config string"], :thread=>"#<Thread:0x163b0af1 run>"}
[2021-07-27T14:32:38,790][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>1.43}
[2021-07-27T14:32:38,840][INFO ][logstash.inputs.file     ][main][0138d563e04582e3] REGISTER
[2021-07-27T14:32:38,943][INFO ][logstash.inputs.file     ][main][0138d563e04582e3] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/home/kares/workspace/work/elastic/logstash/data/plugins/inputs/file/.sincedb_0776ae3d702d482a9bdd8900c6550225", :path=>["/var/log/auth.log"]}
[2021-07-27T14:32:38,974][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2021-07-27T14:32:39,000][INFO ][logstash.inputs.file     ][main][0138d563e04582e3] RUN
[2021-07-27T14:32:39,090][INFO ][filewatch.observingtail  ][main] START, creating Discoverer, Watch with file and sincedb collections
[2021-07-27T14:32:39,097][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2021-07-27T14:32:39,678][INFO ][logstash.codecs.plain    ][main][955460acfaa1d9f2] DECODE: Jul 27 14:32:38 precision sshd[442003]: Connection closed by 127.0.0.1 port 53520 [preauth]
[2021-07-27T14:32:39,858][INFO ][logstash.filters.ruby    ][main][f569244cde563334] FILTER {"@version"=>"1", "@timestamp"=>2021-07-27T12:32:39.702Z, "message"=>"Jul 27 14:32:38 precision sshd[442003]: Connection closed by 127.0.0.1 port 53520 [preauth]", "host"=>"precision", "path"=>"/var/log/auth.log"}
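
For context, messages like REGISTER and FILTER above come from ad-hoc logging calls added to the plugins for this demo; roughly the following kind of instrumentation (illustrative only, not code from this PR) produces such lines, with the plugin id supplied by the logging context:

# Illustrative only - not code from this PR.
class LogStash::Filters::SomeFilter < LogStash::Filters::Base
  config_name "some_filter"

  def register
    logger.info("REGISTER")                 # -> [logstash.filters.some_filter][<plugin id>] REGISTER
  end

  def filter(event)
    logger.info("FILTER", event.to_hash)    # the plugin.id context is added by the PluginLogger
    filter_matched(event)
  end
end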

@yaauie (Member) commented Jul 28, 2021

generated ids won't be recycled -> on a restart new ids are generated even if the configuration did not change
previously, the id generation mechanism always fell back to walking the AST to generate an id by hashing the tree

Recycling ids across restarts has proven very useful in the past, especially when users don't follow the "best practice" of defining their own plugin ids. It allows us to use captures of the node stats API to compare flow across restarts, especially in larger pipelines.

Is this change intrinsic to ensuring that the logger always has an id?

@kares (Contributor, Author) commented Jul 29, 2021

Recycling ids across restarts has proven very useful in the past, especially when users don't follow the "best practice" of defining their own plugin ids. It allows us to use captures of the node stats API to compare flow across restarts, especially in larger pipelines.

Thanks, I thought there might be a reason, especially since there was a comment that said something like "do not walk the AST graph to generate ids as it's slow" and we did it anyway. The annoying thing for me with recycling is that we often get incomplete logs - judging from the generated plugin id I cannot tell whether it's the same LS instance or whether a restart happened ... 😞

Is this change intrinsic to ensuring that the logger always has an id?

We would continue to have 2 ways of auto-generating ids - one for manual plugin instantiation, another used with the pipeline.
At the time the plugin logger is set up (as laid out in the PR) an id would be present one way or another.
And shorter ids (if we're fine with having those) should be doable with the AST tree mechanism.

@yaauie (Member) commented Aug 10, 2021

I would be okay with shorter default ids in 8.0 (or perhaps rethinking ids in general), but a restart across minors or on the same version of Logstash should generate the same ids.

There may be an opportunity to add a pipeline.default_id_format setting or similar that controls id generation as a separate effort. We could then default to a more concise/readable format in a later release of Logstash.

Or we could keep the full-length ids and use log4j formatters to output only the first 16 chars into the log line, which would ensure collisions still get routed to separate files when log-per-pipeline is enabled:

-%notEmpty{[%X{pipeline.id}]}
+%notEmpty{[%maxLen{%X{pipeline.id}}{16}]}

@kares (Contributor, Author) commented Aug 12, 2021

Thanks, that's a 🔐 - I was assuming changing ids would target a minor (7.15) and be acceptable.
Especially since LS lacks consistency in the id format - regarding codecs (which were kind of the motivator):

<LogStash::Inputs::Elasticsearch id=>"4e805afb4960a70553434dc5d7ffc52f8b2410239d6cfc4372f5a66ea701b94c", codec=><LogStash::Codecs::JSON id=>"json_45212884-952d-4ce8-96a2-e6d87cc19447", ...>

... outputting the first 16 chars gets us: json_45212884-95, which might get weirder the longer the "prefix_" is 😞

Also, id generation is unnecessarily slow - it uses a 'slow' hashing alg (SHA-256) plus secure random, while none of that is necessary.
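
Purely illustrative (not code from this PR): a 16-hex-char id does not require a cryptographic hash or SecureRandom; something along these lines would be much cheaper.

  # Purely illustrative - not code from this PR.
  require 'digest'

  plugin_source = 'input { file { path => "/var/log/auth.log" } }'  # hypothetical stand-in for the config source

  random_id = format('%016x', rand(2**64))                 # 16 hex chars from a plain PRNG, no SecureRandom needed
  stable_id = Digest::MD5.hexdigest(plugin_source)[0, 16]  # deterministic across restarts, cheaper than SHA-256

(Whether the id should be deterministic or random is exactly the recycling question discussed above.)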

Anyhow, I will try to split the PR effort to only include the new PluginLogger and leave the rest for later.

@kares kares self-assigned this Jan 20, 2022
@kares kares closed this Oct 4, 2022
@kares kares removed their assignment Oct 4, 2022