Added line and column to plugin name #11155

Closed

@andsel wants to merge 21 commits from the feature/better_plugin_name branch

Conversation

@andsel (Contributor) commented Sep 20, 2019

Note
this PR is superseded by #11288 for the Java pipeline part; for the Ruby pipeline we still have to check whether it is worthwhile

[Feature Discussion]

We want to give the plugin a more useful name/id, so that it can easily be mapped back to the point in the pipeline it refers to. I think the best option would be to include a line number referring to the position in the pipeline config file, like input-L5. What do you think?
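
For example (a hypothetical config, just to illustrate the naming scheme):

    input {
      generator {}   # defined on line 2, so the generated id could be "input-L2"
    }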

solves #11154

@yaauie (Member) commented Oct 3, 2019

IIRC, the line_and_column isn't quite as helpful as we would expect it to be, because it represents the position of the element after concatenating all source configuration files together, and it is an especially common practice in large pipelines to split up the configuration into many source files.

@andsel (Contributor, Author) commented Oct 7, 2019

@yaauie thanks for the hint. We could do this:

  • in the management HTTP API for pipelines (localhost:9600/_node/stats/pipelines?pretty), expose the triple source file, line, column as a code reference
  • in the logs, keep the plugin.id
  • provide an HTTP API (e.g. localhost:9600/_node/pipelines/main/<plugin.id>) that returns the plugin's definition metadata (a sketch of a possible response follows)
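
Purely as an illustration, the response of such an endpoint might look like this (the field names and values are assumptions, not an existing Logstash API shape):

    {
      "id": "input-L5",
      "source": {
        "file": "/etc/logstash/conf.d/10-input.conf",
        "line": 5,
        "column": 3
      }
    }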

@yaauie (Member) commented Oct 7, 2019

IIRC, the source files for a pipeline are concatenated together into a single string, which is fed through the grammar to get an AST, the nodes of which are used to create the plugins and control structure. Because the AST contains line and column metadata from the concatenated source string and not the original files, there is no way presently for a plugin instance to be mapped back to a specific file source.

We would either need to change how the AST is built (e.g., build multiple independent ASTs and allow them to be concatenated after parsing to produce a net-same resulting AST with improved source metadata), or somehow map the line_and_column through a utility that is aware of which lines came from which files (e.g., if we know that lines 0-72 of the concatenated source came from file X and lines 73-97 from file Y, an error on line 77 of the concatenated source is actually line 4 of file Y).
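
A minimal sketch of that second option in Ruby (hypothetical names, using the zero-based numbers from the example above):

    # file X owns lines 0-72 of the merged source, file Y owns lines 73-97
    SOURCE_MAP = [
      { file: "X", offset: 0,  lines: 73 },
      { file: "Y", offset: 73, lines: 25 },
    ].freeze

    # map a line of the concatenated source back to [file, line-within-file]
    def remap(merged_line)
      entry = SOURCE_MAP.find { |e| merged_line < e[:offset] + e[:lines] }
      entry && [entry[:file], merged_line - entry[:offset]]
    end

    remap(77) #=> ["Y", 4]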

@jsvd (Member) commented Oct 7, 2019

I agree that providing this information means carrying new metadata from much earlier in the compilation process.

The goal of this suggestion is to give users a way to correlate logs and metrics from plugins with the actual plugins, without requiring the user to assign an id to them all. Linking auto-generated IDs to file+line+column seems the most user-friendly option, but I'm happy to go back to the drawing board at any time.

@andsel (Contributor, Author) commented Oct 7, 2019

Merging ASTs could be a mess, because there is no formal rule about how the pipeline config files may be sliced; we could have the filter section split across files like:

filter {
    geoip {
    }

and

    grok {}
}

At this point, remapping lines and columns after the global AST is created is the only viable solution.

@andsel (Contributor, Author) commented Oct 10, 2019

Added the source file and source line to the configuration reference for each metric node (Ruby pipeline).

@yaauie (Member) left a comment

I think we are going to have to find a way to get this metadata to plugin instances without routing through their initialize methods (e.g., a map on the pipeline indexed by plugin id?). Changing the method signatures of classes that are extended by plugins in external codebases (including customer-owned private plugins) is going to put us in a position where we cannot be confident that our changes do not break real-world plugins.
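
One way to read that suggestion, as a rough sketch (a hypothetical class, not code from this PR):

    # keep source metadata on the pipeline side, keyed by plugin id, so that
    # plugin initialize signatures stay untouched
    class PluginSourceMap
      def initialize
        @by_plugin_id = {} # plugin_id => [source_file, line, column]
      end

      def record(plugin_id, file, line, column)
        @by_plugin_id[plugin_id] = [file, line, column].freeze
      end

      def lookup(plugin_id)
        @by_plugin_id[plugin_id]
      end
    end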

Resolved review threads (now outdated) on:

  • logstash-core/lib/logstash/config/pipeline_config.rb
  • logstash-core/lib/logstash/outputs/base.rb
  • logstash-core/lib/logstash/codecs/delegator.rb
  • logstash-core/lib/logstash/config/mixin.rb
  • logstash-core/lib/logstash/compiler.rb
  • logstash-core/lib/logstash/pipeline.rb
@@ -44,5 +44,19 @@ def display_debug_information
logger.debug("Merged config")
logger.debug("\n\n#{config_string}")
end

def lookup_source_and_line(merged_config_line)
remaining_lines = merged_config_line
@yaauie (Member) commented:

Iterating over all of the parts for each lookup, and re-doing the work of counting the lines of each config part every time, seems like it would have performance side-effects.

I think we can do the bulk of the work just once, by caching a mapping of file names, offsets, and sizes, and then using that mapping at lookup time instead of going all the way to the config parts.

Below is untested, and may be subject to off-by-one errors

    # maps a line number in the merged config back to [part_id, line-within-part]
    def lookup_source_and_line(merged_line_number)
      source_map.each do |part_offset, part_lines, part_id|
        rebased_line_number = merged_line_number - part_offset
        next if rebased_line_number > part_lines

        return [part_id, rebased_line_number]
      end

      raise IndexError
    end

    private

    # lazily-built list of [offset, line-count, part-id] triples, one per config part
    def source_map
      @source_map ||= begin
        offset = 0
        source_map = []
        config_parts.each do |config_part|
          part_lines = config_part.getLinesCount()
          source_map << [offset, part_lines, config_part.id]
          offset += part_lines
        end
        source_map.freeze
      end
    end
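
With config parts of 73 and 25 lines (as in the earlier file X / file Y example), a lookup would go like this (subject to the same off-by-one caveat at part boundaries):

    lookup_source_and_line(77) #=> [id_of_second_part, 4]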

@andsel (Contributor, Author) replied:

Thanks for all the suggestions. I've also introduced a class to express the concept of segments, which makes it more readable.

@andsel force-pushed the feature/better_plugin_name branch from a0061d1 to 2e8cba9 on October 21, 2019
@andsel (Contributor, Author) commented Oct 22, 2019

Jenkins test this please

@andsel requested a review from yaauie on October 22, 2019
@yaauie (Member) left a comment

In general, I think this works as-is, but the complexity of routing everything through serialisation, and the changes to the delegators and to what they must extract from the config arguments, feel like they will be fragile.

I also do not immediately see a path forward that is less fragile. I'm going to take a stab at reworking this, and will circle back in the next day or so with a more concrete review and/or suggestions for a path forward.

context "when pipeline is constructed from multiple files" do
let (:pipeline_conf_string_part1) { 'input {
generator1
}' }
@yaauie (Member) commented:

TODO: test case where first file has trailing newline to ensure it doesn't offset the line numbers from subsequent files
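
A minimal sketch of such a case (hypothetical; the assertion checks the raw string arithmetic rather than the real pipeline helpers):

    context "when the first file ends with a trailing newline" do
      let(:part1) { "input {\n  generator {}\n}\n" } # 3 lines, newline-terminated
      let(:part2) { "output {\n  stdout {}\n}" }     # 3 more lines once merged

      it "keeps the line numbers of the second file stable" do
        merged = part1 + part2
        # line 5 of the merged config must still be line 2 of part2
        expect(merged.lines[4]).to eq(part2.lines[1])
      end
    end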

@@ -39,6 +39,10 @@ public String getText() {
return text;
}

public int getLinesCount() {
@andsel (Contributor, Author) commented:

This counter could be cached and recalculated only when the text field changes.

@andsel (Contributor, Author) commented Oct 31, 2019

I've created PR #11288, which contains the common parts and the Java execution pipeline logic.
The files not ported, which concern the Ruby execution, are:

  • logstash-core/lib/logstash/config/config_ast.rb
  • logstash-core/lib/logstash/pipeline.rb
  • logstash-core/spec/logstash/pipeline_spec.rb
  • logstash-core/lib/logstash/inputs/base.rb
  • logstash-core/spec/logstash/inputs/base_spec.rb

@andsel (Contributor, Author) commented Jan 15, 2020

Closed the new work in #11288 for the reason given in #11497 (comment).

@andsel closed this on Jan 15, 2020