Added line and columns to plugin name #11155
Conversation
@yaauie thanks for the hint. We could do this:
IIRC, the source files for a pipeline are concatenated together into a single string, which is fed through the grammar to get an AST, the nodes of which are used to create the plugins and control structure. Because the AST contains line and column metadata from the concatenated source string and not the original files, there is currently no way for a plugin instance to be mapped back to a specific source file. We would either need to change how the AST is built (e.g., build multiple independent ASTs and allow them to be concatenated after parsing to produce a net-same resulting AST with improved source metadata), or somehow map the …
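To make the offset problem concrete (file names and contents below are hypothetical, just for illustration): once the per-file sources are concatenated, the parser only sees positions in the merged string, so a plugin's reported line no longer matches its position in the file it was written in.

# Hypothetical illustration of why merged line numbers drift:
file_a = "input {\n  generator {}\n}\n"   # 3 lines, imagine this is a.conf
file_b = "filter {\n  mutate {}\n}\n"     # 3 lines, imagine this is b.conf

merged = file_a + file_b
# In the merged string the mutate plugin sits on line 5, even though it is on
# line 2 of b.conf; mapping 5 back to [b.conf, 2] requires a.conf's line count.
puts merged.lines.index { |line| line.include?("mutate") } + 1  # => 5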
I agree that providing this information means carrying new metadata from much earlier in the compilation process. The goal of this suggestion is to give users a way to correlate logs and metrics from plugins with the actual plugins, without requiring the user to assign an id to every one of them. Linking auto-generated IDs with file+line+column seems to be the most user-friendly option, but I'm happy to go back to the drawing board at any time.
Merging ASTs could be a mess, because there is no formal rule for how the pipeline config files can be sliced, so we could have the … and the ….
At this point, remapping lines and columns after the global AST is created is the only viable solution.
Added the source file and source line into configuration reference for each metric node (Ruby pipeline)
I think we are going to have to find a way to get this metadata to plugin instances without routing through their initialize methods (e.g., a map on the pipeline indexed by plugin id?). Changing the method signatures of classes that are extended by plugins in external codebases (including customer-owned private plugins) is going to put us in a position where we cannot be confident that our changes do not break real-world plugins.
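A minimal sketch of that map idea, assuming hypothetical class and method names rather than the actual pipeline API: the pipeline owns a lookup table keyed by plugin id, so plugin base classes keep their current initialize signatures.

# Hypothetical sketch: source metadata kept on the pipeline, keyed by plugin id,
# instead of being threaded through every plugin's initialize method.
class PipelineSourceIndex
  def initialize
    @by_plugin_id = {}
  end

  # the compiler would call this when it instantiates a plugin from an AST node
  def record(plugin_id, source_file, line, column)
    @by_plugin_id[plugin_id] = { source: source_file, line: line, column: column }
  end

  # logging/metrics code would call this to annotate output for a given plugin
  def lookup(plugin_id)
    @by_plugin_id[plugin_id]
  end
end

With something like this, external plugins never see the metadata, so their existing signatures stay untouched.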
logstash-core/src/main/java/org/logstash/common/SourceWithMetadata.java
@@ -44,5 +44,19 @@ def display_debug_information
    logger.debug("Merged config")
    logger.debug("\n\n#{config_string}")
  end

  def lookup_source_and_line(merged_config_line)
    remaining_lines = merged_config_line
Iterating over all of the parts for each lookup, and re-doing the work of counting each config part's lines every time, seems like it would have performance side effects.
I think we can do the bulk of the work just once, by caching a mapping of file names, offsets, and sizes, and then using that mapping at lookup time instead of going all the way to the config parts.
Below is untested, and may be subject to off-by-one errors
def lookup_source_and_line(merged_line_number)
  source_map.each do |part_offset, part_lines, part_id|
    rebased_line_number = merged_line_number - part_offset
    next if rebased_line_number > part_lines # the line belongs to a later part
    return [part_id, rebased_line_number]
  end
  raise IndexError # merged_line_number is past the end of the last part
end

private

def source_map
  @source_map ||= begin
    offset = 0
    source_map = []
    config_parts.each do |config_part|
      part_lines = config_part.getLinesCount()
      # each entry is [offset in the merged config, number of lines, part id];
      # append to the local array here, since @source_map is still nil inside ||=
      source_map << [offset, part_lines, config_part.id]
      offset += part_lines
    end
    source_map.freeze
  end
end
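As a rough walk-through of the suggestion above (part sizes and ids are made up): with three parts of 3, 5 and 2 lines, the cached map would be [[0, 3, "a.conf"], [3, 5, "b.conf"], [8, 2, "c.conf"]], so:

lookup_source_and_line(5)   # => ["b.conf", 2]   merged line 5 is line 2 of the second part
lookup_source_and_line(9)   # => ["c.conf", 1]   first line of the third part
lookup_source_and_line(11)  # raises IndexError, past the last part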
Thanks for all the suggestions. I've also introduced a class to express the concept of segments and to make it more readable.
(branch updated from a0061d1 to 2e8cba9)
Jenkins test this please
In general, I think this works as-is, but the complexity of routing everything through serialisation, and the changes to the delegators and what they have to break out of the config arguments, feel like they will be fragile.
I also do not immediately see a path forward that is less fragile. I'm going to take a stab at reworking this, and will circle back in the next day or so with a more concrete review and/or suggestions for a path forward.
context "when pipeline is constructed from multiple files" do | ||
let (:pipeline_conf_string_part1) { 'input { | ||
generator1 | ||
}' } |
TODO: test case where first file has trailing newline to ensure it doesn't offset the line numbers from subsequent files
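One possible shape for that case, as an untested sketch (the lookup call and ids here are placeholders, not the actual spec helpers):

context "when the first file has a trailing newline" do
  let(:pipeline_conf_string_part1) { "input {\n  generator1\n}\n" }  # note the trailing "\n"
  let(:pipeline_conf_string_part2) { "output {\n  stdout {}\n}" }

  it "keeps the second file's line numbers anchored to that file" do
    # merged line 4 is the first line of part2, so it should rebase to line 1
    expect(lookup_source_and_line(4)).to eq([part2_id, 1])
  end
end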
@@ -39,6 +39,10 @@ public String getText() {
        return text;
    }

    public int getLinesCount() {
This counter could be cached and recalculated only when the text field changes.
I've created PR #11288, which contains the common parts and the Java execution pipeline logic.
Closed the new work in #11288 for the reason given in #11497 (comment)
Note
this PR is superseded by #11288 for the Java pipeline part; for the Ruby pipeline we still have to check whether it is worthwhile.
[Feature Discussion]
We want to give a more useful name/id to each plugin so that it can easily be mapped back to the point in the pipeline it refers to. I think the best option would be to have a line number referring to the position in the pipeline config file, like input-L5. What do you think?
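As a hypothetical illustration of the naming scheme (plugin choices and ids below are made up for the example):

input {
  stdin { }   # declared on line 2 -> an auto-generated id could look like "input-L2"
}
filter {
  grok { }    # declared on line 5 -> "filter-L5"
}

An id like that would let users match a log line or a monitoring-API entry back to the exact place in the pipeline config without having to set a manual id on every plugin.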
solves #11154