Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge ValuesParser and TypeConverter into Parser #1286

Merged
merged 15 commits into from
Oct 25, 2016

Conversation

tagomoris
Copy link
Member

@tagomoris tagomoris commented Oct 21, 2016

This change does:

  • merge features of ValuesParser and TypeConverter into Parser
  • move ValuesParser implementation under Compat (TypeConverter is already there)
  • update built-in parsers with this change

Many parser plugins (including 3rd party ones) needs features of TypeConverter and time parser.
These features are provided only in ValuesParser, but ValuesParser is originally for parsers which supports fixed number fields, like tsv and csv.
(ltsv parser inherits ValuesParser for TypeConverter, and it has unbelievable implementation...)

This change makes parser plugins to:

  • use type converters, null field handling and time parsing very easily in the user-configurable way
  • update configuration parameters to use hash and array to provide safe structured values without additional parameters
  • make estimate_current_event as a configuration parameter (can be overloaded by config_set_default of 3rd party parser plugins)

See parser_ltsv.rb and parser_regexp.rb to show how this change makes writing plugins understandable.

… inter-class dependency

And moved ValuesParser to Compat module completely.

The reason of this changes are:
* TypeConverter was old-fashioned module with configuration params ("types_delimiter", "types_label_delimiter") and cannot log anything via PluginLogger
* Many Parser plugins have choosen ValuesParser as superclass, but doesn't use "Values" features (with `@keys`), but uses just TypeConverter features
* "tag_key" feature is now supported by "extract" plugin helper, so parser should not support it
@tagomoris
Copy link
Member Author

@repeatedly could you review this change?

@tagomoris
Copy link
Member Author

@repeatedly ping?

types = {}
conf['types'].split(delimiter).each do |pair|
key, value = pair.split(label_delimiter, 2)
if value.start_with?("time#{label_delimiter}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array should be also care?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be. I'll add a fix.

end

class JSONParser < Fluent::Plugin::JSONParser
include TypeConverterCompatParameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this include needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is a subclass of Fluent::Plugin::JSONParser, and it doesn't include it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSONParser doesn't use TypeConverter feature.
Do you have a plan to update JSONParser?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed to update code of JSONParser. I pushed some commits including one to do it.

@repeatedly
Copy link
Member

Idea is good for me and add 2 comments.

One concern is regexp parser performance because this parser is used in high-volume environment.
I will check it with micro-benchmark.

@repeatedly
Copy link
Member

repeatedly commented Oct 24, 2016

I re-think the convert_values API.
Current API creates temporary array for each record.
In high-volume environment, it increases CPU usage / memory usage.
I changed Parser API to avoid this problem in v0.12, so v0.14 should keep same performance.

@tagomoris
Copy link
Member Author

@repeatedly I made a very simple benchmark, and found that multi value return is faster than passing blocks: https://gist.github.com/tagomoris/7bca9fe4fdc3201f691094e6aad3b177

@tagomoris
Copy link
Member Author

@repeatedly do you have anything more?

@repeatedly
Copy link
Member

LGTM. If we have a performance issue, we should rewrite parser code in the future.

@tagomoris tagomoris merged commit 2009cd4 into master Oct 25, 2016
@tagomoris tagomoris deleted the bye-valueparser-and-typeconverter branch October 25, 2016 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants