Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIBBEAT: Enhancement Convert dissected values from String to other basic data types and IP #18683

Merged
merged 31 commits into from
Jul 13, 2020

Conversation

premendrasingh
Copy link
Contributor

@premendrasingh premendrasingh commented May 21, 2020

What does this PR do?

This PR enhances dissect processor to convert string values to integer, long, float, double, boolean or IP. Key and convert data type are separated by | separator.

Why is it important?

It will save CPU cycles converting data from string to other data type using Convert processor

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Added unit and benchmark test cases.

Related issues

Closes elastic/dissect-specification#10

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 21, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented May 21, 2020

❕ Build Aborted

The PR is not allowed to run in the CI yet

Pipeline View Test View Changes Artifacts

Expand to view the summary

Build stats

  • Build Cause: [Pull request #18683 event]

  • Reason: The PR is not allowed to run in the CI yet

  • Start Time: 2020-05-25T14:33:14.162+0000

  • Duration: 4 min 36 sec

  • Commit: 7fc0caf

Steps errors

Expand to view the steps failures

  • Name: Error signal
    • Description: githubPrCheckApproved: The PR is not allowed to run in the CI yet. (Only users with write permission

    • Duration: 0 min 0 sec

    • Start Time: 2020-05-25T14:36:46.104+0000

    • log

Log output

Expand to view the last 100 lines of log output

[2020-05-25T14:36:45.773Z] [WARN] githubApiCall: The REST API call https://api.github.com/repos/elastic/beats/pulls/18683/reviews return 0 elements
[2020-05-25T14:36:45.820Z] [INFO] githubPrCheckApproved: Title: LIBBEAT: Enhancement Convert dissected values from String to other basic data types and IP - User: premendrasingh - Author Association: CONTRIBUTOR
[2020-05-25T14:36:46.108Z] ERROR: githubPrCheckApproved: The PR is not allowed to run in the CI yet
[2020-05-25T14:36:46.108Z] ERROR: githubPrCheckApproved: The PR is not allowed to run in the CI yet. (Only users with write permissions can do so.)
[2020-05-25T14:36:46.158Z] [INFO] Let's stop build #5. The PR is not allowed to run in the CI yet
[2020-05-25T14:36:46.180Z] Sleeping for 5 sec
[2020-05-25T14:36:47.257Z] Stage "Lint" skipped due to earlier failure(s)
[2020-05-25T14:36:47.317Z] Stage "Build and Test" skipped due to earlier failure(s)
[2020-05-25T14:36:47.524Z] Stage "Elastic Agent x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:47.526Z] Stage "Elastic Agent x-pack Windows" skipped due to earlier failure(s)
[2020-05-25T14:36:47.527Z] Stage "Elastic Agent Mac OS X" skipped due to earlier failure(s)
[2020-05-25T14:36:47.528Z] Stage "Filebeat oss" skipped due to earlier failure(s)
[2020-05-25T14:36:47.528Z] Stage "Filebeat x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:47.529Z] Stage "Filebeat Mac OS X" skipped due to earlier failure(s)
[2020-05-25T14:36:47.530Z] Stage "Filebeat x-pack Mac OS X" skipped due to earlier failure(s)
[2020-05-25T14:36:47.531Z] Stage "Filebeat Windows" skipped due to earlier failure(s)
[2020-05-25T14:36:47.532Z] Stage "Filebeat x-pack Windows" skipped due to earlier failure(s)
[2020-05-25T14:36:47.533Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.533Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-05-25T14:36:47.534Z] Stage "Auditbeat x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:47.535Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.537Z] Stage "Libbeat x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:47.538Z] Stage "Metricbeat OSS Unit tests" skipped due to earlier failure(s)
[2020-05-25T14:36:47.539Z] Stage "Metricbeat OSS Integration tests" skipped due to earlier failure(s)
[2020-05-25T14:36:47.540Z] Stage "Metricbeat Python integration tests" skipped due to earlier failure(s)
[2020-05-25T14:36:47.541Z] Stage "Metricbeat x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:47.542Z] Stage "Metricbeat crosscompile" skipped due to earlier failure(s)
[2020-05-25T14:36:47.543Z] Stage "Metricbeat Mac OS X" skipped due to earlier failure(s)
[2020-05-25T14:36:47.545Z] Stage "Metricbeat x-pack Mac OS X" skipped due to earlier failure(s)
[2020-05-25T14:36:47.546Z] Stage "Metricbeat Windows" skipped due to earlier failure(s)
[2020-05-25T14:36:47.547Z] Stage "Metricbeat x-pack Windows" skipped due to earlier failure(s)
[2020-05-25T14:36:47.549Z] Stage "Packetbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.549Z] Stage "dockerlogbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.550Z] Stage "Winlogbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.551Z] Stage "Winlogbeat Windows x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:47.552Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.553Z] Stage "Journalbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.554Z] Stage "Generators" skipped due to earlier failure(s)
[2020-05-25T14:36:47.555Z] Stage "Kubernetes" skipped due to earlier failure(s)
[2020-05-25T14:36:47.666Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.668Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-05-25T14:36:47.669Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.670Z] Stage "Metricbeat x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:47.671Z] Stage "Packetbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.672Z] Stage "dockerlogbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.673Z] Stage "Winlogbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.674Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.675Z] Stage "Journalbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:47.676Z] Stage "Generators" skipped due to earlier failure(s)
[2020-05-25T14:36:48.043Z] Failed in branch Elastic Agent x-pack
[2020-05-25T14:36:48.045Z] Failed in branch Elastic Agent x-pack Windows
[2020-05-25T14:36:48.045Z] Failed in branch Elastic Agent Mac OS X
[2020-05-25T14:36:48.046Z] Failed in branch Filebeat oss
[2020-05-25T14:36:48.047Z] Failed in branch Filebeat x-pack
[2020-05-25T14:36:48.047Z] Failed in branch Filebeat Mac OS X
[2020-05-25T14:36:48.048Z] Failed in branch Filebeat x-pack Mac OS X
[2020-05-25T14:36:48.049Z] Failed in branch Filebeat Windows
[2020-05-25T14:36:48.049Z] Failed in branch Filebeat x-pack Windows
[2020-05-25T14:36:48.050Z] Failed in branch Auditbeat x-pack
[2020-05-25T14:36:48.051Z] Failed in branch Libbeat x-pack
[2020-05-25T14:36:48.052Z] Failed in branch Metricbeat OSS Unit tests
[2020-05-25T14:36:48.052Z] Failed in branch Metricbeat OSS Integration tests
[2020-05-25T14:36:48.053Z] Failed in branch Metricbeat Python integration tests
[2020-05-25T14:36:48.054Z] Failed in branch Metricbeat crosscompile
[2020-05-25T14:36:48.055Z] Failed in branch Metricbeat Mac OS X
[2020-05-25T14:36:48.055Z] Failed in branch Metricbeat x-pack Mac OS X
[2020-05-25T14:36:48.056Z] Failed in branch Metricbeat Windows
[2020-05-25T14:36:48.057Z] Failed in branch Metricbeat x-pack Windows
[2020-05-25T14:36:48.058Z] Failed in branch Winlogbeat Windows x-pack
[2020-05-25T14:36:48.059Z] Failed in branch Kubernetes
[2020-05-25T14:36:48.348Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:48.349Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-05-25T14:36:48.351Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:48.352Z] Stage "Metricbeat x-pack" skipped due to earlier failure(s)
[2020-05-25T14:36:48.353Z] Stage "Winlogbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:48.354Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:48.355Z] Stage "Generators" skipped due to earlier failure(s)
[2020-05-25T14:36:48.407Z] Failed in branch Packetbeat
[2020-05-25T14:36:48.408Z] Failed in branch dockerlogbeat
[2020-05-25T14:36:48.409Z] Failed in branch Journalbeat
[2020-05-25T14:36:48.609Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:48.611Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-05-25T14:36:48.612Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:48.614Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-05-25T14:36:48.615Z] Stage "Generators" skipped due to earlier failure(s)
[2020-05-25T14:36:48.654Z] Failed in branch Metricbeat x-pack
[2020-05-25T14:36:48.655Z] Failed in branch Winlogbeat
[2020-05-25T14:36:48.843Z] Failed in branch Heartbeat
[2020-05-25T14:36:48.844Z] Failed in branch Libbeat
[2020-05-25T14:36:48.844Z] Failed in branch Functionbeat
[2020-05-25T14:36:48.845Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-05-25T14:36:48.846Z] Stage "Generators" skipped due to earlier failure(s)
[2020-05-25T14:36:48.953Z] Failed in branch Auditbeat oss
[2020-05-25T14:36:48.955Z] Failed in branch Generators
[2020-05-25T14:36:49.294Z] Running on Jenkins in /var/lib/jenkins/workspace/Beats_beats-beats-mbp_PR-18683
[2020-05-25T14:36:49.413Z] [INFO] getVaultSecret: Getting secrets
[2020-05-25T14:36:49.475Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-05-25T14:36:50.240Z] + chmod 755 generate-build-data.sh
[2020-05-25T14:36:50.240Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats-beats-mbp/PR-18683/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats-beats-mbp/PR-18683/runs/5 ABORTED 215815
[2020-05-25T14:36:50.791Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats-beats-mbp/PR-18683/runs/5/steps/?limit=10000 -o steps-info.json

@ChrsMark ChrsMark added the Team:Integrations Label for the Integrations team label May 21, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 21, 2020
@andresrc andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label May 22, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@jsoriano
Copy link
Member

Nice enhancement!
I have a question about the way the type is specified. Is there any special reason to use the | separator?
In Grok the : separator is used for the same purpose. And in the logstash implementation of dissect, the types are declared with an additional convert_datatype option.
If there is no special reason to use the | separator, I think we should use one of the already existing methods to declare types.

@premendrasingh
Copy link
Contributor Author

Nice enhancement!
I have a question about the way the type is specified. Is there any special reason to use the | separator?
In Grok the : separator is used for the same purpose. And in the logstash implementation of dissect, the types are declared with an additional convert_datatype option.
If there is no special reason to use the | separator, I think we should use one of the already existing methods to declare types.

| was chosen based on discussion in Issue #10 elastic/dissect-specification#10
Please let me know whether I should change it to : (colon).

I think annotating key with data type right next to it, is more readable compared to convert_datatype option.

Thanks for reviewing.

@vjsamuel
Copy link
Contributor

@sayden any more outstanding concerns? @premendrasingh can you please post updated benchmarks here.

@premendrasingh
Copy link
Contributor Author

Ran the benchmark on master and new branch and took average of 3 runs.

Test case New feature ns/op Master ns/op Diff
BenchmarkDissect/When_all_the_defined_fields_are_captured_by_we_have_remaining_data-12 1115 1131 -1%
BenchmarkDissect/Complex_stack_trace-12 1451 1407 3%
BenchmarkDissect/fails_when_delimiter_is_not_found_at_the_beginning_of_the_string-12 2.03E-05 1.97E-05 3%
BenchmarkDissect/fails_when_delimiter_is_not_found_after_the_key-12 2.83E-05 3.57E-05 -21%
BenchmarkDissect/simple_dissect-12 873 882 -1%
BenchmarkDissect/dissect_two_replacement-12 961 1024 -6%
BenchmarkDissect/one_level_dissect_not_end_of_string-12 933 956 -2%
BenchmarkDissect/one_level_dissect-12 898 957 -6%
BenchmarkDissect/multiple_keys_dissect_end_of_string-12 967 1018 -5%
BenchmarkDissect/multiple_keys_not_end_of_string-12 1045 1072 -3%
BenchmarkDissect/simple_ordered-12 1238 1241 0%
BenchmarkDissect/simple_append-12 1168 1148 2%
BenchmarkDissect/indirect_field-12 1049 1004 4%
BenchmarkDissect/skip_field-12 1053 989 6%
BenchmarkDissect/named_skiped_field_with_indirect-12 1101 1082 2%
BenchmarkDissect/pointer_field_with_indirect-12 1043 1085 -4%
BenchmarkDissect/missing_fields-12 1262 1329 -5%
BenchmarkDissect/ignore_right_padding-12 1099 1085 1%
BenchmarkDissect/padding_on_the_last_key_need_a_delimiter-12 1123 1109 1%
BenchmarkDissect/ignore_left_padding-12 1125 1069 5%
BenchmarkDissect/when_the_delimiters_contains_{and}-12 1125 1070 5%
BenchmarkDissect/simple_fixed_length-12 1039 1027 1%
BenchmarkDissect/simple_ordered_and_fixed_length_field-12 1201 1155 4%
BenchmarkDissect/simple_padding_and_fixed_length_field-12 1260 1172 8%
BenchmarkDissect/mixed_pointer_and_indirect_and_fixed_length-12 1087 1025 6%
BenchmarkDissect/fails_when_there_is_remaining_string_after_the_fixed-length_key-12 0.000035 4.67E-05 -25%
BenchmarkDissect/fails_when_there_is_no_enough_string_for_the_fixed-length_key-12 0.000044 0.000035 26%
BenchmarkDissect/Regular_expression-12 431 450 -4%
BenchmarkDissect/Larger_regular_expression-12 2977 3173 -6%
BenchmarkDissect/regular_expression_to_match_end_of_line-12 610 631 -3%

@premendrasingh
Copy link
Contributor Author

@sayden Please review the changes.
Thanks and regards.

Copy link
Contributor

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I wanted to spend some time in local to ensure everything was as expected 😉

It's also awesome to have some benchmarks now.

Impressive work! Seriously, thank you very much for the contribution 😃

CI errors seems unrelated too

@jsoriano
Copy link
Member

jsoriano commented Jul 1, 2020

jenkins run the tests please

@premendrasingh
Copy link
Contributor Author

premendrasingh commented Jul 9, 2020

@jsoriano Can you please merge? Errors are not related to my changes.
Thanks

@jsoriano
Copy link
Member

@sayden could you please go on with merge and backport to 7.x if you think this is ready? Thanks!

@jsoriano jsoriano added needs_backport PR is waiting to be backported to other branches. v7.9.0 labels Jul 10, 2020
@sayden sayden merged commit 84d75e5 into elastic:master Jul 13, 2020
@sayden
Copy link
Contributor

sayden commented Jul 13, 2020

Thank you for contributing!

@premendrasingh
Copy link
Contributor Author

Thank you @sayden, @jsoriano for the guidance.

sayden pushed a commit to sayden/beats that referenced this pull request Jul 14, 2020
v1v added a commit to v1v/beats that referenced this pull request Jul 14, 2020
* upstream/master: (25 commits)
  [Elastic Agent] Send checkin payload to Fleet (elastic#19857)
  [Ingest Manager] Fixed tests across agent elastic#19877
  [Ingest Manager] Fix serialization test  elastic#19876
  Fix service start type mapping in windows/service metricset (elastic#19551)
  ci: Change comment trigger detection method (elastic#19827)
  Add 21 autogenerated filesets from rsa2elk devices (elastic#19713)
  [Ingest Manager] Agent config cleanup (elastic#19848)
  libbeat/publisher/pipeline: fix data races (elastic#19821)
  Update monitoring-internal-collection.asciidoc (elastic#19422) (elastic#19697)
  [Elastic Agent] Trust exchange endpoint must bind to 127.0.0.1 (elastic#19861)
  Specify an ECS version in Auditbeat/Packetbeat/Winlogbeat (elastic#19159)
  Add azure billing metricset (elastic#19207)
  Add support for appinsights in the metricbeat azure module (elastic#18940)
  Add MySQL query metricset with lightweight module and SQL helper (elastic#18955)
  [Ingest Manager] Refuse invalid stream values in configuration (elastic#19587)
  Do not use vendor during integration tests (elastic#19839)
  LIBBEAT: Enhancement Convert dissected values from String to other basic data types and IP (elastic#18683)
  [Elastic Agent] Remove support for "logs" and only support logfile (elastic#19761)
  [CI] support windows-2012 (elastic#19773)
  Do not update go.mod during packaging and testing (elastic#19823)
  ...