extract device type from user agent info #69322

Merged
merged 24 commits into from
Mar 29, 2021

shahzad31
Contributor

@shahzad31 shahzad31 commented Feb 22, 2021

Adds a device type to the user agent processor.

Matching Algorithm

The process is simple: based on the OS and browser names extracted by the UA parser library, this PR adds a few simple patterns that are used to match the correct device type.

For example, one pattern for matching desktop devices is:

- regex: '^(Windows$|Windows NT$|Mac OS X|Linux$|Chrome OS|Fedora$|Ubuntu$)'
So if the extracted OS name matches one of these, there is a high chance that the device is a desktop. The same goes for mobile OSes. In addition, it tries to match browser names as well and correlates both results.

For bots, it looks for any of the following words anywhere in the string:

- regex: 'Bot|bot|spider|Spider|Crawler|crawler|AppEngine-Google'
The same goes for tablets, etc.
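
The matching idea can be illustrated with a minimal Java sketch. It reuses the two patterns quoted in this description, but everything else is an assumption made for illustration: the class and method names are hypothetical, the "Bot" and "Other" labels are placeholders, the bot pattern is assumed to run against the raw user agent string, and the mobile, tablet, and browser-correlation steps are left out. It is not the code in this PR.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the processor code from this PR.
class DeviceTypeSketch {
    // Both patterns are copied from the PR description.
    private static final Pattern DESKTOP_OS =
        Pattern.compile("^(Windows$|Windows NT$|Mac OS X|Linux$|Chrome OS|Fedora$|Ubuntu$)");
    private static final Pattern BOT =
        Pattern.compile("Bot|bot|spider|Spider|Crawler|crawler|AppEngine-Google");

    // Assumption: bots are detected on the raw UA string, desktops on the parsed OS name.
    static String classify(String userAgent, String osName) {
        if (BOT.matcher(userAgent).find()) {
            return "Bot"; // placeholder label
        }
        if (osName != null && DESKTOP_OS.matcher(osName).find()) {
            return "Desktop";
        }
        return "Other"; // placeholder fallback label
    }

    public static void main(String[] args) {
        String macChrome = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36";
        String crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        System.out.println(classify(macChrome, "Mac OS X")); // prints "Desktop"
        System.out.println(classify(crawler, "Other"));      // prints "Bot"
    }
}
```

Note that the `$` anchors in the desktop pattern keep names such as "Windows Phone" from being treated as desktop, which is why the OS name is matched as a whole rather than by prefix.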

Example:

In dev tools:

PUT _ingest/pipeline/user_agent
{
  "description" : "Add user agent information",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=user_agent
{
  "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}
GET my-index-000001/_doc/my_id

result:

    "user_agent" : {
      "original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
      "os" : {
        "name" : "Mac OS X",
        "version" : "10.10.5",
        "full" : "Mac OS X 10.10.5"
      },
      "name" : "Chrome",
      "device" : {
        "name" : "Mac",
        "type" : "Desktop"
      },
      "version" : "51.0.2704.103"
    }

image

Real user data analysis

Ran an analysis on data from elastic.co collected using the rum-agent, which is deployed on observability clusters.

There were 60019 unique user agent strings in the data. I extracted those strings and pushed them into ES using this PR's user agent ingest pipeline,

and this PR was able to match more than 99% of them successfully. Here is the analysis in Lens:

Note: This doesn't represent traffic; it represents the ratio of extracted categories across unique UA strings.

image

Testing

Tested by building it via Kibana:

yarn es source
and tested via dev tools as described above in the example and screenshot.

@shahzad31 shahzad31 changed the title [POC] extract device type from user agent info [WIP] extract device type from user agent info Feb 22, 2021
@shahzad31 shahzad31 changed the title [WIP] extract device type from user agent info extract device type from user agent info Feb 22, 2021
@shahzad31 shahzad31 marked this pull request as ready for review February 22, 2021 17:35
@shahzad31 shahzad31 added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Feb 22, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Feb 22, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann
Contributor

@shahzad31, is this related to #65057? I think we'd be happy to have the ability to extract device types, but we'd want it to be usable across multiple use cases and stable so that device types wouldn't change as the implementation evolved.

@shahzad31
Contributor Author

shahzad31 commented Feb 22, 2021

> @shahzad31, is this related to #65057? I think we'd be happy to have the ability to extract device types, but we'd want it to be usable across multiple use cases and stable so that device types wouldn't change as the implementation evolved.

Yes, it's related to #65057, and the goal is definitely to have a stable implementation. It might miss some use cases, which it will mark as "Others", or miss new UA strings as they are added, just like any other UA parser. But the current implementation makes sure it aligns with the UA parser we are using to extract other info, so it won't change over time for the current parser.

@paulb-elastic
Contributor

This relates to elastic/uptime#296

@shahzad31
Contributor Author

Performed a simple performance comparison of this branch vs master.
From the unit tests, I picked the testCommonBrowser() unit test and ran it 100000 times:


        long startTime = System.currentTimeMillis();

        for (int i = 0; i < 100000; i++) {
            // ... actual test code
        }

        long endTime = System.currentTimeMillis();
        System.out.println("Total execution time: " + (endTime - startTime) + "ms");

I don't see any performance difference: on both branches, the recorded time varies between 550ms and 700ms. I ran this comparison about 20 times, ~10 times on each branch.

To run the test I used the IntelliJ IDEA IDE. An example run:

image

@danhermann
Contributor

Thanks, @shahzad31. I think that test would be significantly affected by the test setup and teardown operations. I ran several tests of my own that did more to isolate processor execution time and there was a pretty consistent performance cost of ~10% for extracting the device type. I think that's small enough that we don't need an option to disable it. I'll start on the code review for this shortly.
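
One way to keep setup and teardown out of a measurement like this is sketched below. It is not the test that was actually run for this comparison: the class name, the helper, and the regex workload standing in for the processor call are all hypothetical. The idea is only that setup happens once outside the timed region, a warm-up pass runs first, and `System.nanoTime()` wraps nothing but the repeated call.

```java
import java.util.regex.Pattern;

// Hypothetical timing sketch; the workload is a stand-in, not the real processor.
class TimingSketch {
    // Returns the average wall-clock nanoseconds per call of `work`.
    static long averageNanos(Runnable work, int warmupIterations, int measuredIterations) {
        // Warm-up: give the JIT a chance to compile the hot path before measuring.
        for (int i = 0; i < warmupIterations; i++) {
            work.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            work.run();
        }
        return (System.nanoTime() - start) / measuredIterations;
    }

    public static void main(String[] args) {
        // Setup is done once, outside the timed region.
        Pattern bot = Pattern.compile("Bot|bot|spider|Spider|Crawler|crawler|AppEngine-Google");
        String ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36";

        long avg = averageNanos(() -> bot.matcher(ua).find(), 10_000, 100_000);
        System.out.println("Average per call: " + avg + " ns");
    }
}
```

A hand-rolled loop like this is still only a rough guide; a benchmark harness such as JMH handles warm-up, dead-code elimination, and run-to-run variance more rigorously.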

Contributor

@danhermann danhermann left a comment

@shahzad31, I made an initial review pass and have some suggestions noted below. Once those are addressed, I'll do at least one more pass on it.

@danhermann
Contributor

Fyi, you can see the reason for the ci/1 build failure by clicking "Details" and then "Console Output". You'll see there that the "forbiddenApis" test failed because we don't allow use of the java.io.File class. There's an exists method on the java.nio.file.Files class that is permitted and I think that should work for what you want.
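
For reference, the suggested replacement looks roughly like the sketch below. The class, method, and file names are hypothetical and are not taken from the PR; only `java.nio.file.Files.exists` and `Path.resolve` are real JDK calls.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper showing the NIO call in place of java.io.File#exists.
class FileExistsSketch {
    // Before (rejected by forbiddenApis): new java.io.File(dir, name).exists()
    static boolean regexFileExists(Path configDir, String fileName) {
        return Files.exists(configDir.resolve(fileName));
    }

    public static void main(String[] args) {
        Path dir = Paths.get("."); // assumption: check relative to the working directory
        System.out.println(regexFileExists(dir, "no-such-file-69322.yml"));
    }
}
```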

@danhermann danhermann merged commit f7efa3e into elastic:master Mar 29, 2021
@danhermann
Contributor

@shahzad31, thanks for your work and iteration here. It looks good now so I've merged it in and will get it backported for the next 7.x release.

@shahzad31 shahzad31 deleted the device-type-from-ua branch March 30, 2021 07:24
axw added a commit to axw/apm-server that referenced this pull request Mar 31, 2021
elastic/elasticsearch#69322
added support for extracting device types to the
user_agent ingest processors. Update approvals to
match.
axw added a commit to elastic/apm-server that referenced this pull request Mar 31, 2021
* tests/system: adapt to new API response

In elastic/kibana#95146
the response structure for listing APM agent central
config changed. Update system tests to match.

* tests/system: add user_agent.device.type

elastic/elasticsearch#69322
added support for extracting device types to the
user_agent ingest processors. Update approvals to
match.
mergify bot pushed a commit to elastic/apm-server that referenced this pull request Mar 31, 2021
* tests/system: adapt to new API response

In elastic/kibana#95146
the response structure for listing APM agent central
config changed. Update system tests to match.

* tests/system: add user_agent.device.type

elastic/elasticsearch#69322
added support for extracting device types to the
user_agent ingest processors. Update approvals to
match.

(cherry picked from commit 0e09aa6)
jsoriano added a commit to elastic/beats that referenced this pull request Mar 31, 2021
Add field definition for new user agent field added in elastic/elasticsearch#69322
Regenerate Filebeat test files with this new field.
Should fix Filebeat builds.
jsoriano added a commit to jsoriano/beats that referenced this pull request Mar 31, 2021
Add field definition for new user agent field added in elastic/elasticsearch#69322

(cherry picked from commit 6454736)
jsoriano added a commit to elastic/beats that referenced this pull request Mar 31, 2021
Add field definition for new user agent field added in elastic/elasticsearch#69322

(cherry picked from commit 6454736)
axw added a commit to elastic/apm-server that referenced this pull request Apr 1, 2021
* tests/system: fix system tests (#5037)

* tests/system: adapt to new API response

In elastic/kibana#95146
the response structure for listing APM agent central
config changed. Update system tests to match.

* tests/system: add user_agent.device.type

elastic/elasticsearch#69322
added support for extracting device types to the
user_agent ingest processors. Update approvals to
match.

(cherry picked from commit 0e09aa6)

* user_agent.device.type isn't in 7.x yet

* make update

* systemtest: revert approvals changes

Co-authored-by: Andrew Wilkins <axw@elastic.co>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
danhermann pushed a commit to danhermann/elasticsearch that referenced this pull request Apr 20, 2021
bmorelli25 pushed a commit to bmorelli25/observability-docs that referenced this pull request Dec 18, 2023
* tests/system: adapt to new API response

In elastic/kibana#95146
the response structure for listing APM agent central
config changed. Update system tests to match.

* tests/system: add user_agent.device.type

elastic/elasticsearch#69322
added support for extracting device types to the
user_agent ingest processors. Update approvals to
match.
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team v7.13.0 v8.0.0-alpha1
5 participants