New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[Security solution] Remove extra data from tracking clicks #164378

Merged

stephmilovic merged 9 commits into elastic:main from stephmilovic:rm_user_data_from_clicks

Sep 14, 2023

Contributor

stephmilovic commented Aug 21, 2023 •

edited by kibanamachine

Loading

Removed a few html properties from the properties tree of analytics tracking click events. These also contained "extra data". Here is an example of elements with a data-provider-id before and after these changes:

Before

After

stephmilovic added 3 commits

August 14, 2023 07:53

wip

ef70e82


          fix and test

ed6c034


          another property

1b5fbae

stephmilovic added release_note:skip Team:Threat Hunting Team: SecuritySolution v8.10.0 v8.11.0 labels

stephmilovic requested a review from a team as a code owner

August 21, 2023 21:20

Contributor

elasticmachine commented Aug 21, 2023

Pinging @elastic/security-threat-hunting (Team:Threat Hunting)

Contributor

elasticmachine commented Aug 21, 2023

Pinging @elastic/security-solution (Team: SecuritySolution)

stephmilovic added 2 commits

August 21, 2023 15:33


          Merge branch 'main' into rm_user_data_from_clicks

fab13e0


          add more html properties

c88a555

afharo requested changes

View reviewed changes

Member

afharo left a comment

The changes to track_clicks LGTM (with a minor nit regarding the naming of the var).
The name of the PR refers to that change only.

Regarding the URL Fix, I have a few thoughts:

I'd rather change it in the appropriate context provider rather than the client. This would be more performant because it runs once per URL change instead of for each event. On top of that, there's no tempering of the same data in 2 places, which can be hard to follow and troubleshoot in the future.
I wonder if we really need this masking:
- The URL is automatically reported by the browser in the Referer header: this means that any potential PII is still present in any HTTP request made by the browser. Not only to the Telemetry Web API, but also in requests to the Kibana server, any potential CDN (once we decide to serve assets from CDNs), Security Solution's newsfeed, and any other external requests the browser may make.
- FullStory collects it. I wonder if it makes sense to tackle this effort in our custom telemetry client if our "other telemetry channel" collects it without a problem.

To be clear, I think it'll be best if the page_url context provider only reported the path without the query strings because it'll make it easier on the analysis end. But not for PII reasons.

WDYT?

packages/core/analytics/core-analytics-browser-internal/src/track_clicks.ts Outdated

Comment on lines 12 to 22

+              /** HTML attributes that should be skipped from reporting because they might contain data we do not wish to collect */
+              const HTML_ATTRIBUTES_WITH_EXTRA_DATA = [
+                'data-href',
+                'data-ech-series-name',
+                'data-provider-id',
+                'data-rfd-drag-handle-draggable-id',
+                'data-rfd-droppable-id',
+                'data-rfd-draggable-id',
+                'href',
+                'value',
+              ];

Member

afharo Aug 22, 2023

🧡

nit: The name HTML_ATTRIBUTES_WITH_EXTRA_DATA doesn't really mean the extra data is necessarily bad or skippable. I wonder if we should use a name that specifies that we want to skip them. If we want to avoid using PII, we could probably use the name HTML_ATTRIBUTES_TO_REMOVE or *_TO_SKIP, or something along those lines?
Naming is hard ™️

packages/analytics/client/src/analytics_client/analytics_client.ts Outdated

+              export function maskSecurityUrls(properties: Partial<EventContext>): Record<string, unknown> {
+                const maskedProperties: Partial<EventContext> = {};
+                if (properties.page_url) {
+                  maskedProperties.page_url = fixUrl(properties.page_url as string);

Member

afharo Aug 22, 2023

I'd prefer to fix it on the source rather than the client.

page_url comes from

kibana/packages/core/application/core-application-browser-internal/src/utils/get_location_observable.ts

Line 34 in 97dc2ec

return `${location.pathname}${location.hash}`;

Contributor Author

stephmilovic Sep 11, 2023

This only solves for page_url, but we need to mask for the other properties (page, pageName) too. was hoping to find a more central definition but not sure one exists?

Member

afharo Sep 12, 2023 •

edited

Loading

page and pageName are populated here:

kibana/packages/core/execution-context/core-execution-context-browser-internal/src/execution_context_service.ts

Lines 124 to 129 in 4c626f1

    
           map(({ type, name, page, id }) => ({ 
        
             pageName: `${compact([type, name, page]).join(':')}`, 
        
             applicationId: name ?? type ?? 'unknown', 
        
             page, 
        
             entityId: id, 
        
           }))

TBH, bearing in mind the source of that information (execution context), it shouldn't expose any PII. page is provided by the app/plugin. After double-checking our telemetry data, we can confirm these usages for the field:

Some plugins provide human-readable indicators of subsections of the app ("app", "listing", "editor", "console")
- Special mention to management that shows the name of each section in the Stack Management view "ingestPipelines", "watcher", "indexManagement", etc.
- visualize also shows nested values to the editor value to indicate the type of viz that's being used ("editor:data_table", "editor:tsvb", "editor:vertical_bar", ...)
Others like Security Solution, show the URL path. I can see versions prior to 8.8.0 show generic URLs like /rules/id/:detailName/:tabName(rule_exceptions). After 8.8.0, they show the actual values. I wonder if this PR introduced the regression: [Security solutions] Fix execution context page #152816

FWIW, the execution context already carries the pathname (intentionally not used in telemetry, only for logging purposes to make troubleshooting easier). So I wonder if we should fix the page reported by Security Solutions instead of "polluting" the EBT client with this filter.

To be clear: I'm not opposed to running PII filters in the EBT client. FWIW, I think we should traverse the entire object to ensure we don't leak any PII anywhere (things like the one we are looking at may occur) #132259. However, for some reason, patching these fields here instead of fixing the source of the leak seems wrong. I think it's mostly because we are implementing the filter in a reactive way: if a new URL is added nodes/name/:piiSensitiveName, we'll have to remember to add it to the list, or we'll leak those values in the versions released until we fix them.

At the same time, the Referrer header (refer to my previous comment) keeps coming to mind as a reminder of "is it really worth the effort of hiding those here?".

Fixing the source of page (and hence pageName) would improve analytics, as we'll be able to see all events in page /rules/id/:detailName/:tabName(alerts), rather than /rules/id/387817b0-2fb3-11ee-9a91-19aeacd9e5a8/alerts. We could refer to page_url to identify if they spend most of the time looking at the same rule/host or if the cardinality is high.

WDYT?

packages/analytics/client/src/analytics_client/analytics_client.ts Outdated

+              /** security paths that contain user data */
+              const SECURITY_PATHS = ['hosts/name', 'users/name', 'network/ip'];
+              // this indicates a user query
+              const queryMarker = `?_g=`;

Member

afharo Aug 22, 2023

Could it happen that the _g query param doesn't come the first one? As in https://kibanaHost/my/path?someQuery=value&_g={...}?

Also, would it be safer to remove all the querystrings in the URL? If not, I think it'd be best if we could parse them and maintain a blocklist like the HTML attributes.

WDYT?

Contributor Author

stephmilovic Sep 11, 2023

sure i will change it to all querystrings

packages/analytics/client/src/analytics_client/analytics_client.ts Outdated

Comment on lines 387 to 392

+                if (properties.page) {
+                  maskedProperties.page = fixUrl(properties.page as string);
+                }
+                if (properties.pageName) {
+                  maskedProperties.pageName = fixUrl(properties.pageName as string);
+                }

Member

afharo Aug 22, 2023

Is it possible that these properties have that info? AFAIK, they come from the executionContext service:

kibana/packages/core/execution-context/core-execution-context-browser-internal/src/execution_context_service.ts

Lines 125 to 128 in 4c626f1

    
           pageName: `${compact([type, name, page]).join(':')}`, 
        
           applicationId: name ?? type ?? 'unknown', 
        
           page, 
        
           entityId: id,

Contributor Author

stephmilovic Sep 11, 2023

Yes, for example on the host page:

page: "/hosts/name/Stephs-MBP/authentications"
pageName: "application:securitySolutionUI:/hosts/name/Stephs-MBP/authentications"
page_url: "/app/security/hosts/name/Stephs-MBP/authentications"

This is why I fixed it in the client, fixing it in get_location_observable will only capture the page_url field

packages/analytics/client/src/analytics_client/analytics_client.ts Outdated

@@ @@ -360,3 +360,39 @@ export class AnalyticsClient implements IAnalyticsClient { @@
                     });
                 }
               }
+              /** security paths that contain user data */
+              const SECURITY_PATHS = ['hosts/name', 'users/name', 'network/ip'];

Member

afharo Aug 22, 2023

Can we generalize the name of the variable so other solutions can add their paths to the list whenever they notice them?

packages/analytics/client/src/analytics_client/analytics_client.ts Outdated

Comment on lines 375 to 376

		const res = url.split(matchedKnownPiiData);
		return res[0] + matchedKnownPiiData + res[1].replace(/^\/[^\/\r\n]+(?=\/)/, '/MASKED');

Member

afharo Aug 22, 2023

nit: using split + regexp looks kind of odd to me... I wonder if we should replace the regexp with another split by / and drop the first element (AFAIK, that's essentially what the regexp is doing).

stephmilovic added 4 commits

September 11, 2023 11:57


          Merge branch 'main' into rm_user_data_from_clicks


          pr changes

8cdd4a4


          revert url masking

838dade


          Merge branch 'main' into rm_user_data_from_clicks

7eeba80

Collaborator

kibana-ci commented Sep 13, 2023

💚 Build Succeeded

Buildkite Build
Commit: 7eeba80

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`core`	365.3KB	365.4KB	+145.0B

History

💛 Build #152160 was flaky c88a555

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

afharo approved these changes

View reviewed changes

Member

afharo left a comment

LGTM! Thank you for iterating! 🧡

stephmilovic merged commit 14af57b into elastic:main

kibanamachine mentioned this pull request

[8.10] [Security solution] Remove extra data from tracking clicks (#164378) #166464

Merged

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request


          [Security solution] Remove extra data from tracking clicks (elastic#1…

6b9feef

…64378)

(cherry picked from commit 14af57b)

Contributor

kibanamachine commented Sep 14, 2023

💚 All backports created successfully

Status	Branch	Result
✅	8.10

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request


          [8.10] [Security solution] Remove extra data from tracking clicks (#1…

bcb8c5d

…64378) (#166464)

# Backport

This will backport the following commits from `main` to `8.10`:
- [[Security solution] Remove extra data from tracking clicks
(#164378)](#164378)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Steph
Milovic","email":"stephanie.milovic@elastic.co"},"sourceCommit":{"committedDate":"2023-09-14T13:32:04Z","message":"[Security
solution] Remove extra data from tracking clicks
(#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Threat
Hunting","Team:
SecuritySolution","v8.10.0","v8.11.0"],"number":164378,"url":"https://github.com/elastic/kibana/pull/164378","mergeCommit":{"message":"[Security
solution] Remove extra data from tracking clicks
(#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af"}},"sourceBranch":"main","suggestedTargetBranches":["8.10"],"targetPullRequestStates":[{"branch":"8.10","label":"v8.10.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/164378","number":164378,"mergeCommit":{"message":"[Security
solution] Remove extra data from tracking clicks
(#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af"}}]}]
BACKPORT-->

Co-authored-by: Steph Milovic <stephanie.milovic@elastic.co>

kibanamachine added the v8.10.1 label

This was referenced Oct 1, 2023

[8.10] fix(slo): Handle partial indicator url state (#167247) #167722

Merged

[8.10] backport SLO fixes #168231

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release_note:skip Team: SecuritySolution Team:Threat Hunting v8.10.0 v8.10.1 v8.11.0