Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security solution] Remove extra data from tracking clicks #164378

Merged
merged 9 commits into from
Sep 14, 2023

Conversation

stephmilovic
Copy link
Contributor

@stephmilovic stephmilovic commented Aug 21, 2023

Removed a few html properties from the properties tree of analytics tracking click events. These also contained "extra data". Here is an example of elements with a data-provider-id before and after these changes:

Before

image

After

image

@stephmilovic stephmilovic added release_note:skip Skip the PR/issue when compiling release notes Team:Threat Hunting Security Solution Threat Hunting Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.10.0 v8.11.0 labels Aug 21, 2023
@stephmilovic stephmilovic requested a review from a team as a code owner August 21, 2023 21:20
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-threat-hunting (Team:Threat Hunting)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to track_clicks LGTM (with a minor nit regarding the naming of the var).
The name of the PR refers to that change only.

Regarding the URL Fix, I have a few thoughts:

  1. I'd rather change it in the appropriate context provider rather than the client. This would be more performant because it runs once per URL change instead of for each event. On top of that, there's no tempering of the same data in 2 places, which can be hard to follow and troubleshoot in the future.
  2. I wonder if we really need this masking:
    • The URL is automatically reported by the browser in the Referer header: this means that any potential PII is still present in any HTTP request made by the browser. Not only to the Telemetry Web API, but also in requests to the Kibana server, any potential CDN (once we decide to serve assets from CDNs), Security Solution's newsfeed, and any other external requests the browser may make.
    • FullStory collects it. I wonder if it makes sense to tackle this effort in our custom telemetry client if our "other telemetry channel" collects it without a problem.

To be clear, I think it'll be best if the page_url context provider only reported the path without the query strings because it'll make it easier on the analysis end. But not for PII reasons.

WDYT?

Comment on lines 12 to 22
/** HTML attributes that should be skipped from reporting because they might contain data we do not wish to collect */
const HTML_ATTRIBUTES_WITH_EXTRA_DATA = [
'data-href',
'data-ech-series-name',
'data-provider-id',
'data-rfd-drag-handle-draggable-id',
'data-rfd-droppable-id',
'data-rfd-draggable-id',
'href',
'value',
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧡

nit: The name HTML_ATTRIBUTES_WITH_EXTRA_DATA doesn't really mean the extra data is necessarily bad or skippable. I wonder if we should use a name that specifies that we want to skip them. If we want to avoid using PII, we could probably use the name HTML_ATTRIBUTES_TO_REMOVE or *_TO_SKIP, or something along those lines?
Naming is hard ™️

export function maskSecurityUrls(properties: Partial<EventContext>): Record<string, unknown> {
const maskedProperties: Partial<EventContext> = {};
if (properties.page_url) {
maskedProperties.page_url = fixUrl(properties.page_url as string);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to fix it on the source rather than the client.

page_url comes from

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only solves for page_url, but we need to mask for the other properties (page, pageName) too. was hoping to find a more central definition but not sure one exists?

Copy link
Member

@afharo afharo Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

page and pageName are populated here:

map(({ type, name, page, id }) => ({
pageName: `${compact([type, name, page]).join(':')}`,
applicationId: name ?? type ?? 'unknown',
page,
entityId: id,
}))

TBH, bearing in mind the source of that information (execution context), it shouldn't expose any PII. page is provided by the app/plugin. After double-checking our telemetry data, we can confirm these usages for the field:

  • Some plugins provide human-readable indicators of subsections of the app ("app", "listing", "editor", "console")
    • Special mention to management that shows the name of each section in the Stack Management view "ingestPipelines", "watcher", "indexManagement", etc.
    • visualize also shows nested values to the editor value to indicate the type of viz that's being used ("editor:data_table", "editor:tsvb", "editor:vertical_bar", ...)
  • Others like Security Solution, show the URL path. I can see versions prior to 8.8.0 show generic URLs like /rules/id/:detailName/:tabName(rule_exceptions). After 8.8.0, they show the actual values. I wonder if this PR introduced the regression: [Security solutions] Fix execution context page #152816

FWIW, the execution context already carries the pathname (intentionally not used in telemetry, only for logging purposes to make troubleshooting easier). So I wonder if we should fix the page reported by Security Solutions instead of "polluting" the EBT client with this filter.

To be clear: I'm not opposed to running PII filters in the EBT client. FWIW, I think we should traverse the entire object to ensure we don't leak any PII anywhere (things like the one we are looking at may occur) #132259. However, for some reason, patching these fields here instead of fixing the source of the leak seems wrong. I think it's mostly because we are implementing the filter in a reactive way: if a new URL is added nodes/name/:piiSensitiveName, we'll have to remember to add it to the list, or we'll leak those values in the versions released until we fix them.

At the same time, the Referrer header (refer to my previous comment) keeps coming to mind as a reminder of "is it really worth the effort of hiding those here?".

Fixing the source of page (and hence pageName) would improve analytics, as we'll be able to see all events in page /rules/id/:detailName/:tabName(alerts), rather than /rules/id/387817b0-2fb3-11ee-9a91-19aeacd9e5a8/alerts. We could refer to page_url to identify if they spend most of the time looking at the same rule/host or if the cardinality is high.

WDYT?

/** security paths that contain user data */
const SECURITY_PATHS = ['hosts/name', 'users/name', 'network/ip'];
// this indicates a user query
const queryMarker = `?_g=`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it happen that the _g query param doesn't come the first one? As in https://kibanaHost/my/path?someQuery=value&_g={...}?

Also, would it be safer to remove all the querystrings in the URL? If not, I think it'd be best if we could parse them and maintain a blocklist like the HTML attributes.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure i will change it to all querystrings

Comment on lines 387 to 392
if (properties.page) {
maskedProperties.page = fixUrl(properties.page as string);
}
if (properties.pageName) {
maskedProperties.pageName = fixUrl(properties.pageName as string);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that these properties have that info? AFAIK, they come from the executionContext service:

pageName: `${compact([type, name, page]).join(':')}`,
applicationId: name ?? type ?? 'unknown',
page,
entityId: id,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for example on the host page:

page: "/hosts/name/Stephs-MBP/authentications"
pageName: "application:securitySolutionUI:/hosts/name/Stephs-MBP/authentications"
page_url: "/app/security/hosts/name/Stephs-MBP/authentications"

This is why I fixed it in the client, fixing it in get_location_observable will only capture the page_url field

@@ -360,3 +360,39 @@ export class AnalyticsClient implements IAnalyticsClient {
});
}
}

/** security paths that contain user data */
const SECURITY_PATHS = ['hosts/name', 'users/name', 'network/ip'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we generalize the name of the variable so other solutions can add their paths to the list whenever they notice them?

Comment on lines 375 to 376
const res = url.split(matchedKnownPiiData);
return res[0] + matchedKnownPiiData + res[1].replace(/^\/[^\/\r\n]+(?=\/)/, '/MASKED');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: using split + regexp looks kind of odd to me... I wonder if we should replace the regexp with another split by / and drop the first element (AFAIK, that's essentially what the regexp is doing).

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
core 365.3KB 365.4KB +145.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for iterating! 🧡

@stephmilovic stephmilovic merged commit 14af57b into elastic:main Sep 14, 2023
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 14, 2023
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.10

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Sep 14, 2023
…64378) (#166464)

# Backport

This will backport the following commits from `main` to `8.10`:
- [[Security solution] Remove extra data from tracking clicks
(#164378)](#164378)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Steph
Milovic","email":"stephanie.milovic@elastic.co"},"sourceCommit":{"committedDate":"2023-09-14T13:32:04Z","message":"[Security
solution] Remove extra data from tracking clicks
(#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Threat
Hunting","Team:
SecuritySolution","v8.10.0","v8.11.0"],"number":164378,"url":"https://github.com/elastic/kibana/pull/164378","mergeCommit":{"message":"[Security
solution] Remove extra data from tracking clicks
(#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af"}},"sourceBranch":"main","suggestedTargetBranches":["8.10"],"targetPullRequestStates":[{"branch":"8.10","label":"v8.10.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/164378","number":164378,"mergeCommit":{"message":"[Security
solution] Remove extra data from tracking clicks
(#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af"}}]}]
BACKPORT-->

Co-authored-by: Steph Milovic <stephanie.milovic@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Threat Hunting Security Solution Threat Hunting Team v8.10.0 v8.10.1 v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants