-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security solution] Remove extra data from tracking clicks #164378
[Security solution] Remove extra data from tracking clicks #164378
Conversation
Pinging @elastic/security-threat-hunting (Team:Threat Hunting) |
Pinging @elastic/security-solution (Team: SecuritySolution) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to track_clicks
LGTM (with a minor nit regarding the naming of the var).
The name of the PR refers to that change only.
Regarding the URL Fix, I have a few thoughts:
- I'd rather change it in the appropriate context provider rather than the client. This would be more performant because it runs once per URL change instead of for each event. On top of that, there's no tempering of the same data in 2 places, which can be hard to follow and troubleshoot in the future.
- I wonder if we really need this masking:
- The URL is automatically reported by the browser in the Referer header: this means that any potential PII is still present in any HTTP request made by the browser. Not only to the Telemetry Web API, but also in requests to the Kibana server, any potential CDN (once we decide to serve assets from CDNs), Security Solution's newsfeed, and any other external requests the browser may make.
- FullStory collects it. I wonder if it makes sense to tackle this effort in our custom telemetry client if our "other telemetry channel" collects it without a problem.
To be clear, I think it'll be best if the page_url
context provider only reported the path without the query strings because it'll make it easier on the analysis end. But not for PII reasons.
WDYT?
/** HTML attributes that should be skipped from reporting because they might contain data we do not wish to collect */ | ||
const HTML_ATTRIBUTES_WITH_EXTRA_DATA = [ | ||
'data-href', | ||
'data-ech-series-name', | ||
'data-provider-id', | ||
'data-rfd-drag-handle-draggable-id', | ||
'data-rfd-droppable-id', | ||
'data-rfd-draggable-id', | ||
'href', | ||
'value', | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧡
nit: The name HTML_ATTRIBUTES_WITH_EXTRA_DATA
doesn't really mean the extra data is necessarily bad or skippable. I wonder if we should use a name that specifies that we want to skip them. If we want to avoid using PII
, we could probably use the name HTML_ATTRIBUTES_TO_REMOVE
or *_TO_SKIP
, or something along those lines?
Naming is hard ™️
export function maskSecurityUrls(properties: Partial<EventContext>): Record<string, unknown> { | ||
const maskedProperties: Partial<EventContext> = {}; | ||
if (properties.page_url) { | ||
maskedProperties.page_url = fixUrl(properties.page_url as string); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to fix it on the source rather than the client.
page_url
comes from
Line 34 in 97dc2ec
return `${location.pathname}${location.hash}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only solves for page_url
, but we need to mask for the other properties (page
, pageName
) too. was hoping to find a more central definition but not sure one exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
page
and pageName
are populated here:
Lines 124 to 129 in 4c626f1
map(({ type, name, page, id }) => ({ | |
pageName: `${compact([type, name, page]).join(':')}`, | |
applicationId: name ?? type ?? 'unknown', | |
page, | |
entityId: id, | |
})) |
TBH, bearing in mind the source of that information (execution context), it shouldn't expose any PII. page
is provided by the app/plugin. After double-checking our telemetry data, we can confirm these usages for the field:
- Some plugins provide human-readable indicators of subsections of the app ("app", "listing", "editor", "console")
- Special mention to
management
that shows the name of each section in the Stack Management view "ingestPipelines", "watcher", "indexManagement", etc. visualize
also shows nested values to the editor value to indicate the type of viz that's being used ("editor:data_table", "editor:tsvb", "editor:vertical_bar", ...)
- Special mention to
- Others like Security Solution, show the URL path. I can see versions prior to 8.8.0 show generic URLs like
/rules/id/:detailName/:tabName(rule_exceptions)
. After 8.8.0, they show the actual values. I wonder if this PR introduced the regression: [Security solutions] Fix execution context page #152816
FWIW, the execution context already carries the pathname (intentionally not used in telemetry, only for logging purposes to make troubleshooting easier). So I wonder if we should fix the page
reported by Security Solutions instead of "polluting" the EBT client with this filter.
To be clear: I'm not opposed to running PII filters in the EBT client. FWIW, I think we should traverse the entire object to ensure we don't leak any PII anywhere (things like the one we are looking at may occur) #132259. However, for some reason, patching these fields here instead of fixing the source of the leak seems wrong. I think it's mostly because we are implementing the filter in a reactive way: if a new URL is added nodes/name/:piiSensitiveName
, we'll have to remember to add it to the list, or we'll leak those values in the versions released until we fix them.
At the same time, the Referrer header (refer to my previous comment) keeps coming to mind as a reminder of "is it really worth the effort of hiding those here?".
Fixing the source of page
(and hence pageName
) would improve analytics, as we'll be able to see all events in page /rules/id/:detailName/:tabName(alerts)
, rather than /rules/id/387817b0-2fb3-11ee-9a91-19aeacd9e5a8/alerts
. We could refer to page_url
to identify if they spend most of the time looking at the same rule/host or if the cardinality is high.
WDYT?
/** security paths that contain user data */ | ||
const SECURITY_PATHS = ['hosts/name', 'users/name', 'network/ip']; | ||
// this indicates a user query | ||
const queryMarker = `?_g=`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it happen that the _g
query param doesn't come the first one? As in https://kibanaHost/my/path?someQuery=value&_g={...}
?
Also, would it be safer to remove all the querystrings in the URL? If not, I think it'd be best if we could parse them and maintain a blocklist like the HTML attributes.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure i will change it to all querystrings
if (properties.page) { | ||
maskedProperties.page = fixUrl(properties.page as string); | ||
} | ||
if (properties.pageName) { | ||
maskedProperties.pageName = fixUrl(properties.pageName as string); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that these properties have that info? AFAIK, they come from the executionContext
service:
Lines 125 to 128 in 4c626f1
pageName: `${compact([type, name, page]).join(':')}`, | |
applicationId: name ?? type ?? 'unknown', | |
page, | |
entityId: id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for example on the host page:
page: "/hosts/name/Stephs-MBP/authentications"
pageName: "application:securitySolutionUI:/hosts/name/Stephs-MBP/authentications"
page_url: "/app/security/hosts/name/Stephs-MBP/authentications"
This is why I fixed it in the client, fixing it in get_location_observable
will only capture the page_url field
@@ -360,3 +360,39 @@ export class AnalyticsClient implements IAnalyticsClient { | |||
}); | |||
} | |||
} | |||
|
|||
/** security paths that contain user data */ | |||
const SECURITY_PATHS = ['hosts/name', 'users/name', 'network/ip']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we generalize the name of the variable so other solutions can add their paths to the list whenever they notice them?
const res = url.split(matchedKnownPiiData); | ||
return res[0] + matchedKnownPiiData + res[1].replace(/^\/[^\/\r\n]+(?=\/)/, '/MASKED'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: using split + regexp looks kind of odd to me... I wonder if we should replace the regexp with another split by /
and drop the first element (AFAIK, that's essentially what the regexp is doing).
💚 Build Succeeded
Metrics [docs]Page load bundle
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for iterating! 🧡
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…64378) (#166464) # Backport This will backport the following commits from `main` to `8.10`: - [[Security solution] Remove extra data from tracking clicks (#164378)](#164378) <!--- Backport version: 8.9.7 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Steph Milovic","email":"stephanie.milovic@elastic.co"},"sourceCommit":{"committedDate":"2023-09-14T13:32:04Z","message":"[Security solution] Remove extra data from tracking clicks (#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Threat Hunting","Team: SecuritySolution","v8.10.0","v8.11.0"],"number":164378,"url":"https://github.com/elastic/kibana/pull/164378","mergeCommit":{"message":"[Security solution] Remove extra data from tracking clicks (#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af"}},"sourceBranch":"main","suggestedTargetBranches":["8.10"],"targetPullRequestStates":[{"branch":"8.10","label":"v8.10.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/164378","number":164378,"mergeCommit":{"message":"[Security solution] Remove extra data from tracking clicks (#164378)","sha":"14af57b2e80041fee856789681afd8b2c95936af"}}]}] BACKPORT--> Co-authored-by: Steph Milovic <stephanie.milovic@elastic.co>
Removed a few html properties from the
properties
tree of analytics tracking click events. These also contained "extra data". Here is an example of elements with a data-provider-id before and after these changes:Before
After