Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

For #27549 - Collect shim data about the Pocket sponsored stories. #27550

Merged
merged 1 commit into from
Nov 10, 2022

Conversation

Mugurell
Copy link
Contributor

@Mugurell Mugurell commented Oct 26, 2022

Pull Request checklist

  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features. In addition, it includes a screenshot of a successful accessibility scan to ensure no new defects are added to the product.

QA

  • QA Needed

To download an APK when reviewing a PR (after all CI tasks finished running):

  1. Click on Checks at the top of the PR page.
  2. Click on the firefoxci-taskcluster group on the left to expand all tasks.
  3. Click on the build-debug task.
  4. Click on View task in Taskcluster in the new DETAILS section.
  5. The APK links should be on the right side of the screen, named for each CPU architecture.

GitHub Automation

Fixes #27549

@Mugurell Mugurell requested review from a team as code owners October 26, 2022 09:31
@Mugurell Mugurell marked this pull request as draft October 26, 2022 09:31
@Mugurell
Copy link
Contributor Author

Mugurell commented Oct 26, 2022

Draft for now.
When building the project I get strange errors like:

      ```
      pocket:
        spoc_shown:
          data_sensitivity:
          - technical
      ```
  
      'technical' is not one of ['web_activity', 'highly_sensitive']

Changing the probe type to event make the errors go away.
Seems like this is the first time we'd try to use type: text in project, maybe there's an issue about it that we didn't know about.
cc @Dexterp37

@Dexterp37
Copy link
Contributor

Draft for now. When building the project I get strange errors like:

      ```
      pocket:
        spoc_shown:
          data_sensitivity:
          - technical
      ```
  
      'technical' is not one of ['web_activity', 'highly_sensitive']

Changing the probe type to event make the errors go away. Seems like this is the first time we'd try to use type: text in project, maybe there's an issue about it that we didn't know about. cc @Dexterp37

As called out by @badboy in a private conversation, this is working as intended. When we spec'd the text type, we restricted its usage to cat 3 and cat 4 data only. To keep writing your code and prototyping your patch, just use web_activity instead of technical in the data sensitivity.

I think we should discuss whether or not the SHIM is cat 2 or cat 3, and whether or not we should relax this restriction.

@Mugurell
Copy link
Contributor Author

Thank you for the quick response.
Tried removing the data-sensitivity block or just set ['web_activity', 'highly_sensitive'] and then I get other errors in the generated code:

Unresolved reference: TextMetricType
..
Property delegate must have a 'getValue(Pocket, KProperty<>)' method. None of the following functions is suitable:
public inline operator fun Lazy.getValue(thisRef: Any?, property: KProperty<
>): Unit defined in kotlin

@badboy
Copy link
Member

badboy commented Oct 26, 2022

Unresolved reference: TextMetricType
..
Property delegate must have a 'getValue(Pocket, KProperty<>)' method. None of the following functions is suitable:
public inline operator fun Lazy.getValue(thisRef: Any?, property: KProperty<
>): Unit defined in kotlin

Seems like this is the first time we'd try to use type: text in project, maybe there's an issue about it that we didn't know about. cc @Dexterp37

Looks like there is a bug indeed: We did not re-export that metric from a-c yet. I'll prepare a PR, but that means this will need to hold off until the next a-c bump.

@badboy
Copy link
Member

badboy commented Oct 26, 2022

Fix coming in mozilla-mobile/android-components#13010

@badboy
Copy link
Member

badboy commented Oct 26, 2022

mozilla-mobile/android-components#13010 landed. The next a-c nightly should have this fix and this PR can then be rebased.

@Mugurell
Copy link
Contributor Author

Mugurell commented Oct 27, 2022

@badboy @Dexterp37 Not sure how to control when the new custom ping is sent.
Tested with

adb shell am start -n org.mozilla.fenix.debug/mozilla.telemetry.glean.debug.GleanDebugActivity --ez sendPing spoc --es debugViewTag shimmyYay

and got this results - https://debug-ping-preview.firebaseapp.com/pings/shimmyYay
In this case the spoc ping is sent only once, when the above command opens the app.
To note that in between app opens there would have been more impression / click events but I only get one ping. Maybe each new value overwrites the old and only the latest event is sent.

It's interesting that the topsites-impression ping does show a lot of times, seemingly recording as expected each new impression and sending this right away.
Looked into the implementation and don't see anything different other than the types of probes.
In the web documentation - https://mozilla.github.io/glean/book/user/pings/custom.html I didn't find how to control when the pings are sent.


I found Pings.topsitesImpression.submit() in the project, will try to use something similar.
And it works - https://debug-ping-preview.firebaseapp.com/pings/shimmyYay3

@Dexterp37
Copy link
Contributor

I found Pings.topsitesImpression.submit() in the project, will try to use something similar. And it works - https://debug-ping-preview.firebaseapp.com/pings/shimmyYay3

Yes, I was about to say that topsitesImpression just seems to be submitted in more places. Is topsitesImpression representing the same thing? Its description is:

Recorded when a sponsored top site is rendered and visible on the home screen. Visibility is qualified as when the homepage is brought to the front of the Browser, and sponsored tiles are 100% visible on screen.

@Mugurell
Copy link
Contributor Author

Is topsitesImpression representing the same thing? Its description is:

Recorded when a sponsored top site is rendered and visible on the home screen. Visibility is qualified as when the homepage is brought to the front of the Browser, and sponsored tiles are 100% visible on screen.

That telemetry is for another feature - the top sites from the top of the home screen
image

while the new telemetry is for the Pocket section from the bottom of the home screen, specifically just about the elements from this section which contain the word "Sponsored":

@Mugurell Mugurell marked this pull request as ready for review October 27, 2022 11:37
@Mugurell
Copy link
Contributor Author

Mugurell commented Oct 27, 2022

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

  1. What questions will you answer with this data?
  • This data will help us understand how the new Pocket sponsored stories feature is used.
  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?
  • This data will help in business related decisions.
  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?
  • There are no other alternatives.

  1. Can current instrumentation answer these questions?
  • No.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

    Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.
Measurement Description Data Collection Category Tracking Bug #
An indication of which Pocket sponsored story is shown through the collection of the story shim - a unique base64 string identifying each story and type of user interaction. Category 3 - web_activity #27549
An indication of which Pocket sponsored story is clicked through the collection of the story shim - a unique base64 string identifying each story and type of user interaction. Category 3 - web_activity #27549

These new probes will be sent immediately as they are set in a new custom ping - spoc that does not include the Glean client ID.

  1. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.
  1. How long will this data be collected? Choose one of the following:
  • Until version 119 with the option to renew at that point.

  1. What populations will you measure?
  • All channels, all countries
, en-US and en-CA locales.
  1. If this data collection is default on, what is the opt-out mechanism for users?
  • Default Glean SDK opt-out mechanism.

  1. Please provide a general description of how you will analyze this data.
  • Glean and Amplitude.

  1. Where do you intend to share the results of your analysis?
  • 
On Glean, Amplitude, with mobile teams, the Pocket team and the Kevel provider for sponsored stories.
  1. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection?
  • No third-party tools.

@Mugurell Mugurell added needs:data-review PR is awaiting a data review needs:review PRs that need to be reviewed labels Oct 27, 2022
@Dexterp37
Copy link
Contributor

@Mugurell are you sure the category for this data is web_activity? Let's be absolutely sure and check with the data reviewer (cc @travis79 )

@Mugurell
Copy link
Contributor Author

@Mugurell are you sure the category for this data is web_activity? Let's be absolutely sure and check with the data reviewer (cc @travis79 )

Based on #27550 (comment) I just went with web_activity.
technical && interaction would seem more appropriate but if they cannot be set atm I went with web_activity which seems less delicate than highly_sensitive.

@travis79
Copy link
Member

Unless you have any objections, I'll go ahead and get this data-review going since Category 3 is going to require Trust approval.

@Dexterp37
Copy link
Contributor

Unless you have any objections, I'll go ahead and get this data-review going since Category 3 is going to require Trust approval.

Works for me. FTR, this is the same data we're sending with impressions on desktop: https://firefox-source-docs.mozilla.org/browser/components/newtab/docs/v2-system-addon/data_dictionary.html

@travis79
Copy link
Member

travis79 commented Oct 27, 2022

@Mugurell Can you please clarify in the data-review request that this data will be sent in a custom ping andband will not be associated with the Glean client-id? Also, since this data will be matching the shim data sent by Activity Stream on Firefox Desktop, could you please elaborate on what is being collected by this? This may be as easy as linking to the Activity Stream docs on this or perhaps there is a better description of this data to link to or use here.

I think that these are important details for the review and will help to ensure a speedy escalation turnaround.

Thank you!

app/metrics.yaml Show resolved Hide resolved
app/metrics.yaml Outdated Show resolved Hide resolved
app/pings.yaml Show resolved Hide resolved
app/pings.yaml Outdated

spoc:
description: |
Contains data about user's interactions with Pocket sponsored stories.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be a bit more specific about what this is about? (e.g. impressions?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the ping description to

Contains data identifying with which Pocket sponsored story the user
interacted with and the type of interaction: story impression or click.

app/metrics.yaml Outdated Show resolved Hide resolved
app/metrics.yaml Outdated Show resolved Hide resolved
app/metrics.yaml Outdated Show resolved Hide resolved
app/pings.yaml Show resolved Hide resolved
@Dexterp37
Copy link
Contributor

@Mugurell Can you please clarify in the data-review request that this data will be sent in a custom ping and will not be associated with the Glean client-id? Also, since this data will be matching the shim data sent by Activity Stream on Firefox Desktop, could you please elaborate on what is being collected by this? This may be as easy as linking to the Activity Stream docs on this or perhaps there is a better description of this data to link to or use here.

Unfortunately ActivityStream does not really document what a shim is. We do have an example of a SHIM in this jira ticket.

@Mugurell
Copy link
Contributor Author

@Mugurell Can you please clarify in the data-review request that this data will be sent in a custom ping and will not be associated with the Glean client-id? Also, since this data will be matching the shim data sent by Activity Stream on Firefox Desktop, could you please elaborate on what is being collected by this?

I updated point 5 of the data collection review form and the new probe description with more details about what the shim is.

@Mugurell
Copy link
Contributor Author

Thank you both for the support.
I update the patch and data collection review form based on your recommendation and tested that the telemetry is collected as expected with the results available at https://debug-ping-preview.firebaseapp.com/pings/shimmyYay5.

Copy link
Contributor

@Dexterp37 Dexterp37 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+ from a data collection/glean POV. Please have a Fenix peer review this as well and wait for the data review response before merging. Thanks!

app/pings.yaml Outdated
- https://github.com/mozilla-mobile/fenix/issues/27549
- https://mozilla-hub.atlassian.net/browse/FNXV2-21791
data_reviews:
- ???
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not forget to update this (and in other places too) before merging :)

@travis79
Copy link
Member

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

1. What questions will you answer with this data?


* This data will help us understand how the new Pocket sponsored stories feature is used.


2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?


* This data will help in business related decisions.


3. What alternative methods did you consider to answer these questions? Why were they not sufficient?


* There are no other alternatives.


4. Can current instrumentation answer these questions?


* No.


5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.
   
   Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.

Measurement Description Data Collection Category Tracking Bug #
An indication of which Pocket sponsored story is shown through the collection of the story shim - a unique base64 string identifying each story and type of user interaction. Category 3 - web_activity #27549
An indication of which Pocket sponsored story is clicked through the collection of the story shim - a unique base64 string identifying each story and type of user interaction. Category 3 - web_activity #27549

These new probes will be sent immediately as they are set in a new custom ping - spoc that does not include the Glean client ID.

6. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.


* [dictionary.telemetry.mozilla.org/apps/fenix](https://dictionary.telemetry.mozilla.org/apps/fenix)


7. How long will this data be collected? Choose one of the following:


* Until version 119 with the option to renew at that point.


8. What populations will you measure?


* All channels, all countries
  , `en-US` and `en-CA` locales.


9. If this data collection is default on, what is the opt-out mechanism for users?


* Default Glean SDK opt-out mechanism.


10. Please provide a general description of how you will analyze this data.


* Glean and Amplitude.


11. Where do you intend to share the results of your analysis?


* On Glean, Amplitude, with mobile teams, the Pocket team and the Kevel provider for sponsored stories.


12. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection?


* No third-party tools.

Data Review

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, through the metrics.yaml file and the Glean Dictionary.

  1. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, through the "Send Usage Data" preference in the application settings.

  1. If the request is for permanent data collection, is there someone who will monitor the data over time?

N/A, collection to end in version 119

  1. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2 & Category 3 (Web Interaction Data)

  1. Is the data collection request for default-on or default-off?

Default-on

  1. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No

  1. Is the data collection covered by the existing Firefox privacy notice?

Yes

  1. Does the data collection use a third-party collection tool?

No

Result

Pending escalation to Trust for approval cc Nneka

@Dexterp37
Copy link
Contributor

Hey @Mugurell , given this response, I believe we're good to move on.

@travis79
Copy link
Member

Echoing what @Dexterp37 said, the escalation for the data-review is approved. Please let me know if you have any questions regarding the review, or any feedback to the process.

@travis79 travis79 removed the needs:data-review PR is awaiting a data review label Nov 10, 2022
…ed stories.

With the new telemetry will immediately report when a certain Pocket sponsored
story is shown (visible more than 50%) or clicked by the user.
The reasons for the new ping help easily identify the probe being sent and the
type of shim data.
@Mugurell Mugurell added pr:needs-landing PRs that are ready to land [Will be merged by Mergify] and removed pr:do-not-land needs:review PRs that need to be reviewed labels Nov 10, 2022
@mergify mergify bot merged commit 18d51d9 into mozilla-mobile:main Nov 10, 2022
@Mugurell Mugurell deleted the spocShim branch December 5, 2022 07:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr:needs-landing PRs that are ready to land [Will be merged by Mergify]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collect shim information about sponsored stories interactions
5 participants