Adds telemetry #255

skrawcz · 2022-12-21T23:41:49Z

This adds telemetry capture to Hamilton.

After this change, by default, Hamilton will collect anonymous usage data to help us improve it and know where to apply development efforts.

We capture two events: one when a driver object is instantiated, and one when the execute() call on the driver completes.
No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:

* Operating System and Python version
* A persistent UUID to identify the session, stored in ~/.hamilton.conf.
* Error stack trace limited to Hamilton code, if one occurs.
* Information on what features you're using from Hamilton: decorators, adapters, result builders.
* How Hamilton is being used: number of final nodes in DAG, number of modules, size of objects passed to execute().
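As a minimal sketch of how the OS/Python fields and the persistent UUID could be gathered with only the standard library: the property names are copied from the debug log later in this thread, and `os.name`, `platform.platform()` and `platform.python_version()` appear to produce values in the same shape. The function names and the `anonymous_id` config key are hypothetical, not Hamilton's actual implementation.

```python
import configparser
import os
import platform
import uuid
from pathlib import Path

DEFAULT_CONF_PATH = Path.home() / ".hamilton.conf"


def get_or_create_user_id(conf_path: Path = DEFAULT_CONF_PATH) -> str:
    """Returns a persistent anonymous UUID, creating and saving one on first use."""
    config = configparser.ConfigParser()
    config.read(conf_path)  # a missing file is silently ignored
    # "anonymous_id" is a hypothetical key name, stored under [DEFAULT].
    user_id = config["DEFAULT"].get("anonymous_id")
    if not user_id:
        user_id = str(uuid.uuid4())
        config["DEFAULT"]["anonymous_id"] = user_id
        with open(conf_path, "w") as f:
            config.write(f)
    return user_id


def base_properties(conf_path: Path = DEFAULT_CONF_PATH) -> dict:
    """OS/Python info plus the persistent ID; no user data is read."""
    return {
        "os_type": os.name,                 # e.g. "posix"
        "os_version": platform.platform(),  # e.g. "Linux-...-with-glibc2.35"
        "python_version": f"{platform.python_version()}/{platform.python_implementation()}",
        "distinct_id": get_or_create_user_id(conf_path),
    }
```

Calling `base_properties()` twice returns the same `distinct_id`, because the first call persists it to the config file.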

If you do not wish to participate, you can opt out with one of the following methods:

1. Disable it programmatically in your code before creating a Hamilton driver:
```python
from hamilton import telemetry
telemetry.disable_telemetry()
```
2. Set the key `telemetry_enabled` to `false` in ~/.hamilton.conf under the `DEFAULT` section:
```
[DEFAULT]
telemetry_enabled = False
```
3. Set `HAMILTON_TELEMETRY_ENABLED=false` as an environment variable, either for your shell session:
```bash
export HAMILTON_TELEMETRY_ENABLED=false
```
or as part of the run command:
```bash
HAMILTON_TELEMETRY_ENABLED=false python NAME_OF_MY_DRIVER.py
```
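One way the three opt-out switches, plus the "stop after 1000 checks" guard mentioned in the squashed commit message, might combine into a single check. This is a sketch under assumptions: the precedence (programmatic flag, then environment variable, then config file), the accepted "false" spellings, and the module-level globals are illustrative choices, not necessarily what `hamilton.telemetry` does.

```python
import configparser
import os
from pathlib import Path

# Hypothetical module-level state; names are illustrative.
_programmatically_disabled = False
_check_count = 0
MAX_CHECKS = 1000  # stop capturing after this many checks (bulk-usage guard)
DEFAULT_CONF_PATH = Path.home() / ".hamilton.conf"


def disable_telemetry() -> None:
    """Opt-out 1: flip a process-wide flag before creating a driver."""
    global _programmatically_disabled
    _programmatically_disabled = True


def is_telemetry_enabled(conf_path: Path = DEFAULT_CONF_PATH) -> bool:
    global _check_count
    _check_count += 1
    if _check_count > MAX_CHECKS or _programmatically_disabled:
        return False
    # Opt-out 3: the environment variable, if set, wins over the config file.
    env_value = os.environ.get("HAMILTON_TELEMETRY_ENABLED")
    if env_value is not None:
        return env_value.strip().lower() not in ("false", "0", "no", "off")
    # Opt-out 2: [DEFAULT] telemetry_enabled in the config file; enabled if absent.
    config = configparser.ConfigParser()
    config.read(conf_path)
    return config["DEFAULT"].getboolean("telemetry_enabled", fallback=True)
```

`getboolean` accepts `false`, `False`, `no`, `off` and `0`, so either capitalization in the config file disables telemetry.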

Changes

* adds telemetry.py
* adds code to driver.Driver to capture telemetry
* the code is wrapped in enough try/excepts that it should not impact running operations
* adds code to function_modifiers/base.py to capture decorator usage
* adds tests
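A sketch of the "should not impact running operations" idea: the request is built and sent with `urllib` (the squashed commit message says no extra dependencies are pulled in) on a daemon thread, and every exception is swallowed and logged at DEBUG. `TELEMETRY_URL` is a placeholder, since the PR sends to PostHog and its endpoint is not shown here; the function names are hypothetical.

```python
import json
import logging
import threading
import urllib.request

logger = logging.getLogger(__name__)

# Placeholder endpoint; the real destination is not shown in this PR text.
TELEMETRY_URL = "https://example.invalid/capture"


def _post_event(payload: dict, url: str, timeout: float) -> None:
    """Sends one JSON event; any failure is logged at DEBUG and swallowed."""
    try:
        data = json.dumps(payload).encode("utf-8")
        request = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(request, timeout=timeout):
            pass
        logger.debug("Sent telemetry event %s", payload.get("event"))
    except Exception as e:  # telemetry must never break the user's run
        logger.debug("Failed to send telemetry: %s", e)


def send_event_async(payload: dict, url: str = TELEMETRY_URL, timeout: float = 2.0) -> threading.Thread:
    """Fires the request on a daemon thread so the caller is never blocked."""
    thread = threading.Thread(target=_post_event, args=(payload, url, timeout), daemon=True)
    thread.start()
    return thread
```

Making the thread a daemon means a slow or unreachable endpoint cannot keep the interpreter alive after the user's script finishes.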

How I tested this

Tested this locally, and added unit tests.

Notes

Checklist

* PR has an informative and human-readable title (this will be pulled into the release notes)
* Changes are limited to a single goal (no scope creep)
* Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
* Any change in functionality is tested
* New functions are documented (with a description, list of inputs, and expected output)
* Placeholder code is flagged / future TODOs are captured in comments
* Project documentation has been updated if adding/changing functionality.

skrawcz · 2022-12-22T20:08:00Z

Quick review notes:

* check for module to not capture custom decorators
* add run_id to tie starts and finishes
* wire up posthog (create account on free tier)
* write code to sanitize error trace to only capture hamilton related code
* write unit tests
* integration tests
* documentation on how to opt-out
* usage policy

elshize · 2022-12-23T13:49:48Z

I'm wondering if you wouldn't also want to generate a unique ID for each execution to tie start and end events together. It would probably simplify data analysis, but also help to find started but unfinished runs.

skrawcz · 2022-12-23T17:58:54Z

> I'm wondering if you wouldn't also want to generate a unique ID for each execution to tie start and end events together. It would probably simplify data analysis, but also help to find started but unfinished runs.

Good idea! Can add it.
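A minimal sketch of tying the two events together, assuming one ID is generated when the driver is constructed (the debug log in this thread calls it `driver_run_id`) and attached to both the start and end events. `TrackedDriver` and the injected `emit` callback are hypothetical; the end event here records only the exception class name as a placeholder, whereas the PR sends a sanitized stack trace.

```python
import time
import uuid
from typing import Any, Callable, Dict, Optional

Event = Dict[str, Any]


class TrackedDriver:
    """Hypothetical stand-in for a driver that emits paired telemetry events."""

    def __init__(self, emit: Callable[[Event], None]):
        # One ID per driver instance, shared by its start and end events.
        self.driver_run_id = str(uuid.uuid4())
        self._emit = emit
        self._emit({"event": "os_hamilton_run_start", "driver_run_id": self.driver_run_id})

    def execute(self, run: Callable[[], Any]) -> Any:
        start = time.perf_counter()
        error: Optional[str] = None
        try:
            return run()
        except Exception as e:
            error = type(e).__name__
            raise
        finally:
            # Emitted on success and failure alike, tagged with the start event's ID.
            self._emit({
                "event": "os_hamilton_run_end",
                "driver_run_id": self.driver_run_id,
                "is_success": error is None,
                "runtime_seconds": time.perf_counter() - start,
                "error": error,
            })
```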

skrawcz · 2022-12-25T06:51:19Z

@elshize would you mind taking this branch for a spin please?
```bash
pip install git+ssh://git@github.com/stitchfix/hamilton.git@add_telemetry
```

Otherwise I just need to add tests and this PR is good to go.

elshize · 2022-12-25T21:55:36Z

@skrawcz I updated the Hamilton version in my project and ran my pipeline; it worked fine, and I got these messages:

```
DEBUG:root:Succeed in sending telemetry consisting of [b'{"api_key": "phc_mZg8bkn3yvMxqvZKRlMlxjekFU5DFDdcdAsijJ2EH5e", "event": "os_hamilton_run_start", "properties": {"os_type": "posix", "os_version": "Linux-6.0.12-200.fc36.x86_64-x86_64-with-glibc2.35", "python_version": "3.9.11/CPython", "distinct_id": "cda55e8f-785c-4948-86ec-8b3ad3630d64", "hamilton_version:": [1, 11, 1], "telemetry_version": "0.0.1", "number_of_nodes": 21, "number_of_modules": 1, "number_of_config_items": 5, "decorators_used": {"extract_columns": 2}, "graph_adapter_used": "hamilton.base.SimplePythonDataFrameGraphAdapter", "result_builder_used": "custom_builder", "driver_run_id": "965d5d90-9e37-421c-b670-297abe1beddc", "error": null}}'].
DEBUG:root:Succeed in sending telemetry consisting of [b'{"api_key": "phc_mZg8bkn3yvMxqvZKRlMlxjekFU5DFDdcdAsijJ2EH5e", "event": "os_hamilton_run_end", "properties": {"os_type": "posix", "os_version": "Linux-6.0.12-200.fc36.x86_64-x86_64-with-glibc2.35", "python_version": "3.9.11/CPython", "distinct_id": "cda55e8f-785c-4948-86ec-8b3ad3630d64", "hamilton_version:": [1, 11, 1], "telemetry_version": "0.0.1", "is_success": true, "runtime_seconds": 0.3053734302520752, "number_of_outputs": 9, "number_of_overrides": 0, "number_of_inputs": 0, "driver_run_id": "965d5d90-9e37-421c-b670-297abe1beddc", "error": null}}'].
```
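With a shared `driver_run_id`, finding runs that started but never finished reduces to a set difference. This sketch assumes events arrive as dicts shaped like the payloads above (an `event` name plus a `properties` dict); the function name and the second sample ID are illustrative. Since the start event fires when a driver is instantiated, a start without an end may also simply mean the driver was never executed.

```python
import json
from typing import Any, Dict, Iterable, Set


def unfinished_runs(events: Iterable[Dict[str, Any]]) -> Set[str]:
    """Returns driver_run_ids that have a start event but no end event."""
    started: Set[str] = set()
    ended: Set[str] = set()
    for event in events:
        run_id = event.get("properties", {}).get("driver_run_id")
        if run_id is None:
            continue  # events without an ID cannot be paired
        if event.get("event") == "os_hamilton_run_start":
            started.add(run_id)
        elif event.get("event") == "os_hamilton_run_end":
            ended.add(run_id)
    return started - ended


raw_payloads = [
    '{"event": "os_hamilton_run_start", "properties": {"driver_run_id": "965d5d90-9e37-421c-b670-297abe1beddc"}}',
    '{"event": "os_hamilton_run_end", "properties": {"driver_run_id": "965d5d90-9e37-421c-b670-297abe1beddc"}}',
    '{"event": "os_hamilton_run_start", "properties": {"driver_run_id": "run-without-end"}}',
]
print(unfinished_runs(json.loads(p) for p in raw_payloads))  # {'run-without-end'}
```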

hamilton/driver.py

hamilton/telemetry.py

`async` is a deprecated python library that hasn't been updated since 2014.

We do not want to showcase `raw_execute` anywhere; people should be using `execute()` exclusively. So this fixes the example to show how to get a dictionary back, rather than using `raw_execute`. It also adds documentation showing that you can instantiate it in a function or outside in the module.

elijahbenizzy

A few nits but looks good

README.md

examples/async/fastapi_example.py

graph_adapter_tests/h_async/test_h_async.py

hamilton/base.py

hamilton/driver.py

hamilton/experimental/h_async.py

hamilton/function_modifiers/base.py

After this change, by default, Hamilton will collect anonymous usage data to help us improve it and know where to apply development efforts. We capture two events: one when a driver object is instantiated, and one when the `execute()` call on the driver completes. No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:

* Operating System and Python version
* A persistent UUID to identify the session, stored in ~/.hamilton.conf.
* Error stack trace limited to Hamilton code, if one occurs.
* Information on what features you're using from Hamilton: decorators, adapters, result builders.
* How Hamilton is being used: number of final nodes in DAG, number of modules, size of objects passed to `execute()`.

If you do not wish to participate, you can opt out with one of the following methods:

1. Disable it programmatically in your code before creating a Hamilton driver:
```python
from hamilton import telemetry
telemetry.disable_telemetry()
```
2. Set the key `telemetry_enabled` to `false` in ~/.hamilton.conf under the `DEFAULT` section:
```
[DEFAULT]
telemetry_enabled = False
```
3. Set HAMILTON_TELEMETRY_ENABLED=false as an environment variable, either for your shell session:
```bash
export HAMILTON_TELEMETRY_ENABLED=false
```
or as part of the run command:
```bash
HAMILTON_TELEMETRY_ENABLED=false python NAME_OF_MY_DRIVER.py
```

Otherwise, this commit is a large one. It:

* adds a telemetry.py that handles the schema, sending logic, and related logic for capturing telemetry about Hamilton usage. Note: we stop capturing after 1000 checks of is_telemetry_enabled to handle the case where someone is doing something in bulk; we likely don't care much past 1000 invocations. It also creates a thread that sends the telemetry; this should work in all contexts. We did not want to pull in any other Python dependencies, which is why we're using urllib.
* makes the two Drivers (regular and async) orchestrate the logic to capture telemetry, so we will only capture telemetry if people are using the standard drivers. Rather than instrumenting the graph, I think the driver is the better place for it, since that's where all the context is.
* adds some global state to capture decorator usage and exposes it via the graph object. This felt like the most natural way to do it.
* adds tests and adjusts things to ensure telemetry is disabled for unit tests/CircleCI. Note: the sanitize error test depends on paths, so CircleCI is the best place to ensure it works. We should fix this if it becomes an issue.
* adds documentation on how to opt out.

Former commits that are being squashed:

Adds async adapter telemetry unit test. To ensure that the changes to the async driver work as expected.

(+12 squashed commits) Squashed commits:

* [4f25e41] Adds unit tests for telemetry addition. This fixes up a few functions and refactors them to be more easily unit testable. It also ensures that, by default, telemetry is disabled for unit tests and CircleCI.
* [36e5a7e] Fixing doc strings
* [b0d4c4d] Refactors decorator counter methodology. Now it's a decorator on the `__call__` function. That way we decouple the logic for telemetry needs, without it explicitly living within the NodeTransformLifecycle class. It's still coupled, but we can now change that functionality more clearly.
* [57e209b] Adjust telemetry documentation and functions. In response to PR comments. Adds some helper functions to make them easier to unit test. I put them in `telemetry.py` because they're static and only relevant for telemetry, so it didn't seem too bad to put them there.
* [1bda6a6] Fixes up imports to enable running driver.py as a script. Legacy requirement; just propagating it.
* [6f0c7b0] Adds telemetry tracking ability to async driver. The async driver needs special casing to ensure it can also emit telemetry in an async-friendly way, so this adds handling for sending constructor and execute tracking that should not impact, for example, running within a FastAPI webserver.
* [0bac34a] Wraps sending telemetry request in its own thread. For performance reasons we spawn a thread to ensure we don't slow down an app's performance.
* [cba7bc3] Simplifies sanitize_error logic. Removes unnecessary code, and makes the variable names a little easier to follow.
* [34e574e] Wraps sanitize_error in try except. We don't want this code to cause a cryptic error message for the end user, so we wrap it in a try/except.
* [f1d44b9] Adds usage and data privacy section to main README, so that people know what we're doing and how to opt out of it.
* [5ac73a9] Fixes to adjust pending changes to main
* [2a84121] Refactors and adds functionality. This commit will be squashed into the final, but it does the following:
  1. Hooks up PostHog to capture telemetry. They have a free tier that should be sufficient for our needs.
  2. Refactors code into functions to enable better testing (TODO).
  3. Adds logic to sanitize an error. We don't pull the name, just where in the Hamilton code it runs from. This should suffice in helping us understand where people are encountering errors.
  4. Adds logic to not capture custom code with respect to decorators and adapters.
  5. Adds three ways to disable telemetry and documents them in the module.

  (+1 squashed commit) Squashed commits:
  * [bb7376a] WIP sketch of telemetry. This is just a rough sketch. It shows one way we might implement things, i.e. have it all be in the driver, so if someone is using their own custom driver, we would not get telemetry. AFAIK most people use the current driver. TODO:
    - actually check whether telemetry gathering is enabled
    - hook it up to something like PostHog
    - test, test, test
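Commit b0d4c4d describes counting decorator usage via a decorator on `__call__`, and 2a84121 adds logic to skip custom decorators. A minimal sketch of that pattern follows; the module-prefix check, the global `DECORATOR_COUNTER`, and the stand-in `NodeTransformLifecycle` body are assumptions for illustration, not Hamilton's code.

```python
import collections
import functools
from typing import Callable

# Hypothetical global registry: decorator class name -> number of times applied.
DECORATOR_COUNTER = collections.Counter()


def track_decorator_usage(call_fn: Callable) -> Callable:
    """Wraps a decorator class's __call__ so each application gets counted."""

    @functools.wraps(call_fn)
    def wrapped_call(self, fn: Callable) -> Callable:
        try:
            cls = type(self)
            # Skip user-defined subclasses so custom decorator names are never recorded.
            if cls.__module__.startswith("hamilton."):
                DECORATOR_COUNTER[cls.__name__] += 1
        except Exception:
            pass  # counting must never break decoration
        return call_fn(self, fn)

    return wrapped_call


class NodeTransformLifecycle:
    """Hypothetical minimal stand-in for the real base class of Hamilton decorators."""

    @track_decorator_usage
    def __call__(self, fn: Callable) -> Callable:
        return fn  # the real class would transform fn into DAG nodes here
```

With this shape, the driver could read `DECORATOR_COUNTER` when building the start event, which is consistent with the `decorators_used` field in the debug log earlier in the thread.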

For 3.7+ we can make it not depend on a sleep. For 3.6, I believe the easiest approach is to make it sleep.

We should also capture driver function invocations. This will tell us which driver functionality is being used in addition to execution; e.g. it will shed light on whether people are using the DAG and tags together, which is useful data for deciding whether to enhance capabilities there.

The async test was breaking because of the extra telemetry invocation. Adding a mock context manager ensures telemetry is disabled for that one invocation. This also permanently fixes the sanitize error test by regex-replacing the line number, since that changes with the number of lines in telemetry.py and we don't want people to have to update this test because of that.
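A sketch of sanitizing a traceback down to Hamilton frames, plus a regex line-number normalization of the kind described here, so a test does not break when telemetry.py grows. It assumes Hamilton frames can be recognized by a `/hamilton/` path segment and that the exception message is dropped entirely; the function signatures and the fallback sentinel string are hypothetical.

```python
import re
import traceback
from typing import List


def sanitize_error(exc: BaseException, package_marker: str = "/hamilton/") -> str:
    """Keeps only frames from files under the hamilton package; drops the message."""
    try:
        kept: List[str] = []
        for frame in traceback.extract_tb(exc.__traceback__):
            path = frame.filename.replace("\\", "/")
            if package_marker in path:
                # Strip everything before the package so user paths/usernames are not sent.
                relative = path.split(package_marker, 1)[1]
                kept.append(f"{package_marker.strip('/')}/{relative}:{frame.lineno}:{frame.name}")
        return "\n".join(kept)
    except Exception:
        # Sanitizing must never surface a cryptic error to the user.
        return "FAILED_TO_SANITIZE_ERROR"


def normalize_line_numbers(trace: str) -> str:
    """Replaces line numbers so tests don't depend on the length of the source file."""
    return re.sub(r":\d+:", ":<line>:", trace)
```

A test can then compare `normalize_line_numbers(sanitize_error(e))` against a fixed string instead of one that embeds a line number.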

skrawcz requested a review from elijahbenizzy December 21, 2022 23:41

skrawcz force-pushed the add_telemetry branch from 143f644 to bb7376a December 21, 2022 23:45

skrawcz linked an issue Dec 21, 2022 that may be closed by this pull request

Usage telemetry of Hamilton features #248

Closed

skrawcz mentioned this pull request Dec 21, 2022

Usage telemetry of Hamilton features #248

Closed

skrawcz force-pushed the add_telemetry branch 3 times, most recently from 75eae71 to c32920a December 24, 2022 23:57

skrawcz changed the title ~~WIP sketch of telemetry~~ Adds telemetry Dec 25, 2022

elijahbenizzy reviewed Dec 26, 2022

skrawcz force-pushed the add_telemetry branch from e3ea2d9 to 4b2e1d6 December 27, 2022 04:43

skrawcz added 2 commits December 26, 2022 20:52

Removes unnecessary requirement for async example

fc5c724

skrawcz force-pushed the add_telemetry branch from 4b2e1d6 to 464956c December 27, 2022 04:52

skrawcz marked this pull request as ready for review December 27, 2022 04:52

skrawcz force-pushed the add_telemetry branch 2 times, most recently from 96f3af3 to a33119f December 27, 2022 06:35

skrawcz requested a review from elijahbenizzy December 27, 2022 06:35

skrawcz force-pushed the add_telemetry branch from a33119f to 4316e01 December 27, 2022 06:38

elijahbenizzy reviewed Dec 27, 2022

skrawcz added 2 commits December 28, 2022 10:51

Fixes up async test to not depend on sleep

7951aec

skrawcz force-pushed the add_telemetry branch from 4316e01 to 7951aec December 28, 2022 18:51

elijahbenizzy approved these changes Dec 28, 2022

skrawcz added 2 commits December 29, 2022 09:46

skrawcz merged commit 64c014a into main Jan 2, 2023

skrawcz deleted the add_telemetry branch January 2, 2023 15:24
