This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Adds telemetry #255

Merged
merged 6 commits into from
Jan 2, 2023


skrawcz
Collaborator

@skrawcz skrawcz commented Dec 21, 2022

This adds telemetry capture to Hamilton.

After this change, Hamilton collects anonymous usage data by default, to help us improve it and decide where to focus development effort.

We capture two events: one when a driver object is instantiated, and one when the execute() call on the driver completes.
No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:

  • Operating System and Python version
  • A persistent UUID to identify the session, stored in ~/.hamilton.conf.
  • Error stack trace limited to Hamilton code, if one occurs.
  • Information on what features you're using from Hamilton: decorators, adapters, result builders.
  • How Hamilton is being used: number of final nodes in DAG, number of modules, size of objects passed to execute().
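The OS and Python fields above map onto standard-library calls. Judging from the payloads posted later in this thread, `os_type` looks like `os.name`, `os_version` like `platform.platform()`, and `python_version` like `platform.python_version()` joined with `platform.python_implementation()`. The sketch below assumes that mapping and also assumes a hypothetical `anonymous_id` key under `[DEFAULT]` for the persistent UUID; it is an illustration, not the code in `telemetry.py`.

```python
import configparser
import os
import platform
import uuid


def environment_fields() -> dict:
    """OS and Python fields, shaped like the payloads posted in this thread."""
    return {
        "os_type": os.name,  # e.g. "posix"
        "os_version": platform.platform(),
        "python_version": f"{platform.python_version()}/{platform.python_implementation()}",
    }


def get_or_create_distinct_id(conf_path: str) -> str:
    """Return a persistent UUID, generating and saving it on first use.

    Assumption: the UUID is stored under a hypothetical `anonymous_id` key
    in the DEFAULT section of the config file (e.g. ~/.hamilton.conf).
    """
    config = configparser.ConfigParser()
    config.read(conf_path)  # a missing file is silently ignored
    distinct_id = config["DEFAULT"].get("anonymous_id")
    if distinct_id is None:
        distinct_id = str(uuid.uuid4())
        config["DEFAULT"]["anonymous_id"] = distinct_id
        with open(conf_path, "w") as f:
            config.write(f)
    return distinct_id
```

Calling `get_or_create_distinct_id(os.path.expanduser("~/.hamilton.conf"))` twice returns the same value, because the second call reads back what the first one wrote.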

If you do not wish to participate, you can opt out using any one of the following methods:

  1. Disable it programmatically in your code before creating a Hamilton driver:
    from hamilton import telemetry
    telemetry.disable_telemetry()
  2. Set the key telemetry_enabled to false in ~/.hamilton.conf under the DEFAULT section:
    [DEFAULT]
    telemetry_enabled = false
  3. Set HAMILTON_TELEMETRY_ENABLED=false as an environment variable, either for your whole shell session:
    export HAMILTON_TELEMETRY_ENABLED=false
    or only for a single run, as part of the command:
    HAMILTON_TELEMETRY_ENABLED=false python NAME_OF_MY_DRIVER.py
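One way to combine the three opt-outs is sketched below. It treats them as independent switches: telemetry stays on only if none of them turns it off. The flag name, the set of accepted "false" spellings, and `resolve_telemetry_enabled` are assumptions for illustration; `disable_telemetry` mirrors the name above, but this body is not Hamilton's implementation.

```python
import configparser
import os
from typing import Mapping

# Hypothetical module-level flag flipped by the programmatic opt-out (method 1).
_disabled_in_code = False
_FALSE_VALUES = {"0", "false", "no", "off"}


def disable_telemetry() -> None:
    """Method 1: opt out from code, before creating a driver."""
    global _disabled_in_code
    _disabled_in_code = True


def resolve_telemetry_enabled(conf_path: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Telemetry stays on only if none of the three opt-outs is set."""
    if _disabled_in_code:
        return False
    # Method 3: HAMILTON_TELEMETRY_ENABLED=false in the environment.
    env_value = environ.get("HAMILTON_TELEMETRY_ENABLED")
    if env_value is not None and env_value.strip().lower() in _FALSE_VALUES:
        return False
    # Method 2: telemetry_enabled = false under [DEFAULT] in the config file.
    config = configparser.ConfigParser()
    config.read(conf_path)
    return config["DEFAULT"].getboolean("telemetry_enabled", fallback=True)
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.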

Changes

  • adds telemetry.py
  • adds code to driver.Driver to capture telemetry
  • wraps the telemetry code in enough try/excepts that it should not impact normal operation
  • adds code to function_modifiers base to count decorator usage
  • adds tests
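The "wrapped in enough try/excepts" point can be expressed as a small decorator that swallows any exception raised by telemetry code and logs it at debug level, so a failure to build or send an event never reaches the user's pipeline. This is a hypothetical sketch: `never_raise` and `capture_constructor_event` are invented names, and only the event name is taken from the payloads in this thread.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def never_raise(func):
    """Run func, but log and swallow any exception instead of propagating it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:  # telemetry must never break the caller
            logger.debug("Telemetry call %s failed: %s", func.__name__, e)
            return None

    return wrapper


@never_raise
def capture_constructor_event(number_of_modules: int) -> dict:
    # Hypothetical event builder; a bad argument raises inside here.
    if number_of_modules < 0:
        raise ValueError("negative module count")
    return {"event": "os_hamilton_run_start", "number_of_modules": number_of_modules}
```

With this in place, `capture_constructor_event(-1)` returns `None` instead of raising, while valid calls return the event unchanged.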

How I tested this

Tested this locally, and added unit tests.

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz skrawcz linked an issue Dec 21, 2022 that may be closed by this pull request
@skrawcz
Collaborator Author

skrawcz commented Dec 22, 2022

Quick review notes:

  • check for module to not capture custom decorators
  • add run_id to tie starts and finishes
  • wire up PostHog (create an account on the free tier)
  • write code to sanitize error trace to only capture hamilton related code
  • write unit tests
  • integration tests
  • documentation on how to opt-out
  • usage policy

@elshize

elshize commented Dec 23, 2022

I'm wondering if you wouldn't also want to generate a unique ID for each execution to tie start and end events together. It would probably simplify data analysis, and also help find runs that started but never finished.

@skrawcz
Collaborator Author

skrawcz commented Dec 23, 2022

Good idea! Can add it.
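A minimal sketch of the suggestion: generate one UUID per driver run, attach it to both the start and end events, then join on it during analysis. A start event with no matching end points to a run that never finished. The `driver_run_id` key and event names match the payloads posted later in this thread; the helper functions are illustrative assumptions.

```python
import uuid
from typing import Iterable, List, Tuple


def new_run_events() -> Tuple[dict, dict]:
    """Start and end events for one driver run, tied together by a shared id."""
    run_id = str(uuid.uuid4())
    start = {"event": "os_hamilton_run_start", "driver_run_id": run_id}
    end = {"event": "os_hamilton_run_end", "driver_run_id": run_id}
    return start, end


def unfinished_runs(events: Iterable[dict]) -> List[str]:
    """Run ids that have a start event but no matching end event."""
    started: List[str] = []
    ended = set()
    for event in events:
        if event["event"] == "os_hamilton_run_start":
            started.append(event["driver_run_id"])
        elif event["event"] == "os_hamilton_run_end":
            ended.add(event["driver_run_id"])
    return [run_id for run_id in started if run_id not in ended]
```

For example, given the events of a run A that finished and a run B that never emitted its end event, `unfinished_runs` returns only B's id.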

@skrawcz skrawcz force-pushed the add_telemetry branch 3 times, most recently from 75eae71 to c32920a December 24, 2022 23:57
@skrawcz skrawcz changed the title from WIP sketch of telemetry to Adds telemetry Dec 25, 2022
@skrawcz
Collaborator Author

skrawcz commented Dec 25, 2022

@elshize would you mind taking this branch for a spin please?
pip install git+ssh://git@github.com/stitchfix/hamilton.git@add_telemetry

Otherwise I just need to add tests and this PR is good to go.

@elshize

elshize commented Dec 25, 2022

@skrawcz I updated the Hamilton version in my project and ran my pipeline. It worked fine, and I got these messages:

DEBUG:root:Succeed in sending telemetry consisting of [b'{"api_key": "phc_mZg8bkn3yvMxqvZKRlMlxjekFU5DFDdcdAsijJ2EH5e", "event": "os_hamilton_run_start", "properties": {"os_type": "posix", "os_version": "Linux-6.0.12-200.fc36.x86_64-x86_64-with-glibc2.35", "python_version": "3.9.11/CPython", "distinct_id": "cda55e8f-785c-4948-86ec-8b3ad3630d64", "hamilton_version:": [1, 11, 1], "telemetry_version": "0.0.1", "number_of_nodes": 21, "number_of_modules": 1, "number_of_config_items": 5, "decorators_used": {"extract_columns": 2}, "graph_adapter_used": "hamilton.base.SimplePythonDataFrameGraphAdapter", "result_builder_used": "custom_builder", "driver_run_id": "965d5d90-9e37-421c-b670-297abe1beddc", "error": null}}'].
DEBUG:root:Succeed in sending telemetry consisting of [b'{"api_key": "phc_mZg8bkn3yvMxqvZKRlMlxjekFU5DFDdcdAsijJ2EH5e", "event": "os_hamilton_run_end", "properties": {"os_type": "posix", "os_version": "Linux-6.0.12-200.fc36.x86_64-x86_64-with-glibc2.35", "python_version": "3.9.11/CPython", "distinct_id": "cda55e8f-785c-4948-86ec-8b3ad3630d64", "hamilton_version:": [1, 11, 1], "telemetry_version": "0.0.1", "is_success": true, "runtime_seconds": 0.3053734302520752, "number_of_outputs": 9, "number_of_overrides": 0, "number_of_inputs": 0, "driver_run_id": "965d5d90-9e37-421c-b670-297abe1beddc", "error": null}}'].

hamilton/driver.py (resolved)
hamilton/driver.py (resolved)
hamilton/driver.py (resolved)
hamilton/driver.py (outdated, resolved)
hamilton/driver.py (resolved)
hamilton/telemetry.py (resolved)
hamilton/telemetry.py (resolved)
hamilton/telemetry.py (outdated, resolved)
hamilton/telemetry.py (outdated, resolved)
hamilton/telemetry.py (outdated, resolved)
`async` is a deprecated Python library that hasn't been updated
since 2014.
We do not want to showcase `raw_execute` anywhere.
People should be using `execute()` exclusively, so this
fixes the example to show how to get a dictionary back
rather than using `raw_execute`.

Also added documentation to show that you can instantiate
it in a function, or outside in the module.
@skrawcz skrawcz marked this pull request as ready for review December 27, 2022 04:52
@skrawcz skrawcz force-pushed the add_telemetry branch 2 times, most recently from 96f3af3 to a33119f December 27, 2022 06:35
Collaborator

@elijahbenizzy elijahbenizzy left a comment

A few nits, but looks good.

README.md (resolved)
examples/async/fastapi_example.py (resolved)
graph_adapter_tests/h_async/test_h_async.py (outdated, resolved)
hamilton/base.py (resolved)
hamilton/driver.py (resolved)
hamilton/experimental/h_async.py (resolved)
hamilton/experimental/h_async.py (resolved)
hamilton/function_modifiers/base.py (outdated, resolved)
After this change, Hamilton collects anonymous usage data by default, to help us improve it and decide where to focus development effort.

We capture two events: one when a driver object is instantiated, and one when the `execute()` call on the driver completes.
No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:

* Operating System and Python version
* A persistent UUID to identify the session, stored in ~/.hamilton.conf.
* Error stack trace limited to Hamilton code, if one occurs.
* Information on what features you're using from Hamilton: decorators, adapters, result builders.
* How Hamilton is being used: number of final nodes in DAG, number of modules, size of objects passed to `execute()`.

If you do not wish to participate, you can opt out using any one of the following methods:
1. Set it to false programmatically in your code before creating a Hamilton driver:
   ```python
   from hamilton import telemetry
   telemetry.disable_telemetry()
   ```
2. Set the key `telemetry_enabled` to `false` in ~/.hamilton.conf under the `DEFAULT` section:
   ```ini
   [DEFAULT]
   telemetry_enabled = false
   ```
3. Set HAMILTON_TELEMETRY_ENABLED=false as an environment variable, either for your whole shell session:
   ```bash
   export HAMILTON_TELEMETRY_ENABLED=false
   ```
   or only for a single run, as part of the command:
   ```bash
   HAMILTON_TELEMETRY_ENABLED=false python NAME_OF_MY_DRIVER.py
   ```

Otherwise, this commit is a large one, it:

* adds a telemetry.py that handles the schema, sending logic, and related logic for capturing telemetry about Hamilton usage. Note: we stop capturing after 1000 checks for is_telemetry_enabled to handle the case where someone is doing something in bulk; we likely don’t care much past 1000 invocations. It also creates a thread that sends the telemetry; this should work in all contexts. We did not want to pull in any other Python dependencies, which is why we’re using urllib.
* makes the two Drivers (regular and async) orchestrate the logic to capture telemetry, so we only capture telemetry if people are using the standard drivers. Rather than instrumenting the graph, I think the driver is the better place for it, since that’s where all the context is.
* we add some global state to capture decorator usage and expose it via the graph object. This felt like the most natural way to do it.
* adds tests and adjusts things to ensure telemetry is disabled for unit tests/CircleCI. Note: the sanitize error test depends on paths, so CircleCI is the best place to ensure it works. We should fix this if it becomes an issue.
* adds documentation on how to opt-out.
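The first bullet combines two mechanisms: a cap on how many checks can report telemetry as enabled, and a background thread that posts each event using only `urllib` from the standard library. A minimal sketch under those assumptions follows. `is_telemetry_enabled` borrows its name from the commit message, but its body, the other function names, and any endpoint URL you pass in are hypothetical; the global counter is also not thread-safe, which a real implementation would need to consider.

```python
import json
import threading
import urllib.request

MAX_CHECKS = 1000  # stop reporting after this many checks in one process
_check_count = 0


def is_telemetry_enabled(opted_in: bool = True) -> bool:
    """True for at most MAX_CHECKS calls per process, then always False."""
    global _check_count
    if not opted_in:
        return False
    _check_count += 1
    return _check_count <= MAX_CHECKS


def build_request(url: str, event: dict) -> urllib.request.Request:
    """JSON POST request for one event (built, not sent)."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def _post(request: urllib.request.Request) -> None:
    try:
        with urllib.request.urlopen(request, timeout=2):
            pass
    except Exception:
        pass  # network failures must never surface to the user


def send_event_in_background(url: str, event: dict) -> threading.Thread:
    """Send on a daemon thread so the caller never waits on the network."""
    thread = threading.Thread(target=_post, args=(build_request(url, event),), daemon=True)
    thread.start()
    return thread
```

Making the thread a daemon means a pending send never keeps the interpreter alive after the user's script finishes.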

—— Former commits that are being squashed:
Adds async adapter telemetry unit test

To ensure that the changes to the async driver work
as expected. (+12 squashed commits)
Squashed commits:
[4f25e41] Adds unit tests for telemetry addition

This fixes up a few functions and refactors them to be more easily
unit testable. It also ensures that by default, telemetry is disabled
for unit tests and circleci.
[36e5a7e] Fixing doc strings
[b0d4c4d] Refactors decorator counter methodology

Now it's a decorator on the __call__ function.

That way we decouple the logic for telemetry needs -- without
it explicitly living within the NodeTransformLifecycle class.
I mean it's still coupled; it's just that we can now change
that functionality more clearly.
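Counting decorator usage by wrapping `__call__`, while skipping decorators defined outside Hamilton (the "check for module to not capture custom decorators" review note), could look roughly like the sketch below. `track_decorator_usage`, the global `Counter`, the `BaseDecorator` stand-in, and the `hamilton.` module-prefix check are assumptions for illustration; `NodeTransformLifecycle` itself is not reproduced.

```python
import collections
import functools

# Global usage counts, keyed by decorator class name.
DECORATOR_COUNTER = collections.Counter()


def track_decorator_usage(call_fn):
    """Wrap a decorator class's __call__ to count how often it is applied."""

    @functools.wraps(call_fn)
    def wrapped(self, fn):
        try:
            cls = type(self)
            # Only count decorators that ship with Hamilton, not custom subclasses.
            if cls.__module__.startswith("hamilton."):
                DECORATOR_COUNTER[cls.__name__] += 1
        except Exception:
            pass  # counting must never break decoration
        return call_fn(self, fn)

    return wrapped


class BaseDecorator:
    """Hypothetical stand-in for a decorator base class."""

    @track_decorator_usage
    def __call__(self, fn):
        return fn
```

Keeping the counting in a separate wrapper means the base class only carries one extra line, and the telemetry logic can change without touching the decorator's own behavior.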
[57e209b] Adjust telemetry documentation and functions

In response to PR comments.

Adds some helper functions to make them easier to unit test.
I put them in `telemetry.py` because they're static, and only
relevant for telemetry, so it didn't seem too bad to put there...
[1bda6a6] Fixes up imports to enable running driver.py as a script

Legacy requirement. Just propagating it.
[6f0c7b0] Adds telemetry tracking ability to async driver

The async driver needs to have special casing
to ensure it can also emit telemetry in an async
friendly way.

So added it to handle sending constructor and execute
tracking that should not impact, for example, running
within a fastapi webserver.
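In an async context, a blocking send inside a request handler would stall the event loop. One way to avoid that, sketched below under the assumption that the send itself is a plain blocking function, is to hand it to the loop's default thread-pool executor with `loop.run_in_executor(None, ...)` and not wait for it on the request path. `blocking_send` and `execute_with_telemetry` are hypothetical stand-ins, not Hamilton's async driver.

```python
import asyncio
from typing import List


def blocking_send(event: dict, sink: List[dict]) -> None:
    """Stand-in for a blocking HTTP call; here it just records the event."""
    sink.append(event)


async def execute_with_telemetry(inputs: dict, sink: List[dict]):
    """Compute a result and schedule telemetry without awaiting it."""
    result = {"sum": sum(inputs.values())}  # placeholder for the real DAG execution
    loop = asyncio.get_running_loop()
    # Runs blocking_send on a worker thread; the future is handed back, not awaited here.
    pending = loop.run_in_executor(None, blocking_send, {"event": "os_hamilton_run_end"}, sink)
    return result, pending


async def main():
    sink: List[dict] = []
    result, pending = await execute_with_telemetry({"a": 1, "b": 2}, sink)
    await pending  # only so this demo can inspect the sink deterministically
    return result, sink
```

In a web handler you would normally drop `pending` rather than await it, so the response goes out without waiting on the network.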
[0bac34a] Wraps sending telemetry request in own thread

For performance reasons we should spawn a thread to ensure
we don't slow down an app's performance.
[cba7bc3] Simplifies sanitize_error logic

Removes unnecessary code, and makes the
variable names a little easier to follow.
[34e574e] Wraps sanitize_error in try except

We don't want this code to cause
a cryptic error message for the end user,
so we wrap it in a try/except.
[f1d44b9] Adds usage and data privacy section to main README

So that people know what we're doing and how to opt out of it.
[5ac73a9] Fixes to adjust pending changes to main
[2a84121] Refactors and adds functionality

This commit will be squashed in to the final, but it does the following:

1. Hooks up PostHog to capture telemetry. They have a free tier that should
be sufficient for our needs.
2. Refactors code into functions to enable better testing (TODO).
3. Adds logic to sanitize an error. We don't pull the name, just where in the
hamilton code it runs from. This should suffice in helping us understand where
people are encountering errors.
4. Adds logic to not capture custom code with respect to decorators and adapters.
5. Adds three ways to disable telemetry and documents it in the module. (+1 squashed commit)
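Item 3 (keep only where in Hamilton's code an error surfaced, not the user's code or the error message) might be implemented with the standard `traceback` module as below. Filtering frames by a `hamilton` path component, the `path:line:function` output format, and the name `sanitize_traceback` are assumptions; the actual sanitizing function may differ.

```python
import os
import traceback
from types import TracebackType
from typing import List, Optional


def sanitize_traceback(tb: Optional[TracebackType], package: str = "hamilton") -> str:
    """Keep only frames that live inside `package`, as 'path:line:function' lines.

    Frames from user code and the exception message are dropped entirely.
    """
    marker = os.sep + package + os.sep
    kept: List[str] = []
    for frame in traceback.extract_tb(tb):
        idx = frame.filename.rfind(marker)
        if idx == -1:
            continue  # not package code, e.g. the user's own module
        relative_path = frame.filename[idx + 1 :]  # e.g. "hamilton/driver.py"
        kept.append(f"{relative_path}:{frame.lineno}:{frame.name}")
    return "\n".join(kept)
```

Inside an `except` block, `sanitize_traceback(sys.exc_info()[2])` yields only the Hamilton-relative locations, so absolute paths that might reveal a username never leave the machine.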
Squashed commits:
[bb7376a] WIP sketch of telemetry

This is just a rough sketch. It shows one way we might implement things.
I.e. have it all be in the driver. So if someone is using their own custom driver,
we would not get telemetry. AFAIK most people use the current driver.

TODO:
 - actually check whether telemetry gathering is enabled
 - hook it up to something like posthog
 - test, test, test
For 3.7+ we can make it not depend on a sleep.
For 3.6 I believe the easiest approach is to make it sleep.
We also should capture driver function invocation. This will be useful
to know what functionality from the driver is being utilized, in addition
to execution. E.g. this will help shed light on whether people are using
the DAG and tags together. Useful data to know whether
we should enhance capabilities there.
Async test was breaking because of the extra
telemetry invocation. Adding a mock context manager
ensures that it is disabled for that one invocation.
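Patching a module-level switch for the duration of a single call is what `unittest.mock.patch.object` is for: the patch is undone when the `with` block exits. The `telemetry` namespace, its `TELEMETRY_ENABLED` attribute, and `run_driver` below are hypothetical stand-ins, not Hamilton's actual names.

```python
import types
from unittest import mock

# Hypothetical stand-in for a telemetry module with a module-level switch.
telemetry = types.SimpleNamespace(TELEMETRY_ENABLED=True)


def run_driver(sent: list) -> str:
    """Pretend driver call that emits a telemetry event when enabled."""
    if telemetry.TELEMETRY_ENABLED:
        sent.append("os_hamilton_run_start")
    return "result"


sent = []
with mock.patch.object(telemetry, "TELEMETRY_ENABLED", False):
    run_driver(sent)  # no event recorded while patched
run_driver(sent)  # the original value is restored after the with block
print(sent)  # ['os_hamilton_run_start']
```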

Otherwise permanently fixing the sanitize error
issue by regex replacing the line number, since that
will be changing depending on the number of
lines in telemetry.py and we don't want people
to have to update this test because of that.
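The line-number fix can be as small as one `re.sub` that replaces the number after each `file.py:` with a placeholder before comparing, so edits that shift lines in `telemetry.py` do not break the assertion. The `path:line:function` format assumed here is illustrative, not the exact format the test checks.

```python
import re

_LINE_NO = re.compile(r"(\.py):\d+")


def normalize_line_numbers(trace: str) -> str:
    """Replace 'file.py:123' with 'file.py:<line>' so tests ignore line drift."""
    return _LINE_NO.sub(r"\1:<line>", trace)
```

A test would then compare `normalize_line_numbers(actual)` against an expected string that already contains `<line>` placeholders.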
@skrawcz skrawcz merged commit 64c014a into main Jan 2, 2023
@skrawcz skrawcz deleted the add_telemetry branch January 2, 2023 15:24
Linked issue that this pull request may close: Usage telemetry of Hamilton features
3 participants