Add Datadog contrib for monitoring purpose #2434
Conversation
Interesting, I was certain that I had run the spec suite locally to make sure that all was well.
Force-pushed from 5ed031c to 3d4f9d2
Is there any way to add tests of this contrib module?
@dlstadther I'll take a look at how we could test this contrib without it being painful. Thanks for your input!
Force-pushed from a87b877 to b5de03b
@dlstadther Hey! I've added a bunch of fun tests to make sure everything is working, but I'm having trouble running the spec suite locally. Each individual new test passes when run locally, but I want to make sure that running the whole suite works fine.
Let me know how to proceed further! :)
I've not seen that doc failure before. I've restarted the build and will review again after the build completes. (Thanks for your efforts!)
Docs tests are still failing with the same error. Are you able to build the docs locally?
Force-pushed from 0ff975f to 700172d
@dlstadther Docs are building fine locally.
Also, when I run locally the tests that seem to hang on Travis, they all succeed. Not sure what this is all about :/ I think it's worth investigating deeper.
@Tarrasch Have you experienced this sort of Travis doc failure or test exit before?
No idea. I restarted an older build to see whether it's due to the changes in this PR or not. Maybe Travis just had a bad day?
@cabouffard Could you resolve the conflicts, and let's see whether the build succeeds!
Force-pushed from e4b6c0c to 0d30791
@dlstadther I've tried many different things, but I can't figure out why the specs, with the addition of this PR, keep hanging on Travis. I also can't figure out why the docs keep failing on Travis when I can build them on my local machine without any problems. Some specs fail when I try to run them locally; fortunately, those specs aren't failing on Travis, but I can't reproduce Travis' behavior locally. I would need help on this.
This is bizarre... Nothing is standing out to me as an issue and yet there is clearly a problem.
@dlstadther What would you suggest the next step be? Can you investigate the issue on your side, or can any other collaborator jump in and give us their point of view on the matter? Thanks! :)
@dlstadther Hey, what's the latest on this one? :)
So sorry @cabouffard - been kinda busy and unintentionally neglected this. Hoping to have some more time next week - I'll set myself a reminder! Thanks for your patience :)
(Found a little time today to look into this briefly.) @cabouffard I've downloaded this branch locally and tried building the docs; Python 2.7.15 failed with the import error. (Spitballing here a bit...) I'm inclined to suspect the Enum import, since the stdlib enum module only exists on Python 3.4+. I've had personal experience with unittest where I've received misleading errors due to package import errors. Mind looking into a pre-3.4 solution for Enum and seeing if that resolves the issues here? Thanks!
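For context, a common pre-3.4 workaround is either to depend on the `enum34` backport (which installs under the stdlib's `enum` module name) or to avoid Enum entirely and use plain class constants. A minimal sketch of the second option, with a hypothetical class name:

```python
# Plain class constants behave the same as a simple Enum for
# membership checks and comparisons, and they work on Python 2.7
# without any extra dependency.
class TaskStatus(object):  # hypothetical name for illustration
    PENDING = 'PENDING'
    RUNNING = 'RUNNING'
    FAILURE = 'FAILURE'
    DONE = 'DONE'

assert TaskStatus.FAILURE == 'FAILURE'
```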
It has been my turn to be busy! Very clever find about the Enum; I would have expected a better error message. Thanks for taking the time to help me out on this! I'll make the additional changes right now! :)
Force-pushed from 0d30791 to e09888c
Datadog is a tool that allows you to send metrics from which you can create dashboards and add alerting on specific behaviors. Adding this contrib will allow users of this tool to log their pipeline information to Datadog. Based on the status change of a task, we log that information to Datadog with the parameters that were used to run that specific task. This allows us to easily create dashboards to visualize the health of the pipeline. For example, we can be notified via Datadog if a task has failed, or we can graph the execution time of a specific task over a period of time. The implementation idea was strongly based on the stale PR spotify#2044.
I've also added a few tests to ensure that the implementation works well.
This takes care of ensuring that the proper metrics collection calls are made when they are expected to happen. We've also removed a few `@RPC_METHOD` decorators that weren't actually being used and weren't required.
This makes sure that we're properly dispatching API and STATSD calls with the proper parameter values to Datadog. It doesn't test all the different possible parameter configurations.
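As a rough illustration of what such a dispatch test can assert, one can patch the datadog client objects and verify the calls. The module path and handler name below are hypothetical stand-ins, not necessarily the ones in this PR:

```python
import unittest
from unittest import mock


class DatadogDispatchTest(unittest.TestCase):
    # 'luigi.contrib.datadog_metric' and 'handle_task_failed' are
    # assumed names used for illustration only.
    @mock.patch('luigi.contrib.datadog_metric.statsd')
    @mock.patch('luigi.contrib.datadog_metric.api')
    def test_failed_task_dispatches_event_and_counter(self, api, statsd):
        from luigi.contrib.datadog_metric import DatadogMetricsCollector
        collector = DatadogMetricsCollector()
        task = mock.Mock(family='MyTask', params={'date': '2018-01-01'})

        collector.handle_task_failed(task)

        # The collector should emit both a Datadog event (for alerting)
        # and a statsd counter (for dashboards).
        api.Event.create.assert_called_once()
        statsd.increment.assert_called_once()
```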
This adds a few extra documentation lines for the configuration, so users can find all the settings they can tweak for each individual contrib instead of having to go through each contrib's files.
The original implementation was made when 0.16.0 was the latest version. Since then, there have been a few improvements and bug fixes to the library that we should be using. Reading through the release log, there shouldn't be any feature-breaking changes, so we should be good to update it!
Previously, the getter wasn't a class method and wouldn't work as expected. In order to ensure that the output is what we expect, we've added more tests.
There were multiple problems that needed to be solved in order to get the specs green again. Each individual spec passed when run on its own, but when run as a group under tox, some of them would pass and others would fail depending on the tox environment.

It came to my attention that the time function of this file was creating an issue with other specs because we were not tearing it down as expected. Also, using setTime within the setUp group had side effects with unexpected behaviors.

Then, the way that the task_id and task_family were named was also causing problems with the same specs that were failing before. I'm unsure why this would be the case, but changing either one alone still fails, while changing both makes the specs go green.

Finally, the last spec would always fail because the setTime was set AFTER the task was actually run, which would always cause the execution time to be greater than 0.

My understanding of all of this is still a bit fuzzy, but hey, now the spec suite passes.
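The general fix for that class of problem is to make every clock patch undo itself no matter how the test exits. A generic stdlib sketch, not the exact code from this PR:

```python
import time
import unittest
from unittest import mock


class FrozenClockTest(unittest.TestCase):
    def setUp(self):
        # Patch time.time for this test case only. addCleanup guarantees
        # the patch is stopped even if the test fails, so the fake clock
        # can't leak into other specs when the whole suite runs in tox.
        patcher = mock.patch('time.time', return_value=1000.0)
        self.mock_time = patcher.start()
        self.addCleanup(patcher.stop)

    def test_execution_time_is_zero_while_clock_is_frozen(self):
        start = time.time()
        end = time.time()
        self.assertEqual(end - start, 0.0)


if __name__ == '__main__':
    unittest.main()
```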
Force-pushed from 84498cf to 7756b37
Force-pushed from 98ad6dc to a7df8f0
This will force people to implement these methods of this class when they refer to it.
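A minimal sketch of that pattern using the stdlib's abc module (Python 3 spelling, with an assumed method name; the contrib's actual interface may differ):

```python
import abc


class MetricsCollector(abc.ABC):
    """Code that refers to this base class can rely on its interface:
    a concrete subclass must implement every @abstractmethod, or
    instantiating it raises TypeError."""

    @abc.abstractmethod
    def handle_task_started(self, task):
        raise NotImplementedError


class DatadogMetricsCollector(MetricsCollector):
    def handle_task_started(self, task):
        print('started: {}'.format(task))


DatadogMetricsCollector()  # OK: all abstract methods implemented
# MetricsCollector()       # TypeError: can't instantiate abstract class
```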
This allows for less-strict function calls.
Force-pushed from a14df58 to 5ba5917
Force-pushed from 3e04070 to d387b99
Force-pushed from 7d9077f to e876d9c
@dlstadther Sorry for all the force pushes, I've had trouble with my local environment so I decided to test my commits directly on Travis :) Now that the specs are all passing, I've tested this branch on our development system and all seems to work accordingly. I think we can finally 🚢 it! Thank you so much for the help!
The underlying configuration of the Datadog metrics collector is a property, so it makes more sense that it's also a property when used within the class itself.
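That consistency argument, sketched in code (hypothetical loader, not the contrib's actual implementation): both external callers and the class's own methods go through the same property instead of a value cached at __init__ time:

```python
def load_datadog_config():
    # Hypothetical stand-in for reading the [datadog] config section.
    return {'api_key': 'XXX', 'app_key': 'YYY'}


class Collector(object):
    @property
    def config(self):
        # Single access path: re-resolved on every access, so internal
        # use can never drift from what external callers see.
        return load_datadog_config()

    def api_key(self):
        return self.config['api_key']  # uses the property internally too
```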
Hey yall - just wanna say that we're really excited to be able to use this soon! Good luck getting it past the finish line, can't wait to try to measure our pipeline w/ this integration.
Thanks for the long, hard work, @cabouffard!
🎉 |
Description
Datadog is a tool that allows you to send metrics from which you can create
dashboards and add alerting on specific behaviors.
Adding this contrib will allow users of this tool to log their pipeline
information to Datadog.
Motivation and Context
Based on the status change of a task, we log that information to Datadog
with the parameters that were used to run that specific task.
This allows us to easily create dashboards to visualize the health of the
pipeline. For example, we can be notified via Datadog if a task has failed,
or we can graph the execution time of a specific task over a period of time.
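Roughly, the collector forwards those status changes through the official datadog Python client. A sketch of the idea with a hypothetical handler (`initialize`, `api.Event.create`, and `statsd.increment` are real datadog-library calls; everything else is illustrative):

```python
from datadog import initialize, api, statsd

initialize(api_key='<DD_API_KEY>', app_key='<DD_APP_KEY>')


def on_task_failed(task_family, params):
    # Hypothetical handler: emit an event (useful for alerting) and a
    # counter (useful for dashboards), tagged with the task parameters.
    tags = ['task_name:{}'.format(task_family)]
    tags += ['{}:{}'.format(k, v) for k, v in params.items()]

    api.Event.create(
        title='Luigi: a task has failed!',
        text='Task {} failed'.format(task_family),
        alert_type='error',
        tags=tags,
    )
    statsd.increment('luigi.task.failure', tags=tags)
```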
The implementation idea was strongly based on the stale PR
#2044.
Have you tested this? If so, how?
We've been using this contrib for multiple months now (maybe a year?) at Glossier. It is our main point of reference for seeing the health of our pipeline.