
Evaluating experiment tracking adoption #1390

Closed · 6 tasks done · yetudada opened this issue Mar 30, 2022 · 3 comments

yetudada (Contributor) commented Mar 30, 2022

Description

We shipped the first iteration of experiment tracking towards the end of November and we would like to understand why the feature has not been adopted. @NeroOkwa is leading this fantastic work.

Hypotheses

We suspect that Experiment Tracking is not being used because:

[Screenshot of the hypotheses list, 30 Mar 2022, 15:58]

Possible Implementation

We're going to structure our research around the following target populations:

  • Our reference users, with whom the feature was built
  • Our poll respondents, who need to be segmented
  • Users who have run kedro viz from their CLI
  • Users on Discord who haven't used the feature

We're going to approach these user groups in the following ways:

  • Our reference users, with whom the feature was built: we need to let this group know that the feature is live (one of the reference users, @Galileo-Galilei, already knows)
  • Our poll respondents who have experiment tracking on their list to try: they need to be nudged and encouraged to use the feature; we might need to help them set it up on their project
  • Our poll respondents who had no awareness of experiment tracking: they need an introductory session
  • Our users who have run kedro viz from their CLI: we can schedule interviews and potentially run a survey with this group

Actions to do:

  • Organise a new set of questions to put to the new groups
  • Set up a system (on Airtable) to ensure we don't speak to users twice or annoy them
  • Decide on an interview strategy (based on kedro-viz users): quantity vs quality per user group (surveys vs one-to-ones)
  • Plan walkthroughs for those that need it
  • Launch poll on Discord
  • Get list of kedro-viz users

Measure of Success in 3 Months

  • 100 users of experiment tracking
  • Five presentations on "How I used Experiment Tracking in my workflow?"

Additional data

From @yetudada to @Galileo-Galilei: You were also part of our user testing for Experiment Tracking in Kedro. I want to show you how to set it up (unfortunately it’s only available from Kedro 0.17.5) and the demo.

From @Galileo-Galilei to @yetudada: We have not been using it for now for several reasons:

  • We are still using kedro==0.16.5 for now because we have legacy projects in production and we do not want to add extra maintenance burden (we have dozens of projects in production + internal plugins, so migration is not a cost we can pay very often). We plan to move to 0.18.x by the end of the year, but we want to wait a little to assess the migration impacts and be sure the version is stable.
  • We still use mlflow in production, and we don't have time to test this new functionality extensively for now. I think we will not use it before 2023.
  • But for what it's worth, we are following the developments and will hopefully give it a quick try after our 0.18.x migration.
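
For reference, the setup mentioned above was fairly small at the time: register the kedro-viz session store and save at least one node output with a tracking dataset. The sketch below assumes Kedro >= 0.17.5 and a kedro-viz release that includes experiment tracking; the session store module path follows the docs of that era and may differ between versions, and the node and dataset names are purely illustrative.

```python
# src/<package>/settings.py -- register the kedro-viz session store so runs are recorded
from pathlib import Path

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore

SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}


# nodes.py -- a node that returns metrics as a plain dict (illustrative)
def evaluate_model(predictions, ground_truth) -> dict:
    accuracy = (predictions == ground_truth).mean()
    return {"accuracy": float(accuracy)}


# catalog.yml -- save that output with a tracking dataset so it shows up
# under experiment tracking in kedro-viz:
#
# model_metrics:
#   type: tracking.MetricsDataSet
#   filepath: data/09_tracking/model_metrics.json
```

After that, each `kedro run` should appear as a tracked run when you open `kedro viz`.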
yetudada (Contributor, Author) commented Jun 1, 2022

Initial report is up from @NeroOkwa; I'll be making some edits to it.

NeroOkwa (Contributor) commented Jun 7, 2022

Experiment Tracking Research - Results

Goal

The goal of this research task is to understand why Kedro Experiment Tracking is not being adopted.

Research Question

What are the top reasons Kedro Experiment Tracking (ET) is not being used?

Summary

The reason for the low adoption of Kedro Experiment Tracking is that users prefer other tools with specific features that solve their 'job to be done'. Below are the top reasons (by frequency of mention) that, if addressed, would increase user adoption of Kedro Experiment Tracking.

The top 3 reasons are:

  1. The ability to save and link images of plots/model artefacts to an experiment, providing more insight and enabling the user to track/compare the evolution of runs across a timeline - 5 users
  2. Visualisation: ability to show plots/comparison graphs/hyperparameters to evaluate metrics trade-offs - 4 users
  3. Having a server/other storage database to enable multi-user collaboration on an experiment, for a user and their team - 4 users + 1 Slack user

The other reasons (and supporting quotes) are shown in the results section below.

Note: A demo session was also conducted for 12 users who had previously not used kedro experiment tracking.

Output

Results

Reasons sorted according to frequency of mention:

Reason 1 - Ability to save and link images of plots/model artefacts to an experiment. This would provide users with more insight (images and metrics together) to track/compare the evolution of runs across a timeline - 5 users (see the sketch after these quotes)

  • "The, the other really big one is MLflow allows us to save images and not just metrics. And so the ability to save and view things like ROC curves or like confusion matrices, things like that is really helpful. So that's not just saving metrics, but also saving the images as well that go with them. "

  • Existing solution: "I am saving those as PNG files (in the azure blob storage) and using some parameters to set the sub folder names so that I can compare to previous runs … not perfect but works". "I’d like to be able to flag some pngs to be included in the experiment tracking so I have a record (with time line) how they’ve changed"

  • "So we trigger a lot of plots. We, I mean, we keep like we join the tables and we put the exploratory data analysis in the plot. So if there is a way that we can show in the kedro visualisation, It's really, really helpful at the moment we are using the notebooks to run everything"

  • "So that, for example, you go into the UI, say, okay, this is the run that that's important to me. I can get certain objects that I store". "90% of the cases would be CSVs and images"

  • "If I run a model I want to save the columns that were created next to it, I might want to create a model saved next to it (artefacts below the model) - something I am used to that I didn't have. There is a lot of artefacts I would want to save with an experiment" 

Reason 2 - Visualisation: ability to show plots/comparison graphs/hyperparameters to evaluate metrics trade-offs - 4 users (see the sketch after these quotes)

  • "Starburst pattern for identifying the tradeoff for different models was incredible and is how we chose which model to promote into production." 

  • "Just like to get live plots, or to get like a plot visualisation of the training directly without writing it yourself through file just inside Kedro the route, like just log to training or something else. And then you can just use Kedro instead of passing it to the node and then saving it to file just sort of easy to use thing."

    • "But that's, that's like the only thing I could think of that would help because it would directly create visualisations and kedro-viz, which would be nice that it's like directly the training because you have the run parameters parameters that I see myself and the there was like directly a graph of how the training went for that model. "
  • "Be able to map hyper parameters to the performance so later on more easily track what was used to produce something vs the timestamp and result were I have to ask myself what did I do at this timestamp what did I do  8 days ago, so tracking hyper parameters"

  • "I was lacking a way to compare values with charts - only way I found out - missed part of select x-axis with time of experiment and y-axis some values - don't know if it was there"

    • "I missed some additional charts I would create adhoc for a specific run like confusion metrics"

    • "Nice you can compare a few models, but why is the limit 3, is it because of squishing - maybe it should be scrollable"

Reason 3 - Having a server/other storage database would enable multi-user collaboration on an experiment, for a user and their team - 4 users + 1 Slack user

  • "You might train one model locally on your computer. You might train another one in the cloud. Brendan might run another pipeline or another experiment. Having all of those experiments in one place as a single source of truth is really powerful. " 

    • "if we could write our metrics files to like an S3 bucket and then run experiment, tracking, pointing at that S3 bucket, man, that, that simplifies our workflow in a lot of different ways and, and would be really great and really helpful. And it would make kedro experiment tracking, you know, just as easy, if not more easy than MLflow for us"
  • "How would it work if I wanted to setup a remote server - centralised server for the team vs current on local machine".

  • "I'm running kedro-viz on my laptop, but what if I want to have like a centralised server for the whole team? How would it work? ".

  • "Is there a feature of saving this externally so Instead of it logging locally with log externally?"

  • DEMO - "Can you use an existing database so that we can keep track of runs happening in different places?"

    • "I see you store it on a database. Is it too hard to change it, to save it to a non sql or pure text?" 
  • A similar question was asked on the Slack channel:

    • "Question about Kedro Experiment Tracking (which is awesome by the way): is the use of a storage location, such as S3, supported for the database which is keeping track of experiments? The setup worked using a local "data/" folder as in the tutorial, but I failed to make it work from S3. Any guidance on this?"

      • "For now, we'll keep it simple and team members will track only their own experiments. Realistically, the tracker would need to support filtering capabilities (and maybe handle the notion of "Experiments") to make the shared setup worth it. Keep on the good work, experiment tracking built-in Kedro is an amazing idea!"

Reason 4 - Having kedro-viz as a standalone product would enable users to integrate kedro-viz with other tools (such as the MLflow suite) for non-kedro projects - 2 users

  • "In this case if I really like experiment tracking I might not consider using it if it isn't a kedro project... I am not sure it is a good direction to go with it being completely integrated, especially if there is a new thing like Mlflow" 

  • DEMO - "Is this something that you can maybe use in a standalone way, the kedro experiment tracking as a service to kind of drop into other projects using MLflow for maybe their model registry and other pieces, but then you can kind of use this stuff coming from the kedro code base where they're without necessarily using kedro that can it essentially be like a standalone replacement for experiment tracking, even in a non-kedro sense."

    • "I'm looking to implement this in some sort of organisation. And I say that, oh, I I'm going to use the kedro experiment tracking alongside the other stuff coming from  mlflow, because it provides more functionality, maybe just go through off better. But I could feel like from that IT organisation and tech leadership for that organisation, they might say like, well, we have invested into using Mlflow suite overall. And it all kind of fits together in a way as our like strategic direction. And so that's why I'm going to use that anyway.
    • And you can't like for other projects using MLflow, we can't just drop this stuff in because we're not using Kedro across the whole organisation. So in, in that case, like if it's, if, if it's like an experiment tracking solution that only works in a few cases, I might pick that one, which has less functionality. And just because it fits into like all of like 90% of my use cases, rather than this one with more functionality, which you can only apply to the 20% in my organisation using Kedro. "

Reason 5 - Improved parameter loading and the ability to loop over parameters. This eliminates the need to re-enter the parameters and enables the user to run different experiments; both would simplify the user's workflow - 2 users (see the sketch after these quotes)

  • "The way ET on Kedro loads parameters is not intuitive - it forces the user to  return the parameters that they want so that you can enter that in the catalog to logging, which is counter intuitive, because kedro already has access to them." 

  • "You will, for each experiment,  you would have a different type with the complete configuration for the, the metrics that you want to learn and the model. And so, yeah, so this means that you can run a lot of different experiments each time you get back to an experiment, you, you know exactly what, what was the config, which was used, which makes everything very, very simple and completely transparent."

Reason 6 - Ability to visualise large pipelines for inspection and presentation by CSTs - 1 user 

  • "The issue is when you have a large pipeline, the only way to access the view that allows you to compare metrics over time is through the pipeline visualisation, Which is challenging. If you have a large pipeline, because you have to search for the node that has the linked metrics graph." 

Reason 7 - Ability to pass a name to an experiment would enable output comparisons of different configs - 1 user

  • "When I look at the different outputs in kedro-viz or something like that, then I, I know right from the beginning that this is config one, config two and config three, and then I can easily, easily compare the outputs of the different configs."

Reason 8 - Division between metrics and parameters for tracked datasets (like in MLflow) makes comparing runs easier. - 1 user

  • "When you go into kedro-viz you see all the datasets that are tracked without much distinction. If you have that properly separated, when you're comparing runs, it's easier to understand what is happening rather than having a single list of tracked datasets."

Reason 9 - Ability to delete experiment tracking entries that are not useful - 1 user + 1 Slack user

  • Slack user - "I’ve found that an experiment run is added to the tracking dashboard even if there was an error thrown and no metrics logged. As a result, my experiment tracking dashboard is quickly getting bogged down with multiple empty runs. Is there a way to delete runs from the experiment tracking dashboard?"

Reason 10 - Using older versions of Kedro/dependencies - 1 user

  • "Because it was not a kedro project"
    • "Projects that I currently have in Kedro are in older versions of Kedro so don't benefit from the feature just yet- so projects don't run on the latest version - the need came before the feature so was using something instead else"

    • "Stopped implementing them in 0.17 - most of my project broke, and migration was a pain,  don't have the capacity to go to all their projects to fix and migrate as much as I would like to. Well, probably I don't like to do that actually"

Reason 11 - Ability to address feature drift/input drift would reduce the current high frequency of client reporting - 1 user

  • "The only way that we can address this right now is actually looking at our training in creating a time series of various visualisations and metrics to look at how it changes through time. I don't know if this fits into kedro experiment tracking, or if it's something else, but there's definitely a hole in the market to address drift. And I think this is one of the reasons why, like, platforms like AWS SageMaker are becoming more and more common is because they actually have some solutions in place to detect drift before it becomes a problem." 

yetudada (Contributor, Author) commented Aug 3, 2023

I'll close this 🥳 This research is complete.

yetudada closed this as completed Aug 3, 2023