
Evaluating experiment tracking adoption #1390

Closed · 6 tasks done · yetudada opened this issue Mar 30, 2022 · 3 comments

yetudada (Contributor) commented Mar 30, 2022

Description

We shipped the first iteration of experiment tracking towards the end of November and we would like to understand why the feature has not been adopted. @NeroOkwa is leading this fantastic work.

Hypotheses

We suspect that Experiment Tracking is not being used because:

[Screenshot of the hypotheses list, 30 Mar 2022, 15:58]

Possible Implementation

We're going to structure our research around the following target populations:

  • Our reference users, with whom the feature was built
  • Our poll respondents, who need to be segmented
  • Users who have run kedro viz from their CLI
  • Users on Discord who haven't used the feature

We're going to approach these user groups in the following ways:

  • Our reference users, with whom the feature was built: we need to let this group know that the feature is live (one of the reference users, @Galileo-Galilei, already knows)
  • Our poll respondents who have experiment tracking on their list to try: they need to be nudged and encouraged to use the feature; we might need to help them set it up on their project
  • Our poll respondents who had no awareness of experiment tracking: they need an introductory session
  • Our users who have run kedro viz from their CLI: we can schedule interviews and potentially run a survey with this group

Actions to do:

  • Organise a new set of questions to put to the new groups
  • Set up a system (on Airtable) to ensure we don't speak to users twice or annoy them
  • Decide on an interview strategy (based on kedro-viz users): quantity vs quality per user group (surveys vs one-to-ones)
  • Plan walkthroughs for those that need it
  • Launch poll on Discord
  • Get list of kedro-viz users

Measure of Success in 3 Months

  • 100 users of experiment tracking
  • Five presentations on "How I used Experiment Tracking in my workflow?"

Additional data

From @yetudada to @Galileo-Galilei: You were also part of our user testing for Experiment Tracking in Kedro. I want to show you how to set it up (unfortunately it’s only available from Kedro 0.17.5) and the demo.

From @Galileo-Galilei to @yetudada: We have not been using it for now for several reasons:

  • We are still using kedro==0.16.5 for now because we have legacy projects in production and we do not want to add extra maintenance burden (we have dozens of projects in production + internal plugins, so migration is not a cost we can pay very often). We plan to move to 0.18.x by the end of the year, but we want to wait a little to assess the migration impacts and be sure the version is stable.
  • We still use mlflow in production, and we don't have time to test this new functionality extensively for now. I think we will not use it before 2023.
  • But for what it's worth, we are following the developments and will hopefully give it a quick try after our 0.18.x migration.
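
For reference, the setup mentioned above was fairly small at the time: register the kedro-viz session store and save at least one node output with a tracking dataset. The sketch below assumes Kedro >= 0.17.5 and a kedro-viz release that includes experiment tracking; the session store module path follows the docs of that era and may differ between versions, and the node and dataset names are purely illustrative.

```python
# src/<package>/settings.py -- register the kedro-viz session store so runs are recorded
from pathlib import Path

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore

SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}


# nodes.py -- a node that returns metrics as a plain dict (illustrative)
def evaluate_model(predictions, ground_truth) -> dict:
    accuracy = (predictions == ground_truth).mean()
    return {"accuracy": float(accuracy)}


# catalog.yml -- save that output with a tracking dataset so it shows up
# under experiment tracking in kedro-viz:
#
# model_metrics:
#   type: tracking.MetricsDataSet
#   filepath: data/09_tracking/model_metrics.json
```

After that, each `kedro run` should appear as a tracked run when you open `kedro viz`.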
yetudada (Contributor, Author) commented Jun 1, 2022

Initial report is up from @NeroOkwa; I'll be making some edits to it.

NeroOkwa (Contributor) commented Jun 7, 2022

Experiment Tracking Research - Results

Goal

The goal of this research task is to understand why Kedro Experiment Tracking is not being adopted.

Research Question

What are the top reasons Kedro Experiment Tracking (ET) is not being used?

Summary

The reason for the low adoption of Kedro Experiment Tracking is that users prefer other tools with specific features that solve their 'job to be done'. Below are the top reasons (by frequency of mention) that, if addressed, would increase user adoption of Kedro Experiment Tracking.

The top 3 reasons are:

  1. The ability to save and link images of plots/model artefacts to an experiment, providing more insight and enabling the user to track/compare the evolution of runs across a timeline - 5 users
  2. Visualisation: ability to show plots/comparison graphs/hyperparameters to evaluate metrics trade-offs - 4 users
  3. Having a server/other storage database to enable multi-user collaboration on an experiment, for a user and their team - 4 users + 1 Slack user

The other reasons (and supporting quotes) are shown in the results section below.

Note: A demo session was also conducted for 12 users who had previously not used kedro experiment tracking.

Output

Results

Reasons sorted according to frequency of mention:

Reason 1 - Ability to save and link images of plots/model artefacts to an experiment. This would provide users with more insight (images and metrics together) to track/compare the evolution of runs across a timeline - 5 users (see the sketch after these quotes)

  • "The, the other really big one is MLflow allows us to save images and not just metrics. And so the ability to save and view things like ROC curves or like confusion matrices, things like that is really helpful. So that's not just saving metrics, but also saving the images as well that go with them. "

  • Existing solution: "I am saving those as PNG files (in the azure blob storage) and using some parameters to set the sub folder names so that I can compare to previous runs … not perfect but works". "I’d like to be able to flag some pngs to be included in the experiment tracking so I have a record (with time line) how they’ve changed"

  • "So we trigger a lot of plots. We, I mean, we keep like we join the tables and we put the exploratory data analysis in the plot. So if there is a way that we can show in the kedro visualisation, It's really, really helpful at the moment we are using the notebooks to run everything"

  • "So that, for example, you go into the UI, say, okay, this is the run that that's important to me. I can get certain objects that I store". "90% of the cases would be CSVs and images"

  • "If I run a model I want to save the columns that were created next to it, I might want to create a model saved next to it (artefacts below the model) - something I am used to that I didn't have. There is a lot of artefacts I would want to save with an experiment" 

Reason 2 - Visualisation: ability to show plots/comparison graphs/hyperparameters to evaluate metrics trade-offs - 4 users (see the sketch after these quotes)

  • "Starburst pattern for identifying the tradeoff for different models was incredible and is how we chose which model to promote into production." 

  • "Just like to get live plots, or to get like a plot visualisation of the training directly without writing it yourself through file just inside Kedro the route, like just log to training or something else. And then you can just use Kedro instead of passing it to the node and then saving it to file just sort of easy to use thing."

    • "But that's, that's like the only thing I could think of that would help because it would directly create visualisations and kedro-viz, which would be nice that it's like directly the training because you have the run parameters parameters that I see myself and the there was like directly a graph of how the training went for that model. "
  • "Be able to map hyper parameters to the performance so later on more easily track what was used to produce something vs the timestamp and result were I have to ask myself what did I do at this timestamp what did I do  8 days ago, so tracking hyper parameters"

  • "I was lacking a way to compare values with charts - only way I found out - missed part of select x-axis with time of experiment and y-axis some values - don't know if it was there"

    • "I missed some additional charts I would create adhoc for a specific run like confusion metrics"

    • "Nice you can compare a few models, but why is the limit 3, is it because of squishing - maybe it should be scrollable"

Reason 3 - Having a server/other storage database would enable multi-user collaboration on an experiment, for a user and their team - 4 users + 1 Slack user

  • "You might train one model locally on your computer. You might train another one in the cloud. Brendan might run another pipeline or another experiment. Having all of those experiments in one place as a single source of truth is really powerful. " 

    • "if we could write our metrics files to like an S3 bucket and then run experiment, tracking, pointing at that S3 bucket, man, that, that simplifies our workflow in a lot of different ways and, and would be really great and really helpful. And it would make kedro experiment tracking, you know, just as easy, if not more easy than MLflow for us"
  • "How would it work if I wanted to setup a remote server - centralised server for the team vs current on local machine".

  • "I'm running kedro-viz on my laptop, but what if I want to have like a centralised server for the whole team? How would it work? ".

  • "Is there a feature of saving this externally so Instead of it logging locally with log externally?"

  • DEMO - "Can you use an existing database so that we can keep track of runs happening in different places?"

    • "I see you store it on a database. Is it too hard to change it, to save it to a non sql or pure text?" 
  • A similar question was asked on the Slack channel:

    • "Question about Kedro Experiment Tracking (which is awesome by the way): is the use of a storage location, such as S3, supported for the database which is keeping track of experiments? The setup worked using a local "data/" folder as in the tutorial, but I failed to make it work from S3. Any guidance on this?"

      • "For now, we'll keep it simple and team members will track only their own experiments. Realistically, the tracker would need to support filtering capabilities (and maybe handle the notion of "Experiments") to make the shared setup worth it. Keep on the good work, experiment tracking built-in Kedro is an amazing idea!"

Reason 4 - Having kedro-viz as a standalone product would enable users to integrate kedro-viz with other tools (such as the MLflow suite) for non-kedro projects - 2 users

  • "In this case if I really like experiment tracking I might not consider using it if it isn't a kedro project... I am not sure it is a good direction to go with it being completely integrated, especially if there is a new thing like Mlflow" 

  • DEMO - "Is this something that you can maybe use in a standalone way, the kedro experiment tracking as a service to kind of drop into other projects using MLflow for maybe their model registry and other pieces, but then you can kind of use this stuff coming from the kedro code base where they're without necessarily using kedro that can it essentially be like a standalone replacement for experiment tracking, even in a non-kedro sense."

    • "I'm looking to implement this in some sort of organisation. And I say that, oh, I I'm going to use the kedro experiment tracking alongside the other stuff coming from  mlflow, because it provides more functionality, maybe just go through off better. But I could feel like from that IT organisation and tech leadership for that organisation, they might say like, well, we have invested into using Mlflow suite overall. And it all kind of fits together in a way as our like strategic direction. And so that's why I'm going to use that anyway.
    • And you can't like for other projects using MLflow, we can't just drop this stuff in because we're not using Kedro across the whole organisation. So in, in that case, like if it's, if, if it's like an experiment tracking solution that only works in a few cases, I might pick that one, which has less functionality. And just because it fits into like all of like 90% of my use cases, rather than this one with more functionality, which you can only apply to the 20% in my organisation using Kedro. "

Reason 5 - Improved parameter loading and the ability to loop over parameters. This eliminates the need to re-enter the parameters and enables the user to run different experiments; both would simplify the user's workflow - 2 users (see the sketch after these quotes)

  • "The way ET on Kedro loads parameters is not intuitive - it forces the user to  return the parameters that they want so that you can enter that in the catalog to logging, which is counter intuitive, because kedro already has access to them." 

  • "You will, for each experiment,  you would have a different type with the complete configuration for the, the metrics that you want to learn and the model. And so, yeah, so this means that you can run a lot of different experiments each time you get back to an experiment, you, you know exactly what, what was the config, which was used, which makes everything very, very simple and completely transparent."

Reason 6 - Ability to visualise large pipelines for inspection and presentation by CSTs - 1 user 

  • "The issue is when you have a large pipeline, the only way to access the view that allows you to compare metrics over time is through the pipeline visualisation, Which is challenging. If you have a large pipeline, because you have to search for the node that has the linked metrics graph." 

Reason 7 - Ability to pass a name to an experiment would enable output comparisons of different configs - 1 user

  • "When I look at the different outputs in kedro-viz or something like that, then I, I know right from the beginning that this is config one, config two and config three, and then I can easily, easily compare the outputs of the different configs."

Reason 8 - Division between metrics and parameters for tracked datasets (like in MLflow) makes comparing runs easier. - 1 user

  • "When you go into kedro-viz you see all the datasets that are tracked without much distinction. If you have that properly separated, when you're comparing runs, it's easier to understand what is happening rather than having a single list of tracked datasets."

Reason 9 - Ability to delete experiment tracking entries that are not useful - 1 user + 1 Slack user

  • Slack user - "I’ve found that an experiment run is added to the tracking dashboard even if there was an error thrown and no metrics logged. As a result, my experiment tracking dashboard is quickly getting bogged down with multiple empty runs. Is there a way to delete runs from the experiment tracking dashboard?"

Reason 10 - Using older versions of Kedro/dependencies - 1 user

  • "Because it was not a kedro project"
    • "Projects that I currently have in Kedro are in older versions of Kedro so don't benefit from the feature just yet- so projects don't run on the latest version - the need came before the feature so was using something instead else"

    • "Stopped implementing them in 0.17 - most of my project broke, and migration was a pain,  don't have the capacity to go to all their projects to fix and migrate as much as I would like to. Well, probably I don't like to do that actually"

Reason 11 - Ability to address feature drift/input drift would reduce the current high frequency of client reporting - 1 user

  • "The only way that we can address this right now is actually looking at our training in creating a time series of various visualisations and metrics to look at how it changes through time. I don't know if this fits into kedro experiment tracking, or if it's something else, but there's definitely a hole in the market to address drift. And I think this is one of the reasons why, like, platforms like AWS SageMaker are becoming more and more common is because they actually have some solutions in place to detect drift before it becomes a problem." 

yetudada (Contributor, Author) commented Aug 3, 2023

I'll close this 🥳 This research is complete.

yetudada closed this as completed Aug 3, 2023