Exploring Metrics on Experiment Tracking - User Testing Synthesis #1627
For this issue, it's worth noting that we do have this functionality already. It's just that:
Note that plotting metrics against parameters and/or Kedro runs is a big topic which has been considered by many different tools and also discussed by us before. I just don't want previous discussions or existing solutions from other products to be forgotten about here 🙂
We should be careful with our assumptions here. Some notes about that:
The bottom line is that the data aren't always going to be nice: not always between 0 and 1, and not always playing nicely together if a user is tracking multiple metrics on one plot. Do reference @AntonyMilneQB's comment here for more context. Let's pick @noklam's brain about this too; he may have some great real-world experience with other tools in this space that do similar things.
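To make the scale problem concrete, here is a minimal sketch (made-up metric values, Plotly chosen purely for illustration; none of this is committed implementation) of how a parallel coordinates plot copes with metrics on wildly different scales by giving each dimension its own axis range:

```python
import plotly.graph_objects as go

# Hypothetical metrics from three runs with very different scales:
# accuracy lives in [0, 1], RMSE in the thousands.
accuracy = [0.72, 0.81, 0.85]
rmse = [5400.0, 3100.0, 2800.0]

fig = go.Figure(go.Parcoords(dimensions=[
    # Each dimension carries its own range, so one badly scaled metric
    # cannot flatten the others into an unreadable band.
    dict(label="accuracy", range=[0, 1], values=accuracy),
    dict(label="rmse", range=[0, 6000], values=rmse),
]))
fig.show()
```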
As I understand it, we are discussing comparison plots across runs here.
This feature is available in most experiment tracking tools, although usually with an X-axis within a single run; I think it's mostly valid for cross-run comparison as well.
See similar things in Weights & Biases, which is really flexible and lets you configure the plots.
I think it all makes sense, but some of the features would be difficult to implement, and live plotting is mentioned in this issue. The more raw data you keep, the more flexibly you can customise these plots later. Another limiting factor for live plots is that we only save output at the end of a node execution. We would need to keep data at a more granular level to support live plots and these chart customisations. That would be a huge change on the backend, though, and doesn't sit well with the node-execution paradigm.
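For context on why that granularity limit bites, here is a minimal sketch of how metrics reach experiment tracking today, assuming the documented `tracking.MetricsDataSet` type; the dataset name, filepath, and evaluation logic are illustrative:

```python
# Hypothetical catalog entry (conf/base/catalog.yml):
#
#   metrics:
#     type: tracking.MetricsDataSet
#     filepath: data/09_tracking/metrics.json
#
# The dict below is written out once, when the node finishes, which is
# exactly why per-epoch "live" plots would need a different mechanism.
from sklearn.metrics import accuracy_score, f1_score


def evaluate_model(model, X_test, y_test) -> dict:
    """Kedro node: the returned dict of floats is captured as `metrics`."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
```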
Just to clarify, I don't think live plotting of metric vs. epoch is in scope here at all (as @noklam says, we can't do anything like that without a lot more work on Kedro core, and it would be quite a paradigm shift). For now we're just concerned with comparing metrics saved as a dataset (so from a node output) in one …
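For what that cross-run comparison could look like, here is a rough sketch (run timestamps and values are made up; Plotly again just for illustration) of a time-series view over one metric collected from several versioned runs:

```python
import plotly.graph_objects as go

# Made-up data: one value of a tracked metric per versioned run,
# ordered by run timestamp.
run_timestamps = ["2022-10-01", "2022-10-05", "2022-10-12", "2022-10-20"]
accuracy = [0.74, 0.79, 0.78, 0.85]

fig = go.Figure(go.Scatter(
    x=run_timestamps,
    y=accuracy,
    mode="lines+markers",
    name="accuracy",
))
fig.update_layout(xaxis_title="run", yaxis_title="accuracy")
fig.show()
```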
Hey everyone! I won't be in the Experiment Tracking review session tomorrow, and I just have some thoughts on the current prototype design. From what I understand, the original problem we're supposed to be solving is: "I'm choosing not to use Kedro-Viz Experiment Tracking because it doesn't allow me to visualise metrics over time." I may be wrong, but I assumed it would be as simple as saying, "I've done 20 pipeline runs, I was tracking …". The reason I ask this is because:
So at the end of the day, the question becomes which problem are we solving for our users to increase adoption of Kedro-Viz Experiment Tracking? Are our users choosing not to use Kedro-Viz Experiment Tracking because:
I'm inclined to think it's the first problem but I'm also happy to be proven wrong on this. So keeping in mind that I'm also making assumptions throughout this piece, I would propose the following structure for user testing, which would provide more insights into the impact of not delivering on either of those problem statements:
Visual References (images omitted)
One final thought while it occurs to me: you can actually sort of retain the time ordering in the parallel coordinates plot if you colour the lines somehow, e.g. showing the oldest ones fainter than the most recent ones. Not super important, because I don't think the time ordering is that important, but at least highlighting the most recent run might be nice. A rough sketch of the idea follows.
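A minimal sketch of that idea, with made-up values: map each run's position in the time ordering to the line colour so that older runs fade out:

```python
import plotly.graph_objects as go

# Hypothetical metrics for five runs, oldest first; run_order drives the
# colour scale so recent runs render darker than older ones.
run_order = [0, 1, 2, 3, 4]
accuracy = [0.71, 0.74, 0.78, 0.80, 0.85]
f1 = [0.65, 0.70, 0.69, 0.77, 0.82]

fig = go.Figure(go.Parcoords(
    line=dict(color=run_order, colorscale="Blues"),  # oldest = faintest
    dimensions=[
        dict(label="accuracy", values=accuracy),
        dict(label="f1", values=f1),
    ],
))
fig.show()
```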
As I said in the meeting yesterday, my intuition and instinct around what a user may want from new features here isn't sharp. I defer to @AntonyMilneQB, @noklam, and others who have used tools like this in the past while doing real DS/DE/ML work. What I do think we need is consistency with our hierarchy of information and a viable amount of added value in whatever we develop next. A few things stood out to me during the meeting yesterday:
I'm excited to hear what our interviewees say when this is shown to them. Lastly, calling out @noklam here: please add some thoughts and comments if you have any. I think they're invaluable here!
While browsing the original issue I came across this from @mkretsch327 (ex-QB data scientist). Basically, I think data scientists (me, Nok, Matt) like the parallel coordinates plot 👍
I'm happy with this. I will say that we will prioritise one view to solve the original user problem that was raised. At this point it's either parallel coordinates or time series; it won't be both, because we have other problems to solve once this is completed. And I want to feel certain that if we acted on kedro-org/kedro-viz#1000 we would be doing the right thing. Admittedly, I am a bit nervous about the parallel plot because we had feedback about the spider diagram when we were evaluating PAI. I highlighted the relevant insight in dark pink.
Let's see what users say. I see how the spider diagram might be confusing for some (even though it is the same thing as parallel coordinates). It might look cool to some, but the fact that it was circular added too much visual complexity and made it more difficult to read. This is not an issue specific to that graphic but a universal visual design fact: when you flatten "the same" plot into a horizontal alignment it becomes much more digestible. I understand picking one or the other for now for the sake of practicality and moving forward iteratively, but I would not ignore either one, since they are different ways of exploring the data from different angles. Again, let's ask the right questions and listen to what users say over the sessions. Loads of great insights are coming.
## User Testing Synthesis - Results

### Goal and Methodology

The goal of this session was to evaluate the usability and value risk of the proposed feature in #1627 (tracking metrics over time) through a low-fidelity mockup and a high-fidelity prototype. The research used a qualitative (interviews 🎤, 6 participants) and quantitative (polls 🗳️) approach across the QuantumBlack and open-source user bases.

### 1 - Experiment Tracking Use Case

Summary: 2/6 users currently use the Kedro experiment tracking feature. Experiment tracking was used to understand their experiments and to find the best one by iterating with different parameters to produce different metrics. This was done using MLflow, Weights & Biases, and Tableau.
### 2 - On Plotly Visualisation in Flowchart Mode

Summary: 3/6 users know of this feature and have used it to plot their metrics. One user mentioned that its location is non-intuitive and difficult to find for non-users.

### 3 - Knowing which Metrics to Track

Summary: 3/6 users start with a clear metric to track defined by the project, while others don't and are more exploratory.

### 4 - On New Tab Design

Summary: All 6 users prefer the new tab design.

### 5 - On Plots: Parallel Coordinates & Time Series

Summary: 2 users each like the time series and parallel coordinates plots, and 2 users like and would use both plots for different use cases.

### 6 - On Comparison Mode

Summary: 4/6 users preferred comparison mode in parallel coordinates mode over time series. 1 user found comparison mode and the 'metrics' tab confusing.

### 7 - Pain Points

Summary: The most common pain point, identified by 4/6 users, was the axes: the ability to change the scales or customise the values to be percentages for easy comparison.

### Features Still Missing for Users' Pain Points

Summary: There were general feature requests and ones specific to the plots. The most common general feature, identified by 3/6 users, was filtering, followed by the ability to change the axes or customise the metric values.
### Problems we still need to consider for the future
I'll close this 🥳 This theme is complete. |
Description
Ability to plot experiment metrics derived from pipeline runs.
This is based on the second high priority issue resulting from the experiment tracking user research, which is:
Visualisation: ability to show plots / comparison graphs / hyperparameters to evaluate metrics trade-offs
What is the problem?
Who are the users of this functionality?
Why do our users currently have this problem?
What is the impact of solving this problem?
What could we possibly do?