Experiment tracking: big open questions #1217

Closed
antonymilne opened this issue Sep 7, 2022 · 1 comment

antonymilne commented Sep 7, 2022

Note. This is very much a kedro-viz and kedro core issue but I've put it here since that's where most of the big experiment tracking discussions are currently.

There are several parts of experiment tracking that already exist, or that we have always anticipated adding, but that feel very uncertain or unachievable at the moment because they either don't have a design or we've deviated from the original designs. Some of these already have their own issues here, but I want to get the ball rolling on what the overall solution might be. Several of the issues are very closely connected and their solutions will impact each other (e.g. how tracking.MetricsDataSet works will affect the use of the SQLite database, which will affect the multi-user experience). That doesn't mean we need to implement lots of new features all at once, but I think we need a holistic design here rather than building it piecemeal. At the moment I feel like we're a bit stuck on these questions and it would be great to get some clarity on them.

Open questions

1. What is SQLiteStore for?

In the original proposal @limdauto stated:

Once we are happy with the schema and implementation of the store, we can move it into Kedro core and make it the default session store.

I think @idanov feels otherwise though:

  1. the SQLiteStore should always live in kedro-viz
  2. SQLiteStore should not be used to record anything other than run metadata (like run command, timestamp, etc.)

Where the SQLiteStore code lives isn't such a big deal, but getting a clearer idea of what SQLiteStore is actually for is essential if we're going to add features like multi-user experience, searching by metric, etc.
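
To make the "run metadata only" position concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample values are all hypothetical; this is not the actual SQLiteStore schema, just an illustration of a store that holds no metrics at all.

```python
import sqlite3

# Hypothetical schema: one row per run, metadata only (no metrics).
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id TEXT PRIMARY KEY,        -- made-up run identifier
    run_command TEXT NOT NULL,  -- the command that started the run
    created_at TEXT NOT NULL    -- ISO 8601 timestamp
)
"""


def record_run(conn, run_id, run_command, created_at):
    """Store the metadata for a single run."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO runs (id, run_command, created_at) VALUES (?, ?, ?)",
            (run_id, run_command, created_at),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_run(conn, "run_1", "kedro run", "2022-09-07T10:00:00")
rows = conn.execute("SELECT id, run_command FROM runs").fetchall()
```

With a store like this, metrics would have to live somewhere else (for example in the tracking datasets), so a feature such as searching by metric could not be a plain query against this table.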

Related:

2. What should happen to the tracking datasets?

The original proposal expected there to be three datasets for experiment tracking: tracking.MetricsDataSet (key-value pairs with numerical values), tracking.JSONDataSet (general JSON) and tracking.ArtifactDataSet (everything else). The first two of these exist but the third doesn't. Instead, plots were implemented as versioned instances of the matplotlib.MatplotlibWriter dataset.

Copying my comments from kedro-org/kedro#1626 (comment):

While I agree with the "tracked plot = versioned dataset" approach, it does feel like an inconsistent and confusing UX given the already-existing tracking datasets:

  • Want to track json data? Change your dataset type to tracking.JSONDataSet.
  • Want to track a plot? Keep the same dataset type but set versioned: true.

Hence I think we do need to work out what happens with tracking.JSONDataSet and tracking.MetricsDataSet sooner rather than later. tracking.JSONDataSet could be easily deprecated in favour of json.JSONDataSet with versioned: true, but tracking.MetricsDataSet is trickier. To me this is directly coupled to questions like "how do I search runs by metric" and "why not just do a log_metric call" (which we decided against before). Overall, adding plots to experiment tracking sounds straightforward and I'm very happy to do it by versioned: true, but we need to work out a more holistic and complete solution here or experiment tracking becomes a bit of a mish-mash of different approaches.
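
For illustration, the inconsistency is easy to see when the catalog entries sit side by side. The sketch below writes them as a plain Python dict mirroring what would normally be in the catalog file. The dataset names and file paths are made up, and `to_versioned_json` is a hypothetical helper showing what the suggested tracking.JSONDataSet deprecation could amount to; only the type strings come from the discussion above.

```python
# Hypothetical catalog entries, written as a dict that mirrors the YAML catalog.
# Dataset names and filepaths are invented for illustration.
catalog = {
    # Tracked because of their *type*:
    "metrics": {
        "type": "tracking.MetricsDataSet",
        "filepath": "data/tracking/metrics.json",
    },
    "run_params": {
        "type": "tracking.JSONDataSet",
        "filepath": "data/tracking/run_params.json",
    },
    # Tracked because of the *versioned* flag:
    "confusion_matrix": {
        "type": "matplotlib.MatplotlibWriter",
        "filepath": "data/tracking/confusion_matrix.png",
        "versioned": True,
    },
}


def to_versioned_json(entry):
    """Rewrite a tracking.JSONDataSet entry as a versioned json.JSONDataSet."""
    if entry.get("type") != "tracking.JSONDataSet":
        return dict(entry)  # leave every other dataset untouched
    return {**entry, "type": "json.JSONDataSet", "versioned": True}


migrated = {name: to_versioned_json(entry) for name, entry in catalog.items()}
```

After this rewrite the JSON and plot entries follow the same "versioned: true" rule, and tracking.MetricsDataSet is the only entry still identified by its type, which is exactly the case that remains unresolved.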

3. How do we enable a search functionality?

This was always on the roadmap as a feature and now it's been requested by a user: #1039.

The linked issue has the relevant quotes on @limdauto's ideas for implementing this, but they all rely on SQLiteStore being used to store metrics in some way. This is something I personally feel most uncertain about because I don't really have any idea how to build a search functionality in either scenario (metrics in SQLiteStore or not).
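
As a starting point for discussion, here is a rough sketch of what "search runs by metric" could look like in each scenario. Everything in it is an assumption: the `metrics` table and its columns, the `<root>/<run_id>/metrics.json` layout, and both function names are invented for illustration and do not reflect how SQLiteStore or the tracking datasets actually store data.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def search_sqlite(conn, name, minimum):
    """Scenario A: metrics sit in a (hypothetical) table, so search is a query."""
    rows = conn.execute(
        "SELECT run_id FROM metrics WHERE name = ? AND value >= ? ORDER BY run_id",
        (name, minimum),
    )
    return [run_id for (run_id,) in rows]


def search_files(root, name, minimum):
    """Scenario B: metrics exist only as per-run JSON files, so search scans them."""
    matches = []
    for path in sorted(Path(root).glob("*/metrics.json")):
        metrics = json.loads(path.read_text())
        # Runs that never recorded this metric are treated as non-matching.
        if metrics.get(name, float("-inf")) >= minimum:
            matches.append(path.parent.name)
    return matches


# Made-up data: two runs with one metric each.
runs = {"run_1": {"accuracy": 0.81}, "run_2": {"accuracy": 0.93}}

# Scenario A setup: load the metrics into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (run_id TEXT, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(run_id, k, v) for run_id, m in runs.items() for k, v in m.items()],
)
from_sqlite = search_sqlite(conn, "accuracy", 0.9)

# Scenario B setup: write one metrics.json per run into a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for run_id, metrics in runs.items():
        run_dir = Path(root) / run_id
        run_dir.mkdir()
        (run_dir / "metrics.json").write_text(json.dumps(metrics))
    from_files = search_files(root, "accuracy", 0.9)
```

The trade-off is the usual one: the query version only works if metrics are written to the database in the first place, while the file scan needs no extra store but has to read every run on each search.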

4. How do we enable a multi-user experience?

Relevant issue: #1218

This is also very unclear to me currently and depends heavily on the role played by SQLiteStore.

@tynandebold
Member

We'll reopen this ticket if we look at this again downstream.

tynandebold closed this as not planned on Sep 25, 2023