
Allow users to collaborate while using experiment tracking #1218

Closed
NeroOkwa opened this issue Jun 17, 2022 · 6 comments
@NeroOkwa
Contributor

NeroOkwa commented Jun 17, 2022

Description

This is the third highest priority issue resulting from the experiment tracking adoption user research. Users want to be able to write their experiments to storage that is not on their local computer and share their experiments with other team members.

This is important as it enables a team of users to collaborate and see each other's results as they iterate on a pipeline, compared to the experience of being limited to one user's local machine. Hence, this is a deciding factor for the adoption of Kedro Experiment Tracking.

This pain point also came up in the experiment tracking user testing sessions:

"It was easier for me just to put MLflow in the cloud and report there. And I thought that the native Kedro one was still very coupled. I didn't know how to share the database across many data scientists."

Context

What is the problem?

Users can only perform a model run on a local machine, making it difficult to collaborate on a project with the rest of their team because experiment results are on multiple computers.

Additionally, users have raised other concerns:

"I can't store my experiment tracking data on S3, even though we are running everything else on S3."

Who are the users of this functionality?

Users are primarily data scientists, and data engineers are secondary users.

Why do our users currently have this problem?

We designed it this way to launch a simpler version of experiment tracking in Kedro, even though its predecessor (PerformanceAI) had this functionality.

Currently, users can only store their runs on a local machine via SQLite:

"For now, we'll keep it simple, and team members will track only their own experiments."

What is the impact of solving this problem?

It would be possible to view all user experiments across a team in one place and also solve an outstanding adoption issue for Kedro Experiment Tracking:

"If we could write our metrics files to like an S3 bucket and then run experiment tracking pointing at that S3 bucket, that simplifies our workflow in many different ways. It would make Kedro experiment tracking just as easy, if not easier than MLflow for us."

"You might train one model locally on your computer... Having all those experiments in one place as a single source of truth is powerful." 

How could we implement this functionality?

Option 1

Open the browser and see it - Run a server that your Kedro-Viz and other users' Kedro-Viz instances can connect to, and share results from that server.

Implication: The server introduces many complications and must run continuously. We also need to decide who hosts it, us or the user. This is related to the shareable URL work, another Kedro-Viz project.

Option 2

Create a mechanism where only data is shared - Kedro-Viz still runs locally but has access to a shared data service, for example a database on S3, which provides the new data.

Implication: This is easier to implement and wouldn't need our constant support to run any service beyond S3, but users must run Kedro-Viz locally rather than through the browser.
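A minimal sketch of the discovery step Option 2 implies, assuming each team member writes a session_store.db under their own prefix in a shared location. The function name and layout are illustrative, not part of Kedro-Viz; with fsspec the same glob could run against an s3:// bucket, but a local directory keeps the sketch self-contained:

```python
from pathlib import Path
import tempfile

def discover_shared_runs(shared_root) -> list[str]:
    """Find every team member's session store under a shared location.

    Hypothetical helper: in a cloud setup the glob would go through
    fsspec against a bucket URL instead of the local filesystem.
    """
    root = Path(shared_root)
    return sorted(str(p.relative_to(root)) for p in root.glob("*/session_store.db"))

# Demo against a throwaway local directory standing in for the bucket.
bucket = Path(tempfile.mkdtemp())
for user in ("alice", "bob"):
    (bucket / user).mkdir()
    (bucket / user / "session_store.db").touch()
stores = discover_shared_runs(bucket)
```

The per-user prefix is one simple way to avoid write conflicts: each user only ever uploads their own file, and readers merge everything they find.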

Option 3

Connecting to other solutions (MLflow and Weights & Biases) - These tools provide this functionality natively, so we would be relying on their implementations.

Implication: This would only show experiment tracking data and not the flowchart (which would remain local only). We would also have to support multiple third-party tools.

What important considerations do we have?

All options above would require that we redesign the backend data model. How do we contain everything currently shown on Kedro-Viz in a single database, rather than always deriving the data from code in the Kedro framework?

Currently we read the data directly from the Kedro project. We need to solve the data model first (the SQLite store) before considering any of these options.

What other related issues can I read?

This is related to other open issues: #1217, #1039, and #1116

@NeroOkwa NeroOkwa self-assigned this Jun 17, 2022
@yetudada
Contributor

This functionality would help some of the internal teams that would like to use Kedro experiment tracking too; they cannot use it because they cannot configure the storage location.

@antonymilne
Contributor

Very relevant, @limdauto pointed out this recently: https://fly.io/blog/all-in-on-sqlite-litestream/

@yetudada yetudada changed the title Experiment Tracking Adoption: Issue 3 - Providing remote server options for an experiment. Allow users to write experiments to a remote server Jun 23, 2022
@NeroOkwa
Contributor Author

NeroOkwa commented Jan 11, 2023

Technical Design Discussion on 11/01/2023

The three options were evaluated for their advantages and feasibility, and Option 2 was selected as the most feasible.

Option 1

  • This option would require the team to always be on call for support
  • An alternative would be to build Kedro-Viz so that it can be hosted, with hosting and infrastructure managed by the user. However, most users may not have the required skills or motivation to do this

Option 2

  • This was the most feasible solution as we could keep Kedro-Viz locally and focus on a shared data storage
  • This is supported by the fact that in the past, setting up a bucket or storage location hasn’t been an issue for our users

Option 3

  • Maintainability might be an issue as we would need to keep up with 3rd party changes
  • Another point was that this might serve as the experiment tracking solution if we decide to ‘sunset’ our own experiment tracking

Next Steps for Option 2

  • Do a full investigation on how to efficiently store experiment tracking data in a bucket - S3, Azure
    • Investigate alternatives to SQLite, such as DuckDB
  • Decide how to consolidate all data in SQLite for Kedro-Viz
  • Redesign how users configure experiment tracking on Kedro

@MatthiasRoels

MatthiasRoels commented Jan 19, 2023

Another possibility could be to draw inspiration from what Prefect is doing with their Orion UI. In my opinion, it’s the best of both worlds: you can set it up with a local SQLite db and use it locally (just like Kedro-Viz currently works), or set it up with a PostgreSQL backend db and run it as a remote web server. In both cases, the server tracks metadata about your runs (e.g. the DAG, how long each step runs, inputs/outputs generated). All of this metadata is already available in Kedro at runtime, so it should be easy enough to expose it through an API call in a hook!
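To illustrate the hook idea, here is a hedged sketch of the metadata payload such a hook could assemble. The class, method signature, and endpoint are hypothetical, not Kedro's actual hook spec; a real plugin would decorate methods with @hook_impl from kedro.framework.hooks and match its signatures:

```python
import json

class RunMetadataHook:
    """Sketch of a hook that would forward run metadata to a tracking
    server. All names here are illustrative."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint  # hypothetical tracking-server URL

    def after_node_run(self, node_name: str, inputs: dict, outputs: dict,
                       duration_s: float) -> str:
        # Assemble the metadata Kedro already has available at runtime.
        payload = {
            "node": node_name,
            "inputs": sorted(inputs),
            "outputs": sorted(outputs),
            "duration_s": round(duration_s, 3),
        }
        # A real hook would POST this to self.endpoint (e.g. with
        # urllib.request); returning the JSON keeps the sketch testable.
        return json.dumps(payload)
```

Usage would amount to registering one such hook in the project's settings, so every node run ships its metadata without any change to pipeline code.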

@limdauto
Collaborator

limdauto commented Feb 2, 2023

FWIW what @MatthiasRoels mentioned was the original idea and was also why we bothered with SQLAlchemy in the first place. It should be backend* agnostic. Unless anything has changed recently, this is literally the only place we need to change to enable a different db than sqlite: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/database.py#L15 -- and make that configurable by the end user. I was planning to fork viz to do a PoC for a problem I'm facing at work that will require a different backend than sqlite. I could report some learnings back if I get to it.

*: SQLAlchemy-compatible backend, not S3
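As a sketch of what "make that configurable" could look like: since SQLAlchemy engines are built from a URL string, one option is to let an environment variable override the default sqlite URL. The KEDRO_VIZ_DB_URL variable name below is purely illustrative, not an actual Kedro-Viz setting:

```python
import os

def resolve_db_url(store_path: str) -> str:
    """Return a SQLAlchemy database URL, overridable by the end user.

    Hypothetical sketch: the env var name is made up for illustration.
    Falls back to the sqlite file layout Kedro-Viz uses today.
    """
    default = f"sqlite:///{store_path}/session_store.db"
    return os.environ.get("KEDRO_VIZ_DB_URL", default)

# create_engine(resolve_db_url(store_path)) would then replace the
# hard-coded sqlite engine; the rest of the code already goes through
# SQLAlchemy, so any compatible backend (Postgres, MySQL, ...) would work.
```

This keeps the local-only default untouched while letting a team point every Kedro-Viz instance at one shared database server.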

@yetudada yetudada changed the title Allow users to write experiments to a remote server Allow users to collaborate while using experiment tracking Feb 23, 2023
rashidakanchwala added a commit that referenced this issue May 24, 2023
This PR addresses Issue 3 (#1218) from the user research for Experiment tracking. The update enables users to store their session_store and tracking data in the cloud.

Here's an overview of the Collaborative Experiment Tracking implementation:

In settings.py, the user specifies the S3 bucket location, which triggers the upload of their session_store.db (a SQLite database) to the cloud using fsspec. This upload is executed by the SQLiteStore._upload() function during a Kedro run, when a user creates a new experiment.

When users launch Kedro-Viz, session_store.db files from all other users are downloaded to the user's local machine through the SQLiteStore._download() function.

These downloaded databases are then merged into the user's current local session_store.db through the SQLiteStore._merge() function. As a result, the local session_store.db contains not only the user's own experiments but also those conducted by other team members.

Every user collaborating on a particular experiment tracking project maintains a copy of everyone's experiments, both locally and in the cloud. This synchronization is handled by the SQLiteStore._sync() function, which downloads, merges, and uploads the session_store.db.
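The merge step described above can be sketched with plain sqlite3, assuming a simplified single `runs` table keyed by run id. The real Kedro-Viz schema differs, so treat this as an illustration of the idea, not the shipped _merge implementation:

```python
import sqlite3
import tempfile
from pathlib import Path

def merge_session_stores(local_db: str, downloaded_db: str) -> None:
    """Pull another user's runs into the local session store.

    Illustrative sketch: assumes one `runs` table with a primary-key
    run id, so INSERT OR IGNORE keeps local rows and adds only run
    ids not seen locally yet.
    """
    con = sqlite3.connect(local_db)
    con.execute("ATTACH DATABASE ? AS other", (downloaded_db,))
    con.execute("INSERT OR IGNORE INTO runs SELECT * FROM other.runs")
    con.commit()
    con.execute("DETACH DATABASE other")
    con.close()

# Demo with two throwaway stores standing in for the local db and a
# teammate's downloaded db.
tmp = Path(tempfile.mkdtemp())
for name, rows in [("local.db", [("run-1", "me")]),
                   ("remote.db", [("run-1", "me"), ("run-2", "teammate")])]:
    con = sqlite3.connect(str(tmp / name))
    con.execute("CREATE TABLE runs (id TEXT PRIMARY KEY, author TEXT)")
    con.executemany("INSERT INTO runs VALUES (?, ?)", rows)
    con.commit()
    con.close()
merge_session_stores(str(tmp / "local.db"), str(tmp / "remote.db"))
```

Because run ids are unique per user and run, repeating the merge is idempotent, which is what lets _sync safely download, merge, and re-upload on every run.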
@tynandebold
Member

Closing this ticket, as we have many others in the works for this feature. Follow along here.

Projects
Status: Done
Status: Shipped 🚀
Development

No branches or pull requests

6 participants