Put kedro catalog on-line #1239

Closed
eepgwde opened this issue Feb 11, 2022 · 5 comments
Labels
Community Issue/PR opened by the open-source community Issue: Feature Request New feature or improvement to existing feature

Comments

eepgwde commented Feb 11, 2022

Description

The Kedro catalog is so useful that non-Kedro users could use it as a data dictionary.

Context

The data processing done by kedro is usually made available by users on Cloud Storage or cloud services.
It would be useful to see a table's load path, so that an end user could take the S3 path and use it with Spark, Athena or even Power BI.

It would also help to see the chain of load paths for a pipeline: a set of URIs for the tables.

There are other useful things too: recording notes about tables, writing up constraints, and a multi-user wiki on a Kedro project.
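To make the "set of URIs for the tables" idea concrete, here is a minimal sketch in Python. It assumes the catalog has already been parsed into a plain dict (for example with a YAML loader) and that file-backed entries carry a `filepath` key. The dataset names, types, bucket and the helper names `load_paths` and `as_table` are all hypothetical and only there for illustration; this is not an existing Kedro API.

```python
from typing import Any, Dict

# Hypothetical catalog entries, shaped like a parsed catalog.yml.
# Names, dataset types and S3 paths are made up for illustration.
CATALOG: Dict[str, Dict[str, Any]] = {
    "companies": {
        "type": "pandas.CSVDataset",
        "filepath": "s3://example-bucket/01_raw/companies.csv",
    },
    "model_input": {
        "type": "spark.SparkDataset",
        "filepath": "s3a://example-bucket/03_primary/model_input.parquet",
        "file_format": "parquet",
    },
    # In-memory datasets have no load path an end user could reuse.
    "metrics": {"type": "MemoryDataset"},
}


def load_paths(catalog: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Map each dataset name to its `filepath`, skipping entries without one."""
    return {
        name: entry["filepath"]
        for name, entry in sorted(catalog.items())
        if isinstance(entry, dict) and "filepath" in entry
    }


def as_table(paths: Dict[str, str]) -> str:
    """Render the name -> URI mapping as an aligned plain-text table."""
    width = max((len(name) for name in paths), default=0)
    return "\n".join(f"{name.ljust(width)}  {uri}" for name, uri in paths.items())


if __name__ == "__main__":
    print(as_table(load_paths(CATALOG)))
```

The resulting table is the kind of listing a Spark, Athena or Power BI user could copy a path from without touching the Kedro project itself.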

Possible Implementation

I think it would be a Node.js server. Most of the JavaScript could be server-side.

Possible Alternatives

I have used similar data dictionaries from Altova. You have to do a lot of the coding yourself.

There are academics working on "Wikify your Metadata!"

@eepgwde eepgwde added the Issue: Feature Request New feature or improvement to existing feature label Feb 11, 2022
@datajoely
Contributor

I think this one is also related to #1076: if you have the metadata to build docs, where you host them becomes an implementation detail.

Galileo-Galilei pushed a commit to Galileo-Galilei/kedro that referenced this issue Feb 19, 2022
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Mar 7, 2022
@astrojuanlu
Member

It is not entirely clear to me whether this issue is about putting the catalog.yml (and companion files, for example a directory with different catalog* patterns) in remote locations (say, an object store like S3) and accessing them from Python, or rather about creating a web application that serves the catalog through an API and deploying that app in a cloud service.

@eepgwde I know it's been a long time but by any chance would you like to provide a bit more context?
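For the second reading (a web application that serves the catalog through an API), a rough sketch using only the Python standard library might look like the following. The `/catalog` endpoint, the port, the sample entry and the names `catalog_payload`, `CatalogHandler` and `make_server` are assumptions made for illustration; nothing here is part of Kedro, and a real service would also need to keep credential references out of what it publishes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict

# Hypothetical catalog content, shaped like a parsed catalog.yml.
CATALOG: Dict[str, Dict[str, Any]] = {
    "companies": {
        "type": "pandas.CSVDataset",
        "filepath": "s3://example-bucket/01_raw/companies.csv",
    },
}


def catalog_payload(catalog: Dict[str, Dict[str, Any]]) -> bytes:
    """Serialise the catalog as UTF-8 encoded JSON with stable key order."""
    return json.dumps(catalog, indent=2, sort_keys=True).encode("utf-8")


class CatalogHandler(BaseHTTPRequestHandler):
    """Answer GET /catalog with the JSON catalog, anything else with 404."""

    def do_GET(self) -> None:
        if self.path != "/catalog":
            self.send_error(404, "Unknown endpoint")
            return
        body = catalog_payload(CATALOG)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build a server bound to localhost only; the port is an arbitrary choice."""
    return HTTPServer(("127.0.0.1", port), CatalogHandler)


if __name__ == "__main__":
    # Then browse to http://127.0.0.1:8000/catalog
    make_server().serve_forever()
```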

@datajoely
Contributor

Slightly tangential, but I think it would be interesting to allow kedro run --conf-source=<path-to-new-conf-directory> to support fsspec. It would also allow multiple projects to share a catalog.

@datajoely
Contributor

We actually have this CLI-level logic available for micropackaging.

@merelcht
Member

This issue hasn't had any recent activity, so I'm closing it.

@merelcht merelcht closed this as not planned Dec 13, 2023