Reorganise the documentation structure for Kedro / Databricks integration #2436

Closed
jmholzer opened this issue Mar 20, 2023 · 7 comments · Fixed by #2442
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation

Comments

@jmholzer
Contributor

Description

We should reorganise the documentation on Databricks. Currently, it all sits under the subheading "Deployment to a Databricks cluster", under the heading "Deployment" in our index. There are two problems with this:

  1. After our updates to the documentation (see below), we will have sections on development as well as deployment.
  2. Having all our documentation in one file severely limits our ability to make specific sections more visible. Ideally, parts of the new and old documentation should be split into their own subheadings.

Context

We have a lot of new documentation regarding Kedro on Databricks arriving with the tickets #2283 and #2284. Reorganising the documentation is also a prerequisite for #2285. Rather than doing the reorganisation in one of these tickets, we should track it separately for better visibility.

Possible Implementation

I think we should make a new top-level heading in our index, as we have currently done for PySpark Integration. My preferred organisation would look something like this (all titles are subject to change):

Databricks Integration
├── Development workflows
│   ├── Using Databricks Workspace
│   ├── Using an IDE and Databricks Workspace
│   └── IDE-only development
├── Deployment
│   └── CI / CD on Azure Databricks
└── Visualising a Kedro Project on Databricks

What do you think @stichbury?

@jmholzer jmholzer added the Component: Documentation 📄 Issue/PR for markdown and API documentation label Mar 20, 2023
@jmholzer jmholzer moved this to To Do in Kedro Framework Mar 20, 2023
@stichbury
Contributor

@jmholzer I like this. I think it makes a lot of sense. I'd be tempted to shift the Databricks Integration section (and the one for PySpark) up the table of contents to make it more visible, maybe above "Deployment"?

I am also wondering if we need a more generic section which we then break down to "PySpark" and "Databricks", like this:

Kedro in your workflow
├── Databricks
│   ├── Using Databricks Workspace
│   ├── Using an IDE and Databricks Workspace
│   └── IDE-only development
├── Deployment to Databricks
│   └── CI / CD on Azure Databricks
├── Visualising a Kedro Project on Databricks
└── PySpark
    ├── PySpark section 1
    └── PySpark section 2

I think this will come down to whether we expect there to be more content about either topic, and additional integration platforms to include.

It's also somewhat dependent on information architecture changes and toolchain changes. So rather than block on too much "if" right now, I suggest we go ahead with your plan and anticipate that there may be some further changes at some point, but let's get something that works better for us right now. We will also need to think about:

  • redirects
  • internal linking

@jmholzer jmholzer self-assigned this Mar 20, 2023
@jmholzer
Contributor Author

Excellent, thanks for the feedback @stichbury ⭐️

I agree with you, a new 'Integration' section may actually make a lot of sense. I'll add my proposed changes in a PR for now.

@stichbury
Contributor

stichbury commented Mar 20, 2023

Perfect, thanks!

@jmholzer Don't forget to record which pages you move and where they'll redirect to, either here or in the PR. Then, when you merge the code and we make a release, we can set up the redirects in the RTD console. I usually check what has changed in index.rst before each release to make sure we have covered every move, but it's handy to have a list in the ticket that moved things, so it's easy to see where they went. Thanks!
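
One way to keep that list of moves is as a small mapping that can be sanity-checked and pasted into the ticket or PR. The following is only a sketch: the page paths are hypothetical placeholders, not the actual Kedro documentation layout, and the helper names `find_redirect_problems` and `as_markdown_table` are invented for illustration. The redirects themselves would still be entered by hand in the RTD console.

```python
# Hypothetical record of moved pages: old path -> new path.
# These paths are placeholders, not the real Kedro docs layout.
MOVED_PAGES = {
    "old_section/old_page.html": "new_section/new_page.html",
    "old_section/other_page.html": "new_section/sub/other_page.html",
}


def find_redirect_problems(moves):
    """Return a list of human-readable problems found in a redirect map."""
    problems = []
    for old, new in moves.items():
        if old == new:
            problems.append(f"{old} redirects to itself")
        elif new in moves:
            # The target has itself moved, so a reader would hit two redirects.
            problems.append(f"{old} -> {new} is chained; {new} also moves")
    return problems


def as_markdown_table(moves):
    """Format the moves as a Markdown table for pasting into a ticket or PR."""
    rows = ["| Old page | Redirects to |", "| --- | --- |"]
    rows += [f"| {old} | {new} |" for old, new in sorted(moves.items())]
    return "\n".join(rows)


if __name__ == "__main__":
    for problem in find_redirect_problems(MOVED_PAGES):
        print("WARNING:", problem)
    print(as_markdown_table(MOVED_PAGES))
```

Keeping the record as data rather than free text makes it easier to spot a page that redirects to another moved page before the release goes out.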

@yetudada
Contributor

I'd love to make one suggestion. Could this section:

Kedro in your workflow
├── Databricks
│   ├── Using Databricks Workspace

Be:

Kedro in your workflow
├── Databricks
│   ├── Using Databricks Workspace and Notebooks

@stichbury
Contributor

Absolutely, and I think we will still revise the titles (as @jmholzer says, they're still TBD) since they are gerunds and you know I'm not a huge fan. I'm not exactly sure, but maybe just "Integration with Databricks Workspace and Notebooks" (capitalisation TBD).

@jmholzer
Contributor Author

@yetudada Good idea, I like the inclusion of Notebooks in the title, I think it will help our users to choose the right workflow for them. I'll do the same for the hybrid workflow.

Also, I'm tracking the creation of a meta-guide for our users to choose the right workflow under this ticket.

@stichbury
Contributor

@jmholzer I think we're going to have to do some more sleight-of-hand with the index.rst to get the structure you're proposing...it's part of the reason the build isn't working right now. I'm working on your branch and will make a few changes to the table of contents file.

@jmholzer jmholzer moved this from In Progress to In Review in Kedro Framework Mar 22, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro Framework Mar 30, 2023