Make pip and venv installation instructions more visible #3281

Closed
astrojuanlu opened this issue Nov 7, 2023 · 7 comments · Fixed by #3686
Labels
Component: Documentation 📄 (Issue/PR for markdown and API documentation) · Issue: Feature Request (New feature or improvement to existing feature)

Comments

@astrojuanlu
Member

Description

Currently we're favouring conda in our installation instructions as the "recommended approach" https://docs.kedro.org/en/stable/get_started/install.html


But the Python packaging ecosystem is very fragmented, and conda usage for environment isolation seems to be around 22 % ± 1 % of the overall userbase:

Closer analysis reveals that conda usage among data scientists is declining or stagnating at around 41 %, with 73 % reporting that they use virtualenv, Pipenv, or Poetry. From the raw data, these are the numbers I came up with:

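| Year | None | virtualenv + Pipenv + Poetry | conda | Other |
|------|------|------------------------------|-------|-------|
| 2020 | 14.9 % | 72.0 % | 44.6 % | 35.2 % |
| 2021 | 16.2 % | 70.0 % | 40.9 % | 37.0 % |
| 2022 | 16.1 % | 73.1 % | 40.8 % | 37.1 % |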

Among non-data scientists, conda usage lies between 13 % and 15 %.

In addition, an informal survey in Slack surfaced what I've long suspected: that users have a "global" Kedro, plus a project-specific Kedro. For the "global" one, users have mentioned a variety of solutions, including conda, pyenv, pipx, rtx, and more: https://linen-slack.kedro.org/t/16040768/u05bdslpj72-finally-gave-the-steps-in-https-kedro-org-slack-#db2c34ed-43bf-4479-839c-5a4fb4154a10

Possible Implementation

In summary, I think we should

@astrojuanlu astrojuanlu added Component: Documentation 📄 Issue/PR for markdown and API documentation Issue: Feature Request New feature or improvement to existing feature labels Nov 7, 2023
@stichbury
Contributor

Let's do this. I think a new page specifically about virtual environments may be the best place to do this so it doesn't clutter the get started instructions. It means it's a separate click away but then it's all self-contained. Some decent keyword optimization and we could even start ranking if we do a good job.

@astrojuanlu
Member Author

astrojuanlu commented Nov 7, 2023

Is there a way to tweak the actual installation instructions at https://docs.kedro.org/en/stable/get_started/install.html to have `pip install kedro` on top (possibly coupled with `python -m venv .venv` for the actual environment creation), and then add extra info in a virtual environments page as you mention?
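
For reference, the venv-plus-pip flow could look roughly like the sketch below. It assumes a POSIX shell and that the interpreter is available as `python3` (use `python` if that is its name on your system). The `.venv` directory name is only a convention, and the final `kedro --version` call is just a sanity check, not part of any agreed wording for the docs.

```shell
# Create an isolated environment in ./.venv (the name is a convention, not a requirement)
python3 -m venv .venv

# Activate it for the current shell session
# (on Windows the equivalent is .venv\Scripts\activate)
. .venv/bin/activate

# Install Kedro into the active environment; needs network access
python -m pip install kedro

# Sanity check: confirm the kedro command resolves inside the environment
kedro --version
```

Running `deactivate` afterwards returns the shell to the system interpreter.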

@stichbury
Contributor

Yes definitely. I won't be able to do that today but if you want to make a start and put a PR together, go ahead, otherwise I'll pick it up as soon as I can get to it.

@antonymilne
Contributor

Possible text for this that's used on vizro: 8e222fe.

@noklam noklam moved this to To Do in Kedro Framework Mar 4, 2024
@DimedS DimedS moved this from To Do to In Progress in Kedro Framework Mar 6, 2024
@DimedS
Contributor

DimedS commented Mar 6, 2024

@astrojuanlu, during my onboarding at McKinsey, I learned that the company recommends using VSCode + conda environments, supported by analytics indicating it's a globally popular choice. I recall the same recommendation from the company course "Introduction to Software Engineering Principles for Data Scientists and Data Engineers." It appears you and @stichbury were responsible for that course. Does this mean you set this standard across the company? Should we consider revising it as well?

@astrojuanlu
Member Author

About the internal recommendations, let's talk about those in private.

About my own preference and what was included in the ISWE4DX course (you can see more about that in kedro-org/kedro-devrel#12 and also in https://github.com/kedro-org/kedro-academy/tree/main/iswe4dx/environment-dependencies), it's rooted in the fact that sometimes installing old packages on old Python versions and non-Linux architectures is easier with conda than with pip. However, this is evolving a lot, and in fact we've recently solved the last issue that was preventing me from completely ditching conda: kedro-org/kedro-plugins#402

So I still think conda is a nice tool and ecosystem, but

So, with all of this in mind, it's in our best interest to align ourselves with the broader Python ecosystem.

@DimedS
Contributor

DimedS commented Mar 6, 2024

  • Not all packages are in conda-forge, sometimes you end up having to use pip inside conda environments - and this can create lots of problems for folks that are not experts in packaging

You're right, the strange part of that recommendation was the combination of VSCode + conda env + pip. I agree that it's better to change conda to virtualenv.

@DimedS DimedS linked a pull request Mar 7, 2024 that will close this issue
@DimedS DimedS moved this from In Progress to In Review in Kedro Framework Mar 7, 2024
@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro Framework Mar 12, 2024
4 participants