Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the A10G CI due to permission issues #2123

Closed
wants to merge 10 commits into from
Closed

Fix the A10G CI due to permission issues #2123

wants to merge 10 commits into from

Conversation

xuzhao9
Copy link
Contributor

@xuzhao9 xuzhao9 commented Jan 20, 2024

After the version upgrade, the runner UID changed from 1000 to 1001, so we need to setup the directory permission correctly to fix it on A10G CI.

@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload January 22, 2024 04:57 — with GitHub Actions Inactive
@xuzhao9 xuzhao9 changed the title Fix BERT_pytorch install on A10G CI Remove the A10G CI due to limited disk space Jan 22, 2024
@xuzhao9 xuzhao9 changed the title Remove the A10G CI due to limited disk space Fix the A10G CI due to permission issues Jan 22, 2024
@xuzhao9 xuzhao9 requested a review from atalman January 22, 2024 20:53
@facebook-github-bot
Copy link
Contributor

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload January 22, 2024 21:01 — with GitHub Actions Inactive
@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload January 22, 2024 21:01 — with GitHub Actions Inactive
@facebook-github-bot
Copy link
Contributor

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Member

@aaronenyeshi aaronenyeshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@facebook-github-bot
Copy link
Contributor

@xuzhao9 merged this pull request in dbc3201.

@xuzhao9 xuzhao9 deleted the xz9/fix-pr-ci branch January 23, 2024 03:44
@cota
Copy link

cota commented Jan 23, 2024

Hi @xuzhao9,
Just noticed on my nightly runs that the added --user flag to pip install here breaks when running inside a virtual environment. (This is a default python virtual env, i.e. a --no-site-packages env). A possible workaround would be to make the environment a --no-site-packages one, but that IMO misses the point of a full virtual environment (I create a new environment from scratch for every nightly run).
Is --user really needed here? If yes, can you suggest any other workaround? FWIW, I don't see any other package in torchbench installing with --user. Thanks.

@xuzhao9
Copy link
Contributor Author

xuzhao9 commented Jan 23, 2024

@cota
We changed from python setup.py develop to pip install --user -e . because setup.py is being deprecated: https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
I can create a PR that removes --user and see if it passes the CI (I think it will). If it does, we can remove this.

Can you take a look at #2130 ?

facebook-github-bot pushed a commit that referenced this pull request Jan 25, 2024
Summary:
Addresses user comment on #2123

Pull Request resolved: #2130

Reviewed By: aaronenyeshi

Differential Revision: D53008231

Pulled By: xuzhao9

fbshipit-source-id: 10b6bfd92109ff7922ee9ff3a682587a8a09d956
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants