-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add notebook container #8
base: main
Are you sure you want to change the base?
Changes from 6 commits
76f0cd8
0bd9076
9b5545f
6b70006
6f0dbb2
c185237
61f24f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,170 @@ | ||
### Example user template template | ||
### Example user template | ||
|
||
# IntelliJ project files | ||
.idea | ||
*.iml | ||
out | ||
gen | ||
### Python template | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
cover/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
.pybuilder/ | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
# For a library or package, you might want to ignore these files since the code is | ||
# intended to run in multiple environments; otherwise, check them in: | ||
# .python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# poetry | ||
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. | ||
# This is especially recommended for binary packages to ensure reproducibility, and is more | ||
# commonly ignored for libraries. | ||
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control | ||
#poetry.lock | ||
|
||
# pdm | ||
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. | ||
#pdm.lock | ||
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it | ||
# in version control. | ||
# https://pdm.fming.dev/#use-with-ide | ||
.pdm.toml | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# pytype static type analyzer | ||
.pytype/ | ||
|
||
# Cython debug symbols | ||
cython_debug/ | ||
|
||
# PyCharm | ||
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can | ||
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore | ||
# and can be added to the global gitignore or merged into this file. For a more nuclear | ||
# option (not recommended) you can uncomment the following to ignore the entire idea folder. | ||
#.idea/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,7 +28,7 @@ ENV PYTHON_VER=python3.11 | |
RUN apt update && \ | ||
apt-get install -y software-properties-common && \ | ||
add-apt-repository ppa:deadsnakes/ppa && \ | ||
apt install -y $PYTHON_VER python3-pip && \ | ||
apt install -y $PYTHON_VER python3-pip gcc $PYTHON_VER-dev && \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is gcc needed for? There's no way to do the build in the build image and copy the result over? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. removed gcc. |
||
apt install -y r-base r-base-dev && \ | ||
rm -rf /var/lib/apt/lists/* | ||
|
||
|
@@ -46,6 +46,10 @@ COPY --from=build /opt/$SPARK_VER/ /opt/spark/ | |
# this doesn't seem to actually work | ||
RUN echo "spark.pyspark.python /usr/bin/$PYTHON_VER" > /opt/spark/conf/spark-defaults.conf | ||
|
||
RUN /usr/bin/$PYTHON_VER -m pip install --upgrade pip && \ | ||
/usr/bin/$PYTHON_VER -m pip install --force-reinstall psutil==5.9.8 && \ | ||
/usr/bin/$PYTHON_VER -m pip install jupyterlab==4.2.0 pyspark==3.5.1 findspark==2.0.1 | ||
|
||
COPY entrypoint.sh /opt/ | ||
RUN chmod a+x /opt/entrypoint.sh | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,6 +27,18 @@ elif [ "$MODE" == "worker" ] ; then | |
/opt/spark/sbin/start-worker.sh --cores "$CORES" --memory "$MEM" "$SPARK_MASTER_URL" | ||
elif [ "$MODE" == "bash" ] ; then | ||
bash | ||
elif [ "$MODE" == "notebook" ] ; then | ||
echo "starting jupyter notebook" | ||
WORKSPACE_DIR="/cdm_shared_workspace" | ||
mkdir -p "$WORKSPACE_DIR" || { echo "Error: Failed to create workspace directory"; exit 1; } | ||
cd "$WORKSPACE_DIR" || { echo "Error: Failed to navigate to workspace directory"; exit 1; } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we really need the error handling here? This should never happen There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. removed. |
||
jupyter lab --ip=0.0.0.0 \ | ||
--port=$PORT \ | ||
--no-browser \ | ||
--allow-root \ | ||
--notebook-dir="$WORKSPACE_DIR" \ | ||
--ServerApp.token='' \ | ||
--ServerApp.password='' | ||
Comment on lines
+35
to
+41
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does the notebook / pyspark know where the local spark installation lives? The driver will have to be configured to advertise the host, port, and bindManager port, e.g.:
The ports will need to be accessible to the workers There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the command above the spark master url should be an env var as well There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added |
||
else | ||
echo "Unrecognized MODE env var: [$MODE]" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hate checking in gitignores as you wind up with stuff like this with mountains of cruft that doesn't apply to 99% of people in the project, and has dangerous repercussions, like ignoring
lib/
and*.spec
in this case.It's totally trivial in my experience to just maintain your own tiny gitignore file per project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍