add notebook container #8
base: main
.gitignore
Outdated
I really hate checking in gitignores, as you wind up with stuff like this: mountains of cruft that doesn't apply to 99% of people in the project, with dangerous repercussions, like ignoring lib/ and *.spec in this case.
It's totally trivial in my experience to just maintain your own tiny gitignore file per project.
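For the per-project file described above, Git already has a place that is never committed: .git/info/exclude, which uses the same pattern syntax as .gitignore but applies only to your own clone. A minimal sketch, assuming bash and a normal Git checkout; add_local_ignore is a hypothetical helper name, not a Git command:

```shell
#!/usr/bin/env bash
# Sketch: keep personal ignore patterns out of the shared repository.
# .git/info/exclude uses .gitignore syntax but is per-clone and unversioned.
# add_local_ignore is a hypothetical helper, not part of Git.
add_local_ignore() {
  local repo=$1
  shift
  local exclude
  # Ask Git where this clone's exclude file lives (normally .git/info/exclude).
  exclude=$(git -C "$repo" rev-parse --git-path info/exclude) || return 1
  # The path is relative to the repo directory unless Git returned an absolute one.
  case $exclude in
    /*) ;;
    *) exclude="$repo/$exclude" ;;
  esac
  mkdir -p "$(dirname "$exclude")"
  touch "$exclude"
  local pattern
  for pattern in "$@"; do
    # Append each pattern only if it is not already listed verbatim.
    grep -qxF -- "$pattern" "$exclude" || printf '%s\n' "$pattern" >> "$exclude"
  done
}
```

For example, running add_local_ignore . 'lib/' '*.spec' from the project root keeps those patterns private to your clone, and git check-ignore -v <path> shows which file a matching pattern came from.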
👍
Dockerfile
Outdated
@@ -28,7 +28,7 @@ ENV PYTHON_VER=python3.11
 RUN apt update && \
 apt-get install -y software-properties-common && \
 add-apt-repository ppa:deadsnakes/ppa && \
-apt install -y $PYTHON_VER python3-pip && \
+apt install -y $PYTHON_VER python3-pip gcc $PYTHON_VER-dev && \
What is gcc needed for? There's no way to do the build in the build image and copy the result over?
removed gcc.
entrypoint.sh
Outdated
mkdir -p "$WORKSPACE_DIR" || { echo "Error: Failed to create workspace directory"; exit 1; }
cd "$WORKSPACE_DIR" || { echo "Error: Failed to navigate to workspace directory"; exit 1; }
Do we really need the error handling here? This should never happen.
removed.
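Without the explicit checks, the entrypoint can still fail fast. A minimal sketch, assuming the script runs under bash: set -eu makes a failing mkdir -p or cd end the script with a non-zero status instead of launching Jupyter in the wrong directory. The /tmp/workspace default is a placeholder assumption, not the path this PR uses.

```shell
#!/usr/bin/env bash
# Sketch of the entrypoint prologue without per-command error handling.
# -e: exit on the first command that fails; -u: treat unset variables
# (for example a missing PORT later on) as errors.
set -eu

# Placeholder default; assumed, not taken from this PR.
WORKSPACE_DIR="${WORKSPACE_DIR:-/tmp/workspace}"

mkdir -p "$WORKSPACE_DIR"   # under set -e, a failure here ends the script
cd "$WORKSPACE_DIR"         # likewise, so jupyter lab is never reached

# ...the jupyter lab command from entrypoint.sh would follow here.
```

One caveat of this design: set -e is not triggered inside an if condition or on the left side of && or ||, so the two commands should stay as plain statements, as they are here.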
jupyter lab --ip=0.0.0.0 \
  --port=$PORT \
  --no-browser \
  --allow-root \
  --notebook-dir="$WORKSPACE_DIR" \
  --ServerApp.token='' \
  --ServerApp.password=''
How does the notebook / PySpark know where the local Spark installation lives?
The driver will have to be configured to advertise its host, port, and blockManager port, e.g.:
root@f058c872158d:/opt/spark# bin/spark-submit --master spark://10.58.1.104:7077 --conf spark.driver.bindAddress=0.0.0.0 --conf spark.driver.host=$SPARK_DRIVER_HOST --conf spark.driver.port=$SPARK_DRIVER_PORT --conf spark.blockManager.port=$SPARK_BLOCKMANAGER_PORT examples/src/main/python/pi.py 10 2>/dev/null
Pi is roughly 3.144080
The ports will need to be accessible to the workers.
In the command above, the Spark master URL should be an env var as well.
Added SPARK_MASTER_URL under environment. I was configuring the Spark master URL manually before.
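One way to answer the question above for the notebook, sketched under assumptions: when PySpark is started from a plain Python process such as a Jupyter kernel, it launches the JVM through $SPARK_HOME/bin/spark-submit and appends any arguments found in the PYSPARK_SUBMIT_ARGS environment variable, whose value must end in pyspark-shell. The snippet builds that value from SPARK_MASTER_URL plus the driver and block manager settings in the spark-submit example. The /opt/spark default for SPARK_HOME is inferred from the shell prompt in that example, and build_pyspark_submit_args is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: point the notebook's PySpark driver at the cluster with the same
# settings as the spark-submit example; intended to run before jupyter lab.
# build_pyspark_submit_args is a hypothetical helper, not part of Spark.

# Assumed location (from the /opt/spark prompt); PySpark uses SPARK_HOME
# to find the Spark installation and its bin/spark-submit.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"

build_pyspark_submit_args() {
  local v
  # Fail early, with a clear message, if any required variable is missing.
  for v in SPARK_MASTER_URL SPARK_DRIVER_HOST SPARK_DRIVER_PORT SPARK_BLOCKMANAGER_PORT; do
    if [ -z "${!v:-}" ]; then
      echo "error: $v must be set" >&2
      return 1
    fi
  done
  # Workers connect back to the driver on these ports, so they must be
  # reachable from the workers. The value must end with "pyspark-shell".
  printf '%s' \
    "--master $SPARK_MASTER_URL" \
    " --conf spark.driver.bindAddress=0.0.0.0" \
    " --conf spark.driver.host=$SPARK_DRIVER_HOST" \
    " --conf spark.driver.port=$SPARK_DRIVER_PORT" \
    " --conf spark.blockManager.port=$SPARK_BLOCKMANAGER_PORT" \
    " pyspark-shell"
}
```

In entrypoint.sh this could be used as PYSPARK_SUBMIT_ARGS=$(build_pyspark_submit_args) || exit 1 followed by export PYSPARK_SUBMIT_ARGS, placed before the jupyter lab line, so that a SparkSession created in a notebook should pick up these settings. Assigning and exporting on separate lines keeps a failure from being masked by export's own exit status.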
Do we still need this PR? As we are using
Yeah, I think this repo is redundant. You should check what Boris is doing; he has a Jupyter container running in the cdm stack, though I'm not sure if it's working. If not, you might want to consider building a notebook image on top of the Bitnami image, but I'd make a new repo for that.
No description provided.