add notebook container #8

Open
wants to merge 7 commits into
base: main

Tianhao-Gu
Collaborator

No description provided.

@Tianhao-Gu changed the title from "Dev notebook" to "add notebook container" on May 15, 2024
.gitignore Outdated
Member

I really hate checking in gitignores, as you wind up with stuff like this: mountains of cruft that doesn't apply to 99% of people in the project and that has dangerous repercussions, like ignoring lib/ and *.spec in this case.

In my experience it's totally trivial to just maintain your own tiny gitignore file per project.

Collaborator Author

👍

Dockerfile Outdated
@@ -28,7 +28,7 @@ ENV PYTHON_VER=python3.11
 RUN apt update && \
     apt-get install -y software-properties-common && \
     add-apt-repository ppa:deadsnakes/ppa && \
-    apt install -y $PYTHON_VER python3-pip && \
+    apt install -y $PYTHON_VER python3-pip gcc $PYTHON_VER-dev && \
Member

What is gcc needed for? Is there no way to do the build in the build image and copy the result over?

Collaborator Author

removed gcc.

entrypoint.sh Outdated
Comment on lines 33 to 34
mkdir -p "$WORKSPACE_DIR" || { echo "Error: Failed to create workspace directory"; exit 1; }
cd "$WORKSPACE_DIR" || { echo "Error: Failed to navigate to workspace directory"; exit 1; }
Member

Do we really need the error handling here? This should never happen.

Collaborator Author

removed.

Comment on lines +35 to +41
jupyter lab --ip=0.0.0.0 \
--port=$PORT \
--no-browser \
--allow-root \
--notebook-dir="$WORKSPACE_DIR" \
--ServerApp.token='' \
--ServerApp.password=''
Member

How does the notebook / pyspark know where the local spark installation lives?

The driver will have to be configured to advertise its host, driver port, and blockManager port, e.g.:

root@f058c872158d:/opt/spark# bin/spark-submit --master spark://10.58.1.104:7077 --conf spark.driver.bindAddress=0.0.0.0 --conf spark.driver.host=$SPARK_DRIVER_HOST --conf spark.driver.port=$SPARK_DRIVER_PORT --conf spark.blockManager.port=$SPARK_BLOCKMANAGER_PORT examples/src/main/python/pi.py 10 2>/dev/null
Pi is roughly 3.144080

The ports will need to be accessible to the workers
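
For a notebook session, the same settings could be collected into a configuration mapping instead of being passed as `--conf` flags. The following is only a minimal sketch, not code from this PR: the helper name `driver_conf_from_env` is hypothetical, and it assumes the three environment variables used in the command above are set in the container. The keys are the same Spark properties that appear in that command.

```python
import os


def driver_conf_from_env(env=None):
    """Build Spark driver networking settings from environment variables.

    Hypothetical helper mirroring the --conf flags in the spark-submit
    command above. Raises KeyError if one of the variables is missing.
    """
    env = os.environ if env is None else env
    return {
        # Listen on all interfaces inside the container.
        "spark.driver.bindAddress": "0.0.0.0",
        # Address and ports the workers use to reach the driver.
        "spark.driver.host": env["SPARK_DRIVER_HOST"],
        "spark.driver.port": env["SPARK_DRIVER_PORT"],
        "spark.blockManager.port": env["SPARK_BLOCKMANAGER_PORT"],
    }
```

If pyspark is installed, the resulting pairs could be handed to `SparkConf().setAll(...)` before the session is built. Failing with a `KeyError` on a missing variable is a deliberate choice here, since a driver that advertises the wrong host or port tends to fail later in less obvious ways.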

Member

In the command above, the Spark master URL should be an env var as well.

Collaborator Author

added SPARK_MASTER_URL under environment. I was manually configuring the spark master url before.
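
On the notebook side, reading that variable might look like the sketch below. It is an illustration only: the helper name `spark_master_url` is hypothetical, and the only thing taken from this thread is the `SPARK_MASTER_URL` variable name.

```python
import os


def spark_master_url(env=None):
    """Return the Spark master URL from SPARK_MASTER_URL.

    Hypothetical helper: raises ValueError when the variable is unset
    or blank, rather than falling back to a hard-coded address.
    """
    env = os.environ if env is None else env
    url = env.get("SPARK_MASTER_URL", "").strip()
    if not url:
        raise ValueError("SPARK_MASTER_URL is not set")
    return url
```

With pyspark installed, the returned value could be passed to `SparkSession.builder.master(...)`.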

@Tianhao-Gu
Collaborator Author

Do we still need this PR, given that we are using bitnami/spark now?

@MrCreosote
Member

Yeah, I think this repo is redundant. You should check with Boris about what he's doing: he has a jupyter container running in the cdm stack, though I'm not sure if it's working. If it isn't, you might want to consider building a notebook image on top of the bitnami image, but I'd make a new repo for that.
