Enable UC Merced hub users to select between R and Python focused images #3188

Closed
colliand opened this issue Sep 28, 2023 · 37 comments · Fixed by #3331
@colliand
Contributor

Context

UC Merced is working through a pilot with a 2i2c-operated education hub service. The pilot involves courses that use R and Python. The software image that 2i2c uses to deliver R and Python is bloated and difficult to keep up to date. The situation is reminiscent of prior experience with the University of Toronto. Merced may wish to move over to a multi-hub service after the pilot but wants to complete the current engagement using a single hub. The 2i2c team discussed some scenarios and shared recommendations with Merced.

Proposal

Replace the current single R + Python image with an image selector shown after login, so that Merced hub users can choose between two images: an up-to-date Python-focused image and an up-to-date R-focused image.

Merced personnel have been notified that the menu is likely to create some confusion among hub users: some students intending to do Python-related work will mistakenly select the R image, and vice versa. Merced is developing communications and local support plans to address these risks. Merced requests that 2i2c provide a precise date and time when this change will be deployed. Merced asked if this change could be done during the week of October 2 or October 9. @colliand indicated he'd respond with a precise date pending a capacity review by 2i2c's Engineering Team. FYI @damianavila.

Updates and actions

No response

@yuvipanda
Member

Let's use the jupyter/scipy-notebook image for Python, and perhaps the rocker/binder image for R. However, neither of these actually has nbgitpuller in it, which can be problematic if they are using it. I had opened jupyter/docker-stacks#2000 to put nbgitpuller in the Jupyter docker-stacks images.

There's no staging hub for UC Merced. I think we should change that too (#3150) so we can deploy things there for them to test. I think that's the easiest way to do this.

@yuvipanda
Member

Here's what I think is the easiest, least stressful thing for engineers to do:

  1. Deploy a uc-merced staging hub (Add a staging hub for UCMerced #3150)
  2. Set it up to have new images
  3. Have UC Merced folks test it
  4. Once they are satisfied, move that to the prod hub

Given that the primary reason this is happening is that we're retiring the image we maintain that has both R and Python, I think this is the easiest way to do it.

@yuvipanda
Member

I'm happy to provide review but would like someone else to be assigned to this :)

@colliand
Contributor Author

colliand commented Oct 2, 2023

Thanks @yuvipanda! FYI @damianavila I need to reply to Sarvani with a date when the service will change to include the R + Python images drop-down menu. She asked me via text today.

@damianavila
Contributor

@GeorgianaElena is going off the support triage cycle on Tuesday, so I assigned her this one to be worked on during the Wed-Fri period of this week. I also count on @yuvipanda to provide Georgiana with any help she might need.

@damianavila damianavila moved this to Todo 👍 in Sprint Board Oct 2, 2023
@colliand
Contributor Author

colliand commented Oct 3, 2023

Thanks @damianavila. Can engineering forecast a date when the change will be deployed? Or will there be a progress milestone prior to 2i2c's capacity to forecast the deployment? Merced needs a date to plan proactive messaging for their user base.

@GeorgianaElena
Member

@colliand, since I will be away half of Thursday and Friday, I will do my best to have the following milestones, based on Yuvi's feedback, ready for them to test before then:

  • a new UC Merced staging hub will have been deployed
  • this staging hub will offer users the option to start either a Python server with the jupyter/scipy-notebook image or an R server with the rocker/binder image

Does this sound ok?

@colliand
Contributor Author

colliand commented Oct 3, 2023

Thanks @GeorgianaElena. I'm adding @schadalapaka to the thread so that she can follow along with our progress. I expect she and her colleagues will like the plan to launch a staging hub and use it for testing prior to deploying big changes to production. Sarvani, let's aim to complete the testing of the staging hub on October 9 (more ambitiously on October 6). If that all looks good, let's pencil in the plan to deploy to production on October 12. It's never a good idea to deploy big changes on Friday!

@schadalapaka

Hi All -
Thank you for the notes.
I have reached out to our instructors to find out their availability to test this implementation on staging hub from Oct 9-11.
I will share updates as I have them.

Regards
Sarvani

@schadalapaka

Hi All -

Oct 9- 11th works for us to test. However, we request you move implementation after 9:30 am PT on the 13th to avoid conflicting with a course final exam.
In addition, when it is ready, please share instructions on how we can access the staging hub.

Hope this helps, please do not hesitate to let me know if you have any additional questions at this time.

Regards,
Sarvani

@damianavila
Contributor

Oct 9- 11th works for us to test. However, we request you move implementation after 9:30 am PT on the 13th to avoid conflicting with a course final exam.

Given this additional context, I would suggest extending the test period to run from Oct 9 to Oct 13, so the instructors have the whole week to test it, and then we deploy it into production on Oct 16th.

@schadalapaka

I will check with our instructors.
Also, question: How long do you estimate the deployment takes?

@GeorgianaElena
Member

@schadalapaka, there is now a staging hub running at https://staging.ucmerced.2i2c.cloud/hub/spawn, which offers a choice between a Python and an R user environment.

Choosing the Python image will launch users into JupyterLab, whereas choosing the R image will redirect them to RStudio after the server has started. Does this sound ok @schadalapaka, or would you like to use JupyterLab as the default redirect for both options, and then let the users choose to launch RStudio from there, as happens right now?
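
For readers following along, here is a minimal sketch of how a two-image menu like this could be expressed as a KubeSpawner `profile_list`. The display names, slugs, and `:latest` tags are assumptions for illustration, not the values deployed on the staging hub; `/lab` and `/rstudio` are the paths commonly used for JupyterLab and for RStudio served through a Jupyter proxy.

```python
def build_profile_list(r_default_url="/rstudio"):
    """Return a KubeSpawner-style profile list with one Python and one R option.

    All names and image tags below are illustrative assumptions.
    """
    return [
        {
            "display_name": "Python",
            "slug": "python",
            "default": True,
            "kubespawner_override": {
                "image": "jupyter/scipy-notebook:latest",  # assumed tag
                "default_url": "/lab",  # land in JupyterLab
            },
        },
        {
            "display_name": "R",
            "slug": "r",
            "kubespawner_override": {
                "image": "rocker/binder:latest",  # assumed tag
                "default_url": r_default_url,  # RStudio unless overridden
            },
        },
    ]


profiles = build_profile_list()
# In a jupyterhub_config.py this would typically be assigned as:
#   c.KubeSpawner.profile_list = build_profile_list()
for profile in profiles:
    override = profile["kubespawner_override"]
    print(profile["display_name"], override["image"], override["default_url"])
```

Calling `build_profile_list("/lab")` instead gives the alternative raised in the question, where both options open in JupyterLab and RStudio is launched from there.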

@schadalapaka

Hi All-

  1. Let’s stick to the original schedule for testing on staging hub from Oct 9-12 and making changes on deployment server on the 13th after 9:30 am.
  2. I think a jupyterlab with a default redirect for both options, and letting users choose between Python or R image would be good.

@GeorgianaElena
Member

I think a jupyterlab with a default redirect for both options, and letting users choose between Python or R image would be good.

Thanks @schadalapaka. Just note that I believe redirecting users of the R profile to JupyterLab instead of RStudio might cause some confusion. This is because from the Lab interface you can choose to start RStudio, but you can also choose to start a Python notebook. Working in a Python notebook from the R server is possible, but there will be fewer packages available and users might end up confused about this.

Btw, this is what the workflow of choosing between the two profiles on the https://staging.ucmerced.2i2c.cloud hub looks like.

Screen.Recording.2023-10-05.at.10.05.02.mov

@schadalapaka

schadalapaka commented Oct 6, 2023

  1. I will inform our users about the potential points of confusion.
  2. Thank you for the screen grab. I will share it with our users.
  3. Also, is the staging hub ready for our users to start testing?

@damianavila damianavila moved this from Todo 👍 to In Progress ⚡ in Sprint Board Oct 6, 2023
@yuvipanda
Member

@schadalapaka yes the staging hub is ready for your users to start testing, using the link from #3188 (comment)

@schadalapaka

Hi All -
From our users:
Testing out the image and one big problem: notebooks do not save as .pdf files. This is beyond the usual problem that files with documentation don't save (I think someone hasn't added the correct font), they don't save at all. I've attached the error screen.
image

Please let me know if this is expected behavior.

Regards,
Sarvani

@yuvipanda
Member

@schadalapaka Can you share (privately to support@2i2c.org if necessary) the notebook that caused this? I just tried a simple notebook (just print("Hello world")) and it was able to export to PDF just fine.

@schadalapaka

schadalapaka commented Oct 13, 2023

Hi All -
Thank you for the notes. Apologies for the delayed response. I just got back from work travel.

Yes, I think starting RStudio by default would reduce the confusion.

I think the next best step is for Yuvi to make the changes in the staging hub.
We can then have our users test it for a few more days before deciding on a new deployment date.
I'm worried that making this change on the production server tomorrow won't give our users enough time to test it properly.

Please let me know if this works and/or if there are any concerns with this approach.
Do let me know if there is any additional information I might provide at this time.

@GeorgianaElena
Member

@schadalapaka, giving users time to test this more thoroughly and adapt to the changes makes sense.

I've updated the staging hub to start in RStudio by default when choosing the R image.

Take the time to see how this works, and please let us know the new production deployment date once you have it so we can plan around it.

@schadalapaka

schadalapaka commented Oct 16, 2023

Hi All -
Our researchers have said that this Friday (10/20) after 9:30 am PT would work best for them. Please let me know if that works for you all to deploy these changes.

@schadalapaka

Another note: At this time, there doesn't seem to be an easy way for users to switch between the Python and R images. Once we choose an image, we have to break the urls to be able to stop the server and only then will we see an option to switch to another image or select between the images.
Would it be possible to fix this?

@schadalapaka

schadalapaka commented Oct 16, 2023

Notes from our staging hub testers:

  • My notebooks, most likely because they're .ipynb files, appear to load as code rather than R scripts
  • The notebooks can load into RStudio but the original R code is buried under a mess of formatting. Is this what's going to happen to all our notebooks when we switch over?

image

  • Going back to the pilot hub, attempting to open an .ipynb notebook in RStudio gets the "File is binary rather than text so cannot be opened by the source editor" error. I am worried that if we go over to this image we will lose access to all our previous Jupyter notebooks.
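
To illustrate why the R code looks buried when an .ipynb file is opened as plain text: a notebook is a JSON document in which each cell's code sits in a `source` field alongside metadata and outputs. The sketch below uses only the Python standard library to pull the code cells out of a small hand-written notebook; the helper name and the example content are hypothetical.

```python
import json


def extract_code_cells(notebook_json):
    """Return the source of every code cell in an .ipynb JSON string."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format allows "source" to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# A tiny hand-written notebook with one markdown cell and one R code cell.
example = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": ["x <- c(1, 2, 3)\n", "mean(x)"],
            },
        ],
    }
)

print("\n".join(extract_code_cells(example)))
```

The original file is not modified by reading it this way, so opening a notebook in JupyterLab afterwards still works as before.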

@yuvipanda
Member

yuvipanda commented Oct 17, 2023

@schadalapaka let me summarize the current state of issues:

  1. Opening .ipynb files in RStudio doesn't really work. This is expected, as RStudio uses a different notebook format (.Rmd) not .ipynb files. Our earlier assumption was that most of the R users were using RStudio (and hence .Rmd files or .R files), not JupyterLab with .ipynb files. Is this inaccurate? Do your R users want to use both RStudio and JupyterLab for R? Or perhaps they don't want RStudio at all? So the question here is: "Do R users want JupyterLab with R as well as RStudio with R? Or only JupyterLab with R? Or only RStudio with R?"

  2. Once we choose an image, we have to break the urls to be able to stop the server and only then will we see an option to switch to another image or select between the images.

    Unfortunately, there really isn't an easy way to switch between images on a single hub. This is why we recommend separate hubs for R and Python for most cases. This is the primary confusion that users will run into. We can allow users to have multiple servers running at the same time, but in our experience, especially for educational use cases, this usually causes more confusion, not less. Multiple hubs at different URLs is often the way to go, and where I hope UC Merced eventually goes. My understanding is that this image selection is a temporary fix, as the contract probably needs to be different for multiple hubs.

  3. Going back to the pilot hub, attempting to open an .ipynb notebook in RStudio gets the "File is binary rather than text so cannot be opened by the source editor" error. I am worried that if we go over to this image we will lose access to all our previous Jupyter notebooks.

    This is because the pilot hub's RStudio version is really old, and also because of (1): RStudio cannot open .ipynb files. Users won't lose access to any data! We can also continue to keep the old image as an option, although I'm worried that will lead to more confusion (see (2)).
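
On point (2), a small sketch of the two hub pages involved in switching images, assuming the standard JupyterHub URL layout: `/hub/home` is the control panel where a running server can be stopped, and `/hub/spawn` shows the image menu once no server is running. The helper name is hypothetical.

```python
def hub_urls(hub_base):
    """Build the control-panel and spawn-page URLs for a JupyterHub base URL."""
    base = hub_base.rstrip("/")  # tolerate a trailing slash
    return {
        "home": base + "/hub/home",    # stop the running server here
        "spawn": base + "/hub/spawn",  # then pick a different image here
    }


urls = hub_urls("https://staging.ucmerced.2i2c.cloud")
print(urls["home"])
print(urls["spawn"])
```

Sharing the `home` link with students may be simpler than asking them to edit URLs by hand, though it does not remove the need to stop the server first.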

What do you think about a 30-minute call tomorrow, Oct 18, at 9 AM Pacific (or earlier, if possible - I'm currently traveling in India, and @GeorgianaElena is in the EU) to clear things up?

@colliand
Contributor Author

I can also be available for a call with @schadalapaka. This R versus Python user confusion was anticipated and is a main reason why the multi-hub approach is better than an image-selection drop-down for education scenarios. Thanks @yuvipanda for your generosity and clarity.

yuvipanda added a commit to yuvipanda/rocker-versioned2 that referenced this issue Oct 18, 2023
Per jupyter/nbconvert#1328, these
are the packages needed for Jupyter to be able to convert
to PDF. Without it, you get unfriendly errors like [this]
(2i2c-org/infrastructure#3188 (comment)).

Given that the binder image includes jupyter, and texlive is included
in the base, I hope it would be reasonable for the binder image
to include enough texlive packages for PDF conversion to work.
Otherwise, it works in RStudio but not in Jupyter.
@yuvipanda
Member

yuvipanda commented Oct 18, 2023

Just heard from @colliand on Slack that nobody at UC Merced is using RStudio! So the earlier decision to move the default to RStudio was the wrong call. Instead, we should move the default back to JupyterLab, and fix the PDF generation.

@colliand
Contributor Author

Default back to RStudio? I'm confused by what @yuvipanda wrote above.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Oct 18, 2023
Turns out they're using R with JupyterLab, and so this
is far less confusing.

Ref 2i2c-org#3188
@yuvipanda
Member

@colliand sorry, I meant default back to JupyterLab. Edited to fix.

@yuvipanda
Member

#3290 moves the hub back to using JupyterLab as the default interface, and #3289 tracks fixing PDF generation in JupyterLab in the R image. I've opened rocker-org/rocker-versioned2#714 upstream to fix that.

So to recap:

  1. The default interface for both the Python and R images will be JupyterLab
  2. We'll work on fixing the PDF generation in JupyterLab in the R image before rollout. Investigation is over and upstream PR has already been sent.
  3. Users wanting to use RStudio specifically can still launch it explicitly if needed.

@yuvipanda
Member

@schadalapaka ok, so everything except PDF conversion in R is ready for testing again. That should hopefully be sorted in a day or two.

@colliand
Contributor Author

Thanks Yuvi for highlighting item 3 above: Users wanting to use RStudio specifically can still launch it explicitly if needed.

eitsupi added a commit to rocker-org/rocker-versioned2 that referenced this issue Oct 20, 2023
Per jupyter/nbconvert#1328, these are the
packages needed for Jupyter to be able to convert to PDF. Without it,
you get unfriendly errors like [this]
(2i2c-org/infrastructure#3188 (comment)).

Given that the binder image includes jupyter, and texlive is included in
the base, I hope it would be reasonable for the binder image to include
enough texlive packages for PDF conversion to work. Otherwise, it works
in RStudio but not in Jupyter.

---------

Co-authored-by: eitsupi <50911393+eitsupi@users.noreply.github.com>
@yuvipanda
Member

@schadalapaka @colliand we worked with upstream (rocker-org/rocker-versioned2#714) and now PDF generation works fine from inside Jupyter as well! So from our perspective the staging hub is now good to go - please keep us posted on when we can flip the switch.
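
For anyone re-testing, here is a sketch of a quick pre-flight check before exporting a notebook to PDF. It relies on nbconvert's PDF export calling `xelatex`; the helper names are hypothetical, and the check only confirms that the executables are on the PATH, not that every required TeX package is installed (which was the gap in this case).

```python
import shutil


def pdf_export_prereqs(which=shutil.which):
    """Report whether the command-line tools used for PDF export are on PATH.

    `which` is injectable so the check can be exercised without the tools.
    """
    tools = ("jupyter", "xelatex")
    return {tool: which(tool) is not None for tool in tools}


def nbconvert_pdf_command(notebook_path):
    """Return the argument list for converting one notebook to PDF."""
    return ["jupyter", "nbconvert", "--to", "pdf", notebook_path]


status = pdf_export_prereqs()
for tool, found in status.items():
    print(tool, "found" if found else "missing")
print(" ".join(nbconvert_pdf_command("example.ipynb")))
```

If both tools are reported as found and the export still fails, the error output from the printed command is the most useful thing to attach to a report.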

@colliand
Contributor Author

I love that feedback from Merced to 2i2c identified an upstream bug and that 2i2c has contributed upstream changes to fix the bug! Thanks Merced and 2i2c engineering for making the open source ecosystem better.

@schadalapaka

schadalapaka commented Oct 25, 2023

Hi Everyone -

All-Clear for 2i2c to deploy changes to production after Friday 9:30 am.

Status: Done 🎉
7 participants