Title: Developing Interactive Jupyter Notebooks to run on the SDSC HPC Expanse System and the “AI institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment” (ICICLE) project.
Part of the 2022 Research Experience for High School Students Program: https://education.sdsc.edu/studenttech/rehs/
Project Lead: Mary Thomas, Ph.D., SDSC HPC Training lead, and Computational Data Scientist in the Data-Enabled Scientific Computing Division.
REHS Students:
- Sahil Samar, Del Norte High School, San Diego, CA, sahilsamar031@gmail.com
- Mia Chen, Westview High School, San Diego, CA, mialunachen@gmail.com
- Jack Karpinski, San Diego High School, San Diego, CA, jackadoo4@gmail.com
- Michael Ray, JSerra Catholic High School, San Juan Capistrano, CA, michael.ray@jserra.org
- Archita Sarin, Mission San Jose High School, Fremont, CA, archita.sarin@gmail.com
Collaborators/Mentors:
- Christian Garcia, Engineering Scientist Associate (Texas Advanced Computing Center [5]).
- Matthew Lange, Ph.D., CEO, International Center for Food Ontology Operability Data and Semantics (IC-FOODS [4]).
- Joe Stubbs, Ph.D., Manager, Cloud & Interactive Computing (Texas Advanced Computing Center [5]).
This project involves running Jupyter Notebooks on the NSF-funded Expanse high-performance computing (HPC) system [1] and testing software to be used on the NSF-funded ICICLE AI project [3]. Expanse is SDSC's newest supercomputer. The result of a $10M National Science Foundation (NSF) award, Expanse delivers over 5.2 peak petaflops of computing power to scientists, engineers, and researchers all around the world [2]. Expanse provides three kinds of HPC/CI resources: General Computing Nodes, NVIDIA GPU Nodes, and the petascale Lustre filesystem. Thousands of users have accessed these HPC resources through traditional command-line runs and batch queuing systems.
The NSF-funded AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) [3] will build the next generation of cyberinfrastructure to make Artificial Intelligence (AI) more accessible to everyone and drive its further democratization in the larger society. ICICLE will develop intelligent cyberinfrastructure with transparent and high-performance execution on diverse and heterogeneous environments. It will advance plug-and-play AI that is easy to use by scientists across a wide range of domains.
Scientists who use HPC systems are working with interactive HPC tools such as Jupyter notebooks to implement computational and data analysis functions and workflows [5]. Jupyter notebooks are web applications that allow you to create and share documents that contain live code, equations, visualizations, and narrative text. These notebooks are part of a general trend in research computing away from command-line style interfaces and towards browser-based and graphical interfaces. Jupyter notebooks are especially useful for interactive work: the development, testing, and exploration of data sets, or as an instructional resource [6].
Users working interactively expect a timely response, both for initial application startup and during the course of a session. The goals of this research project will be to:
- Test and develop Jupyter Notebooks that run on Expanse.
- Extend these notebooks to run ICICLE-relevant test code.
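Because interactive users expect a timely response, one simple check when testing a notebook is to measure how long a cell's work takes. The following is a minimal sketch of such a check, not part of the project's actual code: the helper name `time_call` and the stand-in workload are hypothetical and chosen only for illustration.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Run fn several times; return (last result, fastest wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


# Example: time a small stand-in workload (summing the first million integers).
total, seconds = time_call(sum, range(1_000_000))
print(f"sum = {total}, fastest run = {seconds:.4f} s")
```

Taking the fastest of several runs reduces noise from other activity on a shared node; a real responsiveness test would also need to account for session startup time, which this sketch does not measure.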
The students will learn the basics of parallel computing, Jupyter notebooks, and AI. The research components will include:
- Contributing to the body of knowledge needed for hosting live, dynamic, interactive services that interface to HPC systems.
- Developing interactive notebooks that will run on the ICICLE system.
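As an illustration of the parallel computing basics mentioned above, the sketch below shows one common pattern: split a problem into chunks, compute a partial result for each chunk on a separate worker, and combine the partial results. It is an assumed teaching example rather than project code; the function names are hypothetical, and threads are used only so the cell runs anywhere a notebook does.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(bounds):
    """Sum i*i for i in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for k in range(parts):
        # The first `extra` chunks each take one additional element.
        stop = start + step + (1 if k < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def parallel_sum_of_squares(n, workers=4):
    """Compute the sum of i*i for i in range(n) from per-chunk partial sums."""
    chunks = split_range(n, workers)
    # Threads keep the sketch notebook-friendly; CPU-bound work on an HPC
    # node would more likely use separate processes or MPI ranks instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


print(parallel_sum_of_squares(1000))
```

The result can be checked against the serial computation `sum(i * i for i in range(1000))`, which is a useful habit when first parallelizing any calculation.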
Before the REHS program begins, the selected student team members will receive recommended programming exercises to help them build the skills they will need to complete this project. Dr. Thomas and other mentors will be available via email to guide the students on how to approach these exercises. During the first week of the REHS program, the student team will work closely with Dr. Thomas and other mentors to build a research plan that clearly defines the milestones needed to meet the project's goals. In addition, the students will have the opportunity to interact with other REHS students and with undergraduate or graduate interns who will be working on similar projects.
- Applicants must have a demonstrated interest in computer science and mathematics.
- Some previous experience with Jupyter Notebooks and some exposure to Artificial Intelligence (AI) methods.
- Experience programming in Python (preferred).
- Exposure to the Linux/Unix operating system.
- REHS 2022 Report
- August, 2022 Presentations
- Poster, October 2022
References:
- [1] https://www.sdsc.edu/News%20Items/PR20190716_Expanse.html
- [2] https://www.sdsc.edu/services/hpc/expanse/
- [3] The Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) Project: https://icicle.osu.edu/
- [4] http://www.ic-foods.org/
- [5] https://www.tacc.utexas.edu/home
- [6] The Jupyter Notebook Project Website: https://jupyter.org/
- [7] Zonca, A. and R.S. Sinkovits, Deploying Jupyter Notebooks at scale on XSEDE for Science Gateways and workshops. Available at: https://zonca.github.io/docs/pearc18_slides_zonca_sinkovits.pdf