Goal of KSMM. #29

Closed
Carreau opened this issue Jul 14, 2021 · 5 comments

@Carreau
Contributor

Carreau commented Jul 14, 2021

Following a quick discussion that happened yesterday at the end of the meeting, I think the goal of KSMM needs clarification, potentially via user stories. I thought some of the requirements/end goals were clear, but that does not appear to be the case.

I don't particularly like user stories, but let's see if it's clearer.

Alice and Bob work at ACME Inc.

Alice is part of research IT, which deploys JupyterHub backed by a multi-tenant, multi-cloud cluster. Alice has a deep understanding of the resources available and of how to configure a performant kernel deployment by setting multiple parameters across the many pieces of software in her stack.

Bob is a data scientist at ACME Inc. Bob is good at analysing data, and knows his models and the math behind them really well. Even if Bob could learn the dozen or so parameters he could tweak when using Jupyter on the cluster, he may not have the time to do so, and trusts Alice to provide a number of premade choices with limited configurability.

For example, Bob wishes to have the following kernels:

  • "Python3 - with GPU - 200Gb – 12 core - 1 node - staging"
    • with the options to switch from 12 cores to 48 cores, from 1 node to 10 nodes, and from staging to production.
  • "R - No GPU - 20Gb – 1 core - 1 node - production"
    • with the options to switch from 200Gb to 1Tb

Bob may also want to set some env variables as part of the kernel to specify extra configuration options, like OMP_NUM_THREADS. Those env variables are not known to Alice and might be part of Bob's code. For the sake of simplicity, we are going to assume those env variables are not secrets.
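
As a rough illustration of this point, Bob's extra variables could be merged into the `env` section of a kernelspec. This is only a sketch: the function name `merge_user_env` is hypothetical, and the policy that values set by Alice win over Bob's on conflict is an assumption, not something decided in this thread.

```python
def merge_user_env(spec_env, user_env):
    """Return a new env mapping: Bob's variables plus Alice's.

    Assumption: when both define the same variable, Alice's value wins.
    """
    # Environment variables are strings, so normalise Bob's values.
    merged = {str(name): str(value) for name, value in user_env.items()}
    merged.update(spec_env)
    return merged


kernelspec = {
    "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
    "env": {"GPU": "True"},  # set by Alice
}
bob_env = {"OMP_NUM_THREADS": 12, "GPU": "False"}  # set by Bob

kernelspec["env"] = merge_user_env(kernelspec["env"], bob_env)
print(kernelspec["env"])
```

Letting the admin-provided values take precedence keeps Bob from accidentally overriding something the scheduler relies on; the opposite policy would be a one-line change.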

Bob wants a relatively simple UI: click on the Python 3 kernel and change parameters, ideally seeing dropdowns only for relevant parameters. For example, R is never used with a GPU, so there is no dropdown to switch the GPU.

Alice is responsible for preparing the kernel "template" for Bob. She knows the same options might appear in multiple places, and wants the template to be both readable for Bob and flexible enough to be passed to her software.

For example, the scheduler only takes memory amounts in kilobytes, and she does not want Bob to have to choose an option saying "209,715,200" (kb). So she needs a way to create a kernelspec with "templates":

Template 1:

  • A Python Kernel.
  • Can have GPU or not. If GPU, pass the GPU=True env variable, otherwise pass nothing.
  • Memory can be from 100Gb to 1TB in steps of 50Gb; the option for the scheduler is --mem <value in kb>
  • Cores can be any integer from 1 to 48.
  • Number of nodes: an integer from 1 to 100.
  • Queues are strings passed as flags to the scheduler.

Template 2:

  • An R Kernel.
  • Never needs a GPU parameter.
  • Memory can be from 100Gb to 1TB in steps of 50Gb; the option for the scheduler is --mem <value in kb>
  • Cores is always 1.
  • Number of nodes is always 1.
  • Queues are strings passed as flags to the scheduler.
  • Users are not aware of it, but R is always run on --tag=ssd machines.
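
The constraints of Template 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions: "Gb" is taken to mean 1024 × 1024 kb, "1TB" is taken to mean 1000Gb so that it falls on a 50Gb step, the helper names are hypothetical, and the flag spellings other than `--mem` are placeholders for whatever the real scheduler expects.

```python
KB_PER_GB = 1024 * 1024  # assumption: 1 Gb == 1024 * 1024 kb


def memory_choices(start_gb=100, stop_gb=1000, step_gb=50):
    """Map the labels Bob sees ("100Gb") to the kb value the scheduler wants."""
    return {
        f"{gb}Gb": gb * KB_PER_GB
        for gb in range(start_gb, stop_gb + 1, step_gb)
    }


def scheduler_flags(mem_label, cores, nodes, queue):
    """Validate Bob's choices against Template 1 and build scheduler flags."""
    choices = memory_choices()
    if mem_label not in choices:
        raise ValueError(f"unknown memory option: {mem_label!r}")
    if not 1 <= cores <= 48:
        raise ValueError("cores must be an integer from 1 to 48")
    if not 1 <= nodes <= 100:
        raise ValueError("nodes must be an integer from 1 to 100")
    return [
        f"--mem={choices[mem_label]}",
        f"--cpu={cores}",      # hypothetical flag name
        f"--nodes={nodes}",    # hypothetical flag name
        "--queue", queue,      # hypothetical flag name
    ]


print(scheduler_flags("200Gb", cores=12, nodes=1, queue="staging"))
```

Bob only ever picks "200Gb"; the conversion to kilobytes stays on Alice's side of the template.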

User requirements and machine configuration rarely change, so it's OK if creating those files is time-consuming; it can be automated later.

Alice is concerned that a high number of options would be too confusing for users, and would prefer that only the parameters that can vary be shown to the user.
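
One possible reading of "only show what can vary" is to hide every parameter that has a single allowed value. The sketch below assumes a hypothetical representation where each parameter maps to the list of values it may take; neither that layout nor the name `user_visible` comes from KSMM itself.

```python
def user_visible(parameters):
    """Keep only the parameters that offer Bob an actual choice."""
    return {
        name: options
        for name, options in parameters.items()
        if len(options) > 1
    }


# Hypothetical description of the R template: fixed values have one option.
r_template = {
    "mem": ["100Gb", "150Gb", "200Gb"],
    "cpu": [1],
    "nodes": [1],
    "queue": ["staging", "production"],
    "tag": ["ssd"],  # Bob never needs to see this
}

print(sorted(user_visible(r_template)))  # ['mem', 'queue']
```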

@Carreau
Contributor Author

Carreau commented Jul 14, 2021

Here is how I see we can solve some of this.

Alice would write a kernelspec template that looks like so:

{
   "argv.tpl":[
      "path/to/scheduler",
      "--mem={mem.value}",
      "--cpu={cpu.value}",
      "--nodes={nodes.value}",
      "--queue",
      "{queue.value}",
      "path/to/bin/python",
      "-m",
      "ipykernel",
      "-f",
      "{connection_file}"
   ],
   "env":{
      "GPU":"{gpu.value}"
   },
   "display_name.tpl":"Python3 - {gpu.key} GPU - {mem.key} – {cpu.key} core - {nodes.key} node - {queue.key}",
   "language":"python",
   "parameters":{
      "mem":{
         "100Gb":104857600,
         "150Gb":157286400,
         "...":"..."
      },
      "cpu":{
         "type":"number",
         "minimum":0,
         "exclusiveMaximum":49
      },
      "nodes":{
         "type":"number",
         "minimum":1,
         "maximum":100
      },
      "gpu":{
         "with":"True",
         "without":""
      },
      "queue":{
         "staging":"amz-west-1",
         "production":"azure-whatever-unrelated"
      }
   }
}

From this it is "easy" to create a real kernelspec (potentially embedding the template in another key if we want to re-parametrise). From the "parameters" section, whose structure is up for debate and could be closer to JSON Schema, we can create the form that sends us the values, and then format all parameters into the rest of the kernelspec.
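
To make the formatting step concrete, here is a minimal sketch of how such a template might be turned into a plain kernelspec. It is not KSMM's implementation: `Choice` and `render` are hypothetical names, only enum-style parameters (label to value mappings) are handled, the template is a trimmed-down variant of the one above, and dropping env variables that render to an empty string is an assumption meant to match "otherwise pass nothing".

```python
from dataclasses import dataclass


@dataclass
class Choice:
    key: str    # label shown to the user, e.g. "100Gb"
    value: str  # what is substituted for the scheduler


def render(template, selection):
    """Format a kernelspec template with the labels Bob selected."""
    params = template["parameters"]
    fmt = {
        name: Choice(key, str(params[name][key]))
        for name, key in selection.items()
    }
    # Keep {connection_file} untouched so Jupyter can fill it in at launch.
    fmt["connection_file"] = "{connection_file}"

    env = {}
    for name, value_tpl in template.get("env", {}).items():
        rendered = value_tpl.format(**fmt)
        if rendered:  # assumption: empty values mean "do not set"
            env[name] = rendered

    return {
        "argv": [arg.format(**fmt) for arg in template["argv.tpl"]],
        "env": env,
        "display_name": template["display_name.tpl"].format(**fmt),
        "language": template["language"],
    }


template = {
    "argv.tpl": [
        "path/to/scheduler", "--mem={mem.value}", "--queue", "{queue.value}",
        "path/to/bin/python", "-m", "ipykernel", "-f", "{connection_file}",
    ],
    "env": {"GPU": "{gpu.value}"},
    "display_name.tpl": "Python3 - {gpu.key} GPU - {mem.key} - {queue.key}",
    "language": "python",
    "parameters": {
        "mem": {"100Gb": 100 * 1024 * 1024, "150Gb": 150 * 1024 * 1024},
        "gpu": {"with": "True", "without": ""},
        "queue": {"staging": "amz-west-1", "production": "azure-whatever-unrelated"},
    },
}

spec = render(template, {"mem": "100Gb", "gpu": "with", "queue": "staging"})
print(spec["display_name"])
print(spec["argv"])
```

Python's `str.format` supports attribute access inside the braces, which is why `{mem.key}` and `{mem.value}` work without a dedicated template engine.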

@mlucool
Member

mlucool commented Jul 20, 2021

This seems correct, but needs a bit more story.

A few more examples off the top of my head:

  1. Bob is working with the notebook and realizes that he needs a GPU. He didn't ask for one originally, but wants to edit just one parameter
  2. Bob has a second notebook and wants to reuse a kernel he has created. This will not always be the case, so we'll have to be careful. Sometimes Bob wants it for just one experiment without having to "remember it" forever.
  3. Bob comes back to his notebook after a couple months and wants the notebook to remember exactly the same resources.
  4. Bob works on a team with Carol. He shares the notebook with Carol who wants to run it with whatever Bob thought the right settings were.
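
Stories 1, 3 and 4 suggest that the chosen parameters travel with the notebook. One way this could look, purely as a sketch, is to store the selection in the notebook's metadata. The key name `ksmm_parameters` and both helper functions are hypothetical; nothing here is an existing Jupyter or KSMM convention, and story 2 (reuse versus one-off kernels) is not addressed.

```python
import copy
import json

PARAMS_KEY = "ksmm_parameters"  # hypothetical metadata key


def save_parameters(notebook, selection):
    """Return a copy of the notebook with the selection stored in metadata."""
    nb = copy.deepcopy(notebook)
    nb.setdefault("metadata", {})[PARAMS_KEY] = dict(selection)
    return nb


def load_parameters(notebook, overrides=None):
    """Read the stored selection, optionally changing a few parameters."""
    saved = dict(notebook.get("metadata", {}).get(PARAMS_KEY, {}))
    saved.update(overrides or {})
    return saved


notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

# Story 3: Bob's choices are remembered with the notebook.
bobs = save_parameters(notebook, {"mem": "200Gb", "cpu": 12, "gpu": "without"})

# Story 4: Carol receives the file (simulated by a JSON round trip).
carols = json.loads(json.dumps(bobs))
print(load_parameters(carols))

# Story 1: Bob realises he needs a GPU and edits just that one parameter.
print(load_parameters(bobs, {"gpu": "with"}))
```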

@echarles
Contributor

echarles commented Jul 21, 2021

Thx for sharing those user stories. In terms of implementation, a few questions pop up in my mind:

@Carreau
Contributor Author

Carreau commented Jul 21, 2021

    • I get the idea of that, but wonder how related, or not, it is with the potential usage of JsonSchema that would help to build the user interface for the forms.

The prototype also only deals with enums. I was offline and worked without access to JSON Schema examples or the spec at the time.
There are things like the number of CPUs where an enum makes little sense, and you want, for example, an integer from 0 to 48 (if that's your number of CPUs per node). I believe that describing resources using JSON Schema (or a subset thereof) makes a lot of sense. Obviously not everything will make sense, but I think a large part of it will.
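
To show what "a subset of JSON Schema" might mean in practice, here is a small hand-written validator covering only `enum`, `type` (for `integer` and `number`) and the four numeric bound keywords. It is a sketch, not a replacement for a real JSON Schema library, and the name `is_valid` is hypothetical. As a simplification, when `enum` is present the other keywords are ignored.

```python
import operator

# Numeric bound keywords from JSON Schema and the comparison each implies.
BOUNDS = {
    "minimum": operator.ge,
    "maximum": operator.le,
    "exclusiveMinimum": operator.gt,
    "exclusiveMaximum": operator.lt,
}


def is_valid(value, schema):
    """Check a value against a tiny subset of JSON Schema."""
    if "enum" in schema:
        return value in schema["enum"]

    # bool is a subclass of int in Python, but not a JSON number.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    kind = schema.get("type")
    if kind in ("integer", "number") and not is_number:
        return False
    if kind == "integer" and not isinstance(value, int):
        return False
    if is_number:
        return all(
            check(value, schema[keyword])
            for keyword, check in BOUNDS.items()
            if keyword in schema
        )
    return True


cpu_schema = {"type": "integer", "minimum": 0, "exclusiveMaximum": 49}
queue_schema = {"enum": ["staging", "production"]}

print(is_valid(48, cpu_schema), is_valid(49, cpu_schema))
print(is_valid("staging", queue_schema), is_valid("dev", queue_schema))
```

The same schema that validates Bob's input could also drive the form: an `enum` becomes a dropdown, while bounded integers become a number field with limits.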

@ericdatakelly ericdatakelly added this to the July 2021 milestone Aug 2, 2021
@ericdatakelly ericdatakelly modified the milestones: July 2021, August 2021 Aug 12, 2021
@ericdatakelly ericdatakelly modified the milestones: August 2021, September 2021 Sep 29, 2021
@ericdatakelly

Do we still need this issue or can we close it? If there are still pieces to implement, perhaps new issues can be opened for them.

@mlucool mlucool closed this as completed Jan 26, 2022

5 participants