
Running instances using pay-as-you-go #510

Closed
hoh opened this issue Dec 13, 2023 · 2 comments
Labels
enhancement New feature or request
Milestone
Pay-as-you-go

Comments

@hoh
Member

hoh commented Dec 13, 2023

Compute Resource Nodes (CRNs) must be able to run instances using the pay-as-you-go (PAYG) model.

Requirements

  1. An instance, whether deployed by the scheduler or with the PAYG approach, must only be scheduled on a node with enough available resources.
  2. A user can create the SuperFluid streams that pay the node operator and the network.
  3. A CRN can perform all the validation required to ensure that scheduled instances meet every requirement.

1. Resources available

A user must be able to select a CRN with enough available resources (compute units and storage) to run their PAYG VM.

The current scheduler plans the deployment of persistent VMs based on the resources available on a node and those reserved by other persistent VMs.

A CRN should therefore expose the resources that are reserved by VMs, both scheduled and PAYG, and the resources that are available.

CRNs currently expose system information on /about/usage/system, for example:

```json
{
  "cpu": {
    "count": 12,
    "load_average": {"load1": 0.7177734375, "load5": 0.82958984375, "load15": 0.7412109375},
    "core_frequencies": {"min": 800.0, "max": 4800.0}
  },
  "mem": {"total_kB": 67337909, "available_kB": 65178669},
  "disk": {"total_kB": 500673052, "available_kB": 470017478},
  "period": {"start_timestamp": "2023-12-13T09:31:00+00:00", "duration_seconds": 60.0},
  "properties": {"cpu": {"architecture": "x86_64", "vendor": "GenuineIntel"}},
  "active": true
}
```

CRNs also expose:

  1. a private API, restricted by a token available only to the node operator, with information about the currently running VMs on /about/executions.
  2. a public API with information about VMs that have stopped running on /about/executions/records.

The intent is that a malicious actor should not have access to live resource data about a VM they want to attack.

💡 Suggestion: I recommend adding an API that exposes the number of compute units available on the system and the extra storage (the available storage beyond what the maximum possible number of compute units would use). Both figures must account for scheduled persistent VMs and PAYG instances.
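A minimal sketch of how such an endpoint could derive these two numbers from the /about/usage/system payload. It assumes a hypothetical compute-unit size (1 vCPU, 2 GiB RAM, 2 GiB disk) and a hypothetical `Reservation` record per reserved VM. It subtracts reservations from the totals rather than using `available_kB`, on the assumption that a VM reserved but not yet running must still count as used:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeUnit:
    # Hypothetical compute-unit size; the issue does not define one.
    vcpus: int = 1
    memory_kb: int = 2 * 1024 * 1024  # 2 GiB (assumed)
    disk_kb: int = 2 * 1024 * 1024  # 2 GiB (assumed)


@dataclass(frozen=True)
class Reservation:
    # Resources reserved by one VM, scheduled persistent or PAYG alike.
    vcpus: int
    memory_kb: int
    disk_kb: int


def available_capacity(
    system: dict, reservations: list[Reservation], unit: ComputeUnit = ComputeUnit()
) -> dict[str, int]:
    """Compute units and extra storage left after subtracting all reservations."""
    free_vcpus = system["cpu"]["count"] - sum(r.vcpus for r in reservations)
    free_mem = system["mem"]["total_kB"] - sum(r.memory_kb for r in reservations)
    free_disk = system["disk"]["total_kB"] - sum(r.disk_kb for r in reservations)
    # The scarcest resource bounds the number of whole compute units.
    units = max(
        0,
        min(free_vcpus // unit.vcpus, free_mem // unit.memory_kb, free_disk // unit.disk_kb),
    )
    # Storage left once every possible compute unit has taken its disk share.
    extra_storage_kb = max(0, free_disk - units * unit.disk_kb)
    return {"compute_units": units, "extra_storage_kb": extra_storage_kb}
```

With the example payload above and no reservations, the CPU count is the binding limit, giving 12 compute units under these assumed unit sizes.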

2. Creating SuperFluid streams

Each CRN must be provided with a unique Avalanche wallet address, and must be aware of that address.

The best place to store this address would be the aggregate that contains all node information (used by https://account.aleph.im/).

The CRN could then fetch the address from there, or cross-check it against its own configuration for additional safety. This address is not expected to change often; if it does, restarting aleph-vm is acceptable, since such a change would disrupt currently scheduled instances anyway.
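A sketch of the cross-check. It assumes a hypothetical aggregate layout with a `resource_nodes` list whose entries carry `hash` and `reward_address` fields, and assumes the address is an EVM-style 0x address as used on the Avalanche C-Chain. Fetching the aggregate from a CCN is left out; the function only compares the fetched value with the locally configured one:

```python
from __future__ import annotations

from typing import Any


class AddressMismatchError(RuntimeError):
    """The aggregate and the local configuration disagree on the payment address."""


def resolve_payment_address(
    aggregate: dict[str, Any], node_hash: str, configured: str | None = None
) -> str:
    # Hypothetical aggregate layout:
    # {"resource_nodes": [{"hash": "...", "reward_address": "0x..."}]}
    for node in aggregate.get("resource_nodes", []):
        if node.get("hash") == node_hash:
            published = node.get("reward_address")
            break
    else:
        raise LookupError(f"node {node_hash!r} not found in the aggregate")
    if not published:
        raise LookupError(f"node {node_hash!r} has no payment address")
    # EVM-style addresses are case-insensitive; mixed case only encodes
    # an EIP-55 checksum, so compare lowercased values.
    if configured is not None and configured.lower() != published.lower():
        raise AddressMismatchError(f"configured {configured} != published {published}")
    return published
```

Raising on a mismatch, rather than silently preferring one source, keeps the "double check" strict; given that a restart is acceptable when the address changes, failing loudly seems the safer default.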

3. Stream validation

Extend the aleph.im message specification to include payment information, and update PyAleph to accept instance messages from senders who hold no tokens when the stream payment approach is used.
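One possible shape for the payment information, sketched under the assumption of a new optional `payment` field in the instance message content. The field names `type`, `chain` and `receiver` and the values `hold`/`superfluid` are hypothetical, not the published aleph.im specification. Messages without the field keep the current hold-tokens behaviour, so PyAleph would skip the balance check only for stream payments:

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class PaymentType(str, Enum):
    HOLD = "hold"  # current model: the sender must hold enough tokens
    SUPERFLUID = "superfluid"  # PAYG model: paid through SuperFluid streams


@dataclass(frozen=True)
class Payment:
    type: PaymentType
    chain: str | None = None  # hypothetical, e.g. "AVAX"
    receiver: str | None = None  # hypothetical node operator address


def parse_payment(content: dict) -> Payment:
    raw = content.get("payment")
    if raw is None:
        # Messages without payment information keep the hold-tokens behaviour.
        return Payment(type=PaymentType.HOLD)
    # PaymentType(...) raises ValueError on an unknown payment type.
    return Payment(
        type=PaymentType(raw["type"]),
        chain=raw.get("chain"),
        receiver=raw.get("receiver"),
    )


def requires_held_balance(content: dict) -> bool:
    """True if PyAleph should still enforce the held-token balance check."""
    return parse_payment(content).type is PaymentType.HOLD
```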

Question: Should PyAleph check that the streams are present before accepting the instance message?

A CRN can be notified of a new instance scheduled on it either:

  • By watching new instance messages via websocket. Such a connection is already in place for updating programs.
  • By receiving a specific request on a new HTTP endpoint with the item_hash of the instance.
  • By the VM scheduler.
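Since the same instance may be announced through more than one of these channels, a CRN likely needs to process each item_hash only once. A minimal sketch, assuming all three sources push item hashes onto a single `asyncio.Queue`; the queue, the `None` shutdown sentinel and the `handle` callback are hypothetical:

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional


async def notification_consumer(
    queue: asyncio.Queue[Optional[str]],
    handle: Callable[[str], Awaitable[None]],
) -> None:
    """Process each announced item_hash once, whichever source announced it."""
    seen: set[str] = set()
    while True:
        item_hash = await queue.get()
        try:
            if item_hash is None:  # hypothetical shutdown sentinel
                return
            if item_hash in seen:  # already announced by another source
                continue
            seen.add(item_hash)
            await handle(item_hash)
        finally:
            queue.task_done()
```

The websocket watcher, the HTTP endpoint and the scheduler client would each only call `queue.put_nowait(item_hash)`, keeping deduplication in one place.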

Once notified, a CRN will:

  1. Reserve the required resources.
  2. Check for the presence of the PAYG streams.
  3. Check that the volume (flow rate) of the streams is sufficient given the existing PAYG instances on the same node. If the volume is insufficient or a stream is missing, free the reserved resources.
    Our SuperFluid discussion documents how to check for the presence of the flows.
  4. Start the instance.
  5. Keep monitoring the volume of the streams. If it becomes insufficient, the node stops PAYG instances, starting with the most recent ones, and publishes a message on the network announcing that the resources have been deallocated.
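The five steps above could be sketched with an `asyncio.Lock` guarding admission. This is a minimal sketch, not aleph-vm's implementation: `PaygInstance`, `Node`, the per-instance `required_flow` price and the `fetch_flow` callback (standing in for the SuperFluid queries) are hypothetical, only vCPUs are tracked, the flow check assumes streams are counted per payer, and publishing the deallocation message is omitted:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from decimal import Decimal
from typing import Awaitable, Callable, Optional

# Hypothetical: returns the payer's current flow rate to this node, or None if no stream exists.
FlowFetcher = Callable[[str], Awaitable[Optional[Decimal]]]


@dataclass
class PaygInstance:
    item_hash: str
    payer: str
    vcpus: int
    required_flow: Decimal  # hypothetical price of this instance, tokens per second
    started_at: float


class Node:
    def __init__(self, vcpus: int) -> None:
        self.free_vcpus = vcpus
        self.running: list[PaygInstance] = []
        # Serialises admissions so each check sees a consistent view of
        # free resources and running instances.
        self.lock = asyncio.Lock()

    async def admit(self, instance: PaygInstance, fetch_flow: FlowFetcher) -> bool:
        async with self.lock:
            # 1. Reserve the required resources.
            if instance.vcpus > self.free_vcpus:
                return False
            self.free_vcpus -= instance.vcpus
            # 2. Check for the presence of the payer's stream.
            flow = await fetch_flow(instance.payer)
            # 3. The stream must cover this instance plus the payer's existing PAYG instances.
            needed = instance.required_flow + sum(
                (i.required_flow for i in self.running if i.payer == instance.payer),
                Decimal(0),
            )
            if flow is None or flow < needed:
                self.free_vcpus += instance.vcpus  # free the reservation
                return False
            # 4. Start the instance (the actual VM start is omitted here).
            self.running.append(instance)
            return True


def instances_to_stop(running: list[PaygInstance], payer: str, flow: Decimal) -> list[PaygInstance]:
    """Step 5: pick the payer's instances to stop, newest first, until the flow covers the rest."""
    remaining = sorted((i for i in running if i.payer == payer), key=lambda i: i.started_at)
    needed = sum((i.required_flow for i in remaining), Decimal(0))
    to_stop: list[PaygInstance] = []
    while remaining and needed > flow:
        newest = remaining.pop()  # list is sorted oldest first, so pop() yields the newest
        to_stop.append(newest)
        needed -= newest.required_flow
    return to_stop
```

Holding the lock across the flow query keeps the check-then-start sequence atomic at the cost of serialising admissions; a busier node might instead release it during the network call and re-validate afterwards.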

Once notified, the VM scheduler will account for the reduced resources available on the CRN when scheduling future holder-tier persistent VMs.

hoh added the enhancement (New feature or request) label Dec 13, 2023
hoh added this to the Pay-as-you-go milestone Dec 13, 2023
@hoh
Member Author

hoh commented Dec 14, 2023

  1. @hoh will create the API that exposes the available resources (peer programming tomorrow?).
  2. @hoh will start a skeleton for the recurring check of the SuperFluid streams and of the held balance from the CCN; @MHHukiewitz and @1yam will complete it and provide the functions for fetching flow information.
  3. @nesitor will work on section 3, Stream validation (reserving the required resources, ...). Have a look at https://docs.python.org/3/library/asyncio-sync.html.
  4. @nesitor will modify the CCN to accept PAYG instance messages from senders who hold no tokens.

@hoh
Member Author

hoh commented Dec 15, 2023

hoh closed this as completed Mar 15, 2024