Fix provider status tracking bugs #112

Merged
7 commits merged on Feb 28, 2024

Contributor

@Redm4x Redm4x commented Feb 27, 2024

  • Add a timeout to gRPC status calls (10s, same as the REST calls).
  • Clamp available resources to 0 when allocated exceeds allocatable, so that negative numbers are never reported. This can happen due to provider overcommit.
  • Fix a discrepancy between provider capacity tiles and their corresponding graphs. This was caused by incorrect handling of provider snapshots when the status endpoint is not accessible.
  • Fetch the status of recently online providers first, so that they come back online sooner after a temporary outage.
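
For readers following along, the timeout, clamping, and ordering items above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `ResourceStats` and `ProviderRef` shapes and the helpers `withTimeout`, `availableResources`, and `orderForStatusCheck` are hypothetical names chosen for the example.

```typescript
// Hypothetical shapes; the PR's real types are not shown in this thread.
interface ResourceStats {
  allocatable: number;
  allocated: number;
}

interface ProviderRef {
  owner: string;
  lastSuccessfulCheck: Date | null; // null = never seen online
}

// 10s, matching the value given in the description above.
const STATUS_TIMEOUT_MS = 10_000;

// Rejects if `work` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying request.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}

// Available capacity, floored at 0 so overcommit never shows up as negative.
function availableResources(stats: ResourceStats): number {
  return Math.max(0, stats.allocatable - stats.allocated);
}

// Most recently online providers first; never-seen providers sort last.
// Returns a new array and leaves the input untouched.
function orderForStatusCheck(providers: ProviderRef[]): ProviderRef[] {
  const time = (p: ProviderRef): number =>
    p.lastSuccessfulCheck ? p.lastSuccessfulCheck.getTime() : 0;
  return [...providers].sort((a, b) => time(b) - time(a));
}
```

A caller would wrap each status request, for example `withTimeout(fetchStatus(provider), STATUS_TIMEOUT_MS)`, where `fetchStatus` stands in for whatever gRPC or REST call is actually used.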

@Redm4x Redm4x marked this pull request as ready for review February 27, 2024 23:39
name: gpuInfo.name,
modelId: gpuInfo.modelId,
interface: gpuInfo.interface,
memorySize: gpuInfo.memorySize // TODO: Change type to bytes?
Contributor

Is it supposed to be bytes?

Contributor Author

It's fine to keep it as a string, I think. The only use for a numeric type would be to query GPUs with X VRAM "or more", but for now it's simpler like this. I removed the TODO; we can always change the type in the future if needed.
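
If the "X VRAM or more" query ever becomes necessary, one option that keeps the stored type as a string is to convert at comparison time. The sketch below assumes the string uses binary suffixes such as `24Gi`; that format is an assumption and is not confirmed in this thread. `parseMemorySize` and `hasAtLeastVram` are hypothetical helpers.

```typescript
// Multipliers for the assumed binary-suffix format (e.g. "24Gi").
const UNIT_BYTES = new Map<string, number>([
  ["Ki", 1024],
  ["Mi", 1024 * 1024],
  ["Gi", 1024 * 1024 * 1024],
  ["Ti", 1024 * 1024 * 1024 * 1024],
]);

// Returns the size in bytes, or null if the string does not match
// the assumed "<number><Ki|Mi|Gi|Ti>" format.
function parseMemorySize(size: string): number | null {
  const match = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti)$/.exec(size.trim());
  if (!match) return null;
  const multiplier = UNIT_BYTES.get(match[2] ?? "");
  if (multiplier === undefined) return null;
  return Number(match[1]) * multiplier;
}

// True when the GPU's memory parses and is at least `minBytes`.
// Unparseable sizes are treated as not matching.
function hasAtLeastVram(gpu: { memorySize: string }, minBytes: number): boolean {
  const bytes = parseMemorySize(gpu.memorySize);
  return bytes !== null && bytes >= minBytes;
}
```

The trade-off is that filtering happens in application code rather than in the database query, which is probably acceptable while the number of GPU models is small.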

@Redm4x Redm4x merged commit 8f6ab9e into main Feb 28, 2024
5 checks passed
@Redm4x Redm4x deleted the bugfixes/fix-provider-status-tracking branch February 28, 2024 17:54