Shuffle hardware inventory for tinkerbell before reservation #8264

Open
wants to merge 1 commit into base: main

rahulbabu95
Member

Description of changes:
Shuffle the hardware inventory before reserving hardware for Tinkerbell E2E tests. As we run the quick E2E tests more frequently, the boot list on each machine fills up with boot entries quickly, which leads to an error once there is no space left to add another entry. Ideally we should have automation that periodically removes the boot entries on the BMCs, but until then we should reserve hardware in random order for quick tests so that boot entries do not pile up on the first few machines. Random ordering also reduces the likelihood of picking up the same faulty machine across repeated quick E2E runs.
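For illustration, here is a minimal sketch of the idea, not the code in this PR. It assumes the reservation simply takes the first few entries of the inventory; the helper names `reserveFirst` and `shuffled`, the pool size of 20, and the seed handling are all hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// reserveFirst returns a copy of the first n entries of pool. It mimics a
// reservation that always walks the inventory from the top (hypothetical).
func reserveFirst(pool []string, n int) []string {
	if n > len(pool) {
		n = len(pool)
	}
	return append([]string(nil), pool[:n]...)
}

// shuffled returns a shuffled copy of pool; the input slice is not modified.
func shuffled(pool []string, seed int64) []string {
	out := append([]string(nil), pool...)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	// Hypothetical pool; only the eksa-ciNN naming comes from the PR text.
	pool := make([]string, 0, 20)
	for i := 1; i <= 20; i++ {
		pool = append(pool, fmt.Sprintf("eksa-ci%02d", i))
	}

	// Without shuffling, every run reserves the same leading machines.
	fmt.Println("in order:", reserveFirst(pool, 3))

	// With shuffling, each run is likely to land on different machines.
	fmt.Println("shuffled:", reserveFirst(shuffled(pool, time.Now().UnixNano()), 3))
}
```

Shuffling a copy keeps the sketch free of shared state; the PR itself shuffles the shared catalogue in place, which is why the locking question raised in review matters.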

Testing (if applicable):
Kicked off a run against my branch and verified that the hardware reserved for the test was different from the regular hardware (eksa-ci01 to eksa-ci12) that gets reserved at present.

Documentation added/planned (if applicable):

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Rahul Ganesh <rahulgab@amazon.com>
@eks-distro-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rahulbabu95. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eks-distro-bot eks-distro-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 8, 2024

codecov bot commented Jun 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.42%. Comparing base (d485120) to head (0ef7112).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8264   +/-   ##
=======================================
  Coverage   73.42%   73.42%           
=======================================
  Files         578      578           
  Lines       36054    36054           
=======================================
  Hits        26471    26471           
  Misses       7905     7905           
  Partials     1678     1678           

Member

@g-gaston g-gaston left a comment

overall lgtm, only nit comments

Have we already planned the work to clean up the boot entries? I'm totally ok merging this, it's a good patch, but it doesn't guarantee the problem won't happen again. In fact, if I'm understanding this correctly, it will 100% happen again; it will just take longer. And it doesn't seem like an easy issue to diagnose.
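The "it will just take longer" point can be sketched with a toy simulation. Every number here is hypothetical, and the model assumes each run adds exactly one boot entry to each reserved machine and that entries are never cleaned up, which may not match the real BMC behavior.

```go
package main

import (
	"fmt"
	"math/rand"
)

// runsUntilFull simulates repeated test runs under a toy model: each run
// reserves k of n machines (k must not exceed n) and adds one boot entry to
// each. It returns how many runs complete before a run reaches a reserved
// machine whose boot list already holds capacity entries.
func runsUntilFull(n, k, capacity int, shuffle bool, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	entries := make([]int, n)
	order := make([]int, n)
	for i := range order {
		order[i] = i
	}
	for runs := 0; ; runs++ {
		if shuffle {
			rng.Shuffle(n, func(i, j int) { order[i], order[j] = order[j], order[i] })
		}
		for _, m := range order[:k] {
			if entries[m] == capacity {
				return runs // this run fails: no space left on machine m
			}
			entries[m]++
		}
	}
}

func main() {
	// Hypothetical sizes: 20 machines, 4 reserved per run, 10 entries per boot list.
	fmt.Println("fixed order:", runsUntilFull(20, 4, 10, false, 1))
	fmt.Println("shuffled:   ", runsUntilFull(20, 4, 10, true, 1))
}
```

In this model the fixed order fails after exactly `capacity` runs, while the shuffled order fails somewhere between `capacity` and `n*capacity/k` runs. Shuffling postpones the failure but cannot prevent it without a cleanup step.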

@@ -592,3 +596,10 @@ func logTinkerbellTestHardwareInfo(conf *instanceRunConf, action string) {
}
conf.Logger.V(1).Info(action+" hardware for TestRunner", "hardwarePool", strings.Join(hardwareInfo, ", "))
}

func shuffleHardwareInventory(invCatalogue *hardwareCatalogue) {
Member

nit: why the use of inventory and catalogue? aren't they representing the same thing?

@@ -217,6 +218,9 @@ func RunTests(conf instanceRunConf, inventoryCatalogue map[string]*hardwareCatal
} else {
hardwareCatalogue = inventoryCatalogue[nonAirgappedHardware]
}
conf.Logger.Info("Shuffling hardware inventory for tinkerbell")
// shuffle hardware to introduce randomness during hardware reservation.
Member

nit: I would expand more on why randomness is desired. We don't do this to introduce randomness, we do this to avoid picking up the same machines on every run. Randomness is just the mechanism to achieve that goal.

@@ -592,3 +596,10 @@ func logTinkerbellTestHardwareInfo(conf *instanceRunConf, action string) {
}
conf.Logger.V(1).Info(action+" hardware for TestRunner", "hardwarePool", strings.Join(hardwareInfo, ", "))
}

func shuffleHardwareInventory(invCatalogue *hardwareCatalogue) {
Member

Why not make this a method in hardwareCatalogue? It's manipulating the internal structure, so it seems like a good idea to abstract that in a method instead of exposing it like this.

In fact, don't you need to use the mutex? If I'm not mistaken, the hardwareCatalogue is shared between runner threads and all of them are going to try to call this method concurrently.
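A minimal sketch of what this suggestion could look like, under assumptions: the real `hardwareCatalogue` definition is not shown in this conversation, so the struct fields (`mu`, `hws`), the `hardware` type, and the helpers `hostnames` and `newCatalogue` below are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// hardware and hardwareCatalogue are hypothetical stand-ins for the real
// types, which are not visible in this conversation.
type hardware struct {
	Hostname string
}

type hardwareCatalogue struct {
	mu  sync.Mutex
	hws []*hardware
}

// shuffle randomizes the catalogue order while holding the mutex, so that
// concurrent runners never observe or interleave with a partial shuffle.
func (c *hardwareCatalogue) shuffle() {
	c.mu.Lock()
	defer c.mu.Unlock()
	rand.Shuffle(len(c.hws), func(i, j int) {
		c.hws[i], c.hws[j] = c.hws[j], c.hws[i]
	})
}

// hostnames returns a snapshot of the hostnames, taken under the same lock.
func (c *hardwareCatalogue) hostnames() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	names := make([]string, 0, len(c.hws))
	for _, h := range c.hws {
		names = append(names, h.Hostname)
	}
	return names
}

// newCatalogue builds a catalogue with n placeholder machines.
func newCatalogue(n int) *hardwareCatalogue {
	c := &hardwareCatalogue{}
	for i := 1; i <= n; i++ {
		c.hws = append(c.hws, &hardware{Hostname: fmt.Sprintf("eksa-ci%02d", i)})
	}
	return c
}

func main() {
	c := newCatalogue(12)

	// Several runners shuffling the shared catalogue at the same time.
	var wg sync.WaitGroup
	for r := 0; r < 4; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.shuffle()
		}()
	}
	wg.Wait()

	fmt.Println(c.hostnames())
}
```

Keeping the slice behind a method means callers cannot touch it without the lock. Whether the shuffle and the subsequent reservation should happen under a single lock acquisition depends on how the real reservation code is written.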

3 participants