TFA-FIX:CEPH-83595932-To verify crashes while executing drain and mgr failover commands #4230
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: SrinivasaBharath
35175b4 to 51b25da
status_flag = False
end_time = datetime.datetime.now() + datetime.timedelta(seconds=600)
while end_time > datetime.datetime.now():
    out, err = self.cephadm.shell([status_cmd])
Any reason for executing commands with self.cephadm.shell() and then performing operations on the data?
Please add the reason for using self.cephadm.shell() and then performing operations on the data.
The information is added.
f"OSD remove operation is in progress {osd_id}\nOperations: {entry}"
)
except json.JSONDecodeError:
    log.info(f"The OSD removal is completed on OSD : {osd_id}")
This could mean either that the OSD removal is complete or that no OSD removal was started in the first place.
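One way to separate the two cases is to remember whether the OSD was ever seen in the removal queue before the output stopped parsing as JSON. The sketch below is illustrative only: `classify_removal` and `parse_removal_queue` are hypothetical helpers, the raw strings stand in for successive outputs of the status command, and the `osd_id` key in each queue entry is an assumption about the JSON shape.

```python
import json


def parse_removal_queue(raw):
    """Return the parsed removal queue, or None when raw is not a JSON list."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, list) else None


def classify_removal(outputs, osd_id):
    """Classify an OSD removal from successive status outputs.

    Returns "completed" only if the OSD was seen in the queue and later
    disappeared, "in_progress" if it was still queued at the last poll,
    and "not_observed" if it never appeared at all.
    """
    seen = False
    for raw in outputs:
        queue = parse_removal_queue(raw) or []
        queued = any(
            isinstance(entry, dict) and str(entry.get("osd_id")) == str(osd_id)
            for entry in queue
        )
        if queued:
            seen = True
        else:
            return "completed" if seen else "not_observed"
    return "in_progress" if seen else "not_observed"
```

With this shape, the test can log "removal completed" only for the "completed" state and treat "not_observed" as a failure to start the drain.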
@@ -135,6 +135,12 @@ def run(ceph_cluster, **kw):
log.info(
f"The OSDs in the drain node before starting the test - {osd_count_before_test} "
)
cmd_set_unmanaged = (
This expects the OSD spec on the cluster to be all-available-devices.
Can we write a generic method to export the spec from the cluster, add a new unmanaged=True key, and apply the spec?
Something similar to set_mon_service_managed_type(), present in the monitor workflows.
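A minimal sketch of such a generic helper, under stated assumptions: `set_service_unmanaged`, `export_spec`, and `apply_spec` are hypothetical names, and the two callables stand in for whatever the framework uses to export a service spec as a dict and to apply a modified spec back to the cluster. It is not the method that was eventually added to the PR.

```python
import copy


def set_service_unmanaged(export_spec, apply_spec, service_type, unmanaged):
    """Export a service spec, toggle its unmanaged flag, and re-apply it.

    export_spec(service_type) -> dict and apply_spec(spec) are injected
    (hypothetical) callables, so the helper does not depend on the spec
    being all-available-devices or on any particular service type.
    """
    # Work on a copy so the caller's exported spec is left untouched.
    spec = copy.deepcopy(export_spec(service_type))
    if unmanaged:
        spec["unmanaged"] = True
    else:
        # Returning to managed mode: drop the key instead of storing False.
        spec.pop("unmanaged", None)
    apply_spec(spec)
    return spec
```

Injecting the export and apply steps keeps the toggle logic testable without a cluster and lets the same helper serve OSD, MON, or any other service type.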
New method is created and included.
@@ -152,6 +158,12 @@ def run(ceph_cluster, **kw):
"The traceback messages are noticed in logs.The error snippets are noticed in the MGR logs"
)
return 1
cmd_unset_unmanaged = (
Same as above. Let's use the new method that will be created.
Used new method.
Overall looks good.
A new method to set services to managed or unmanaged (unmanaged true/false) should be created and used here.
"This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes?"
4b5e8fa to ffb1680
… failover commands and preempt scrub fix Signed-off-by: Srinivasa Bharath Kanta <skanta@redhat.com>
dee438f to 01241e3
Preempt scrub fix pass log - http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-HWPGNS
@@ -45,7 +45,6 @@ def set_osd_devices_unmanaged(ceph_cluster, osd_id, unmanaged):
break

if not service_name:
    log.error(f"No orch service found for osd: {osd_id}")
@SrinivasaBharath please explain why we are making this change?
log.debug(
f"Setting the {service_type} service as unmanaged by cephadm. current status : {out}"
)
out["unmanaged"] = "false"
Here, instead of explicitly setting it to False, we should remove the "unmanaged" key from the dictionary if it exists.
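A short illustration of why the string value is risky, independent of how cephadm itself interprets the field: in Python the string "false" is truthy, and it serializes as a JSON string rather than a JSON boolean. `mark_managed` is a hypothetical helper, not code from this PR.

```python
import json


def mark_managed(spec):
    """Return a copy of spec without the "unmanaged" key (no error if absent)."""
    managed = dict(spec)
    managed.pop("unmanaged", None)
    return managed


spec = {"service_type": "mon", "unmanaged": "false"}

# A non-empty string is truthy, so a check like `if spec["unmanaged"]:`
# would treat this spec as unmanaged.
string_flag_is_truthy = bool(spec["unmanaged"])

# json.dumps keeps it as the string "false", not the boolean false.
serialized = json.dumps(spec)

# Dropping the key avoids both problems.
cleaned = mark_managed(spec)
```

Using `pop("unmanaged", None)` also makes the call safe when the exported spec never had the key.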

time.sleep(10)
# Checking for the unmanaged setting on service
cmd = "ceph orch ls"
Please replace the command with "ceph orch ls {service_type}" to avoid the for-loop traversal:
[ceph: root@ceph-hakumar-ryth74-node1-installer /]# ceph orch ls mon -f json-pretty
[
{
"placement": {
"label": "mon"
},
"service_name": "mon",
"service_type": "mon",
"status": {
"created": "2024-11-15T20:59:25.948847Z",
"last_refresh": "2024-11-22T10:12:24.482009Z",
"running": 3,
"size": 3
}
}
]
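With the filtered output, the check reduces to reading one entry. The sketch below parses JSON shaped like the sample above; `service_is_unmanaged` is a hypothetical helper, and the assumption that an unmanaged service carries a top-level boolean "unmanaged" key (absent when managed, as in the sample) should be confirmed against real cluster output.

```python
import json


def service_is_unmanaged(orch_ls_json, service_name):
    """Report whether service_name is flagged unmanaged in `ceph orch ls` JSON.

    Assumes a JSON list of service entries; a missing "unmanaged" key is
    treated as managed. Raises LookupError if the service is not listed.
    """
    for service in json.loads(orch_ls_json):
        if service.get("service_name") == service_name:
            # Accept only a real boolean true, not a truthy string.
            return service.get("unmanaged", False) is True
    raise LookupError(f"service {service_name!r} not found in orch ls output")
```

The strict `is True` comparison is a deliberate choice here so that a string such as "false" is never mistaken for an enabled flag.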
replicated_config = config.get("replicated_pool")
pool_name = replicated_config["pool_name"]
- active_osd_list = rados_obj.get_osd_list(status="up")
+ active_osd_list = rados_obj.get_active_osd_list()
Please revert the change in this line to "active_osd_list = rados_obj.get_osd_list(status="up")".
The get_active_osd_list() method no longer exists.
@@ -133,8 +134,10 @@ def run(ceph_cluster, **kw):
try:
osd_count_before_test = get_node_osd_list(rados_obj, ceph_nodes, drain_host)
log.info(
- f"The OSDs in the drain node before starting the test - {osd_count_before_test} "
+ f"st The OSDs in the drain node before starting the te- {osd_count_before_test} "
Please fix the typo.
@@ -194,7 +200,7 @@ def run(ceph_cluster, **kw):
return 1

if bug_exists:
- active_osd_list = rados_obj.get_osd_list(status="up")
+ active_osd_list = rados_obj.get_active_osd_list()
Please revert to using get_osd_list(status="up")
@@ -85,7 +85,7 @@ def run(ceph_cluster, **kw):

log_lines = [
"head preempted",
- "WaitReplicas::react(const GotReplicas&) PREEMPTED",
+ "WaitReplicas::react(const GotReplicas&) PREEMPTED!",
@SrinivasaBharath I request more clarity here.
The comparison happening in def verify_preempt_log is "not in".
So if a subset of the sentence was previously not being found in the log lines, how is adding an additional character '!' going to make any difference?
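The point can be checked in isolation, assuming the verification is a plain substring test as described: any text that contains the longer pattern also contains the shorter one, so appending '!' can only turn a match into a miss, never the reverse. `missing_patterns` and the sample log strings below are made up for illustration and are not taken from verify_preempt_log or real OSD logs.

```python
def missing_patterns(log_text, patterns):
    """Return the patterns that do not occur anywhere in log_text."""
    return [pattern for pattern in patterns if pattern not in log_text]


OLD = "WaitReplicas::react(const GotReplicas&) PREEMPTED"
NEW = OLD + "!"

# Hypothetical log excerpts, with and without the trailing '!'.
log_without_bang = "osd.1 scrubber: WaitReplicas::react(const GotReplicas&) PREEMPTED"
log_with_bang = log_without_bang + "!"

# Only the stricter pattern is missing when the log lacks the '!'.
missing_in_plain_log = missing_patterns(log_without_bang, [OLD, NEW])
# Both patterns match when the log has the '!'.
missing_in_bang_log = missing_patterns(log_with_bang, [OLD, NEW])
```

If the old pattern was not being found, the cause is more likely elsewhere, for example the log level or the text of the message itself.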
Description
The test case execution failed; the failure log is:
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Regression/19.2.0-52/rados/45/tier-2_rados_test-drain-customer-issue
Jira tasks to track the issue are -