Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1857522: 7-21-2020 merge #204

Merged
merged 107 commits into from
Jul 23, 2020

Conversation

trozet
Copy link
Contributor

@trozet trozet commented Jul 17, 2020

No description provided.

dave-tucker and others added 30 commits June 18, 2020 17:09
This commit asks our tests to poll for the presence of annotations
on the pods that are created as some signal that the network is,
almost, ready.

The OVN/HO master are responsible for writing these annotations.
The nodes wait on the presence of theses anootations before processing
can continue.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Adds log rotation support for ovn-k8s-cni-overlay.log file.
Making (MaxBackups, MaxAge, MaxSize) as a configurable values
through yaml files.

Signed-off-by: Pardhakeswar Pacha <ppacha@nvidia.com>
The delete conntrack needs additional capibilities recently
merged into github.com/vishvananda/netlink

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
during the code review of edb24e6 (topology changes to enable
Node-Local Services Access in Shared GW Mode) it was decided to use
the prefix of rtos- and stor- for the ports connecting node_local_switch
and the ovn_cluster_router. however, for the switch port we are still
using a different convention.

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
there were certain test cases in master_test.go that weren't enabled
for shared gateway mode, but we were adding fake commands related to
building shared gateway logical topology by explicitly overwriting
config.Gateway.Mode to config.GatewayModeShared. this causes confusion
and this commit removes that code.

Note: we already have tests to cover those set of commands.
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
the original implementation of the conntrack deletion will cause errors
it deletes all conntrack accross all ports for all protocols.

after revendoring the netlink code use it to allow filtering the UDP and SCTP
protocols and filter to delete selected ports

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
since the focus in the community is on OVN DB standalone and OVN DB
RAFT HA, the corosync/pacemaker got left behind and is not maintained,
so it is better to remove the support for it from the repo.

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
This commit adds a github action that is triggered every 15 minutes.

It identifies stale test runs by cross-referencing the Git SHA of the
commit under test with the latest SHA on the pull request. If the two
do not match, then the job is cancelled.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Vishwanath Jayaraman <vjayaram@redhat.com>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
.github: Add Action To Cleanup Stale Test Runs
not all hybrid network cases will have externalGateway and VTEP IPs.
Correct the functionality of CopyNamespaceAnnotationToPod() to only
copy annotations types.HybridOverlayExternalGw and
types.HybridOverlayVTEP to a pod if they exist on the namespace.
Then the AddPod() can correctly deal with unset values.

without this patch the test 'sets up local pod flows' in
go-controller/hybrid-overlay/pkg/controller/node_linux_test.
shows errors.

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
when adding a pod for a windows linux hybrid network linux nodes
will not have annotation values types.HybridOverlayExternalGw
and types.HybridOverlayVTEP. Adding a test to see that the correct
behavior will happen when those annotaions do not exist

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
Signed-off-by: Shahar Klein <sklein@nvidia.com>
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
…cts for Services

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
…cer" naming in logs

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
…nagement of getLoadBalancer in service.go

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
The coverage/show command against ovn-northd provides metrics that
captures various aspects of ovn-northd's functionality. This commit
is going to capture those values and export it.

In addition to those metrics, the output of "status" command is exported
as well.

Please see below document for more information:
https://docs.google.com/document/d/\
1BAsjLOpAeSyIq2UcyukPK7PgKCqgAKuHcz3962DtoUo/edit?usp=sharing

Co-authored-by: Pardhakeswar Pacha <ppacha@nvidia.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Was missing a `run: |` for the build step

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
The coverage/show command against ovn-controller provides metrics that
captures various aspects of ovn-controller's functionality. This commit
is going to capture those values and export it.

In addition to those metrics, the following metrics on each of the
ovn chassis is exported:
-- ovn-openflow-probe-interval
-- ovn-remote-probe-interval
-- ovn-monitor-all
-- ovn-encap-ip
-- ovn-encap-type
-- ovn-remote
-- integration_bridge_openflow_total
-- integration_bridge_patch_ports_total
-- integration_bridge_geneve_ports_total

Please see below document for more information:
https://docs.google.com/document/d/\
1BAsjLOpAeSyIq2UcyukPK7PgKCqgAKuHcz3962DtoUo/edit?usp=sharing

Co-authored-by: Pardhakeswar Pacha <ppacha@nvidia.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Correct the way that externalGateway and VTEP IP are copied
test: Check network readiness before connection tests
Create Junit Reports for all test suites that support it
Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
One major change brought in by the vendor update is the variables
for Get(), List(), Update(), Delete(), and Create(); update the
calls to accommodate the vendor update.

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2020
@trozet
Copy link
Contributor Author

trozet commented Jul 22, 2020

/hold
something is broken with ovn-k8s, we are getting empty macs in northd commands for lsp set:
2020-07-22T14:36:46Z|00223|ovn_northd|INFO|invalid syntax '10.128.4.4' in logical switch port addresses. No MAC address found
2020-07-22T14:36:46Z|00224|ovn_northd|INFO|Dropped 1961 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2020-07-22T14:36:46Z|00225|ovn_northd|INFO|invalid syntax '10.128.4.4' in port security. No MAC address found

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2020
@trozet trozet force-pushed the 7-17-2020-merge branch 6 times, most recently from 292e2e9 to 6d7b2f3 Compare July 23, 2020 16:04
Billy99 and others added 5 commits July 23, 2020 12:14
ovn-20.06.1-0.fc31 and ovn-20.06.1-4.fc31 are failing CI. So force OVN to
run ovn-20.03.0-3.fc31 while root cause is determined.

Signed-off-by: Billy McFall <22157057+Billy99@users.noreply.github.com>
Addresses must be a space-separated string sent as a single value.
The slice passed to LSPSetAddress() turns into an OvsSet in which
has special semantics for the second and later elements and causes
OVSDB to insert a newline between the MAC and IPs, leading to
northd complaints that the address is malformed.

Signed-off-by: Dan Williams <dcbw@redhat.com>
ovn: fix setting addresses on logical switch ports
@trozet
Copy link
Contributor Author

trozet commented Jul 23, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2020
@openshift-ci-robot
Copy link
Contributor

@trozet: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-openstack 6d7b2f3 link /test e2e-openstack
ci/prow/e2e-metal-ipi 6d7b2f3 link /test e2e-metal-ipi
ci/prow/e2e-vsphere 761cd78 link /test e2e-vsphere
ci/prow/e2e-azure 761cd78 link /test e2e-azure
ci/prow/e2e-gcp-ovn-upgrade 761cd78 link /test e2e-gcp-ovn-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dcbw
Copy link
Contributor

dcbw commented Jul 23, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 23, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhat, dcbw, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 563ca07 into openshift:master Jul 23, 2020
@stbenjam
Copy link
Member

stbenjam commented Jul 26, 2020

This was merged with e2e-metal-ipi tests still running, which I think failed due to the changes here. It'd be great if you'd check out the results on e2e-metal-ipi when merging changes, since none of the other OVN tests cover IPv6, right?

I filed https://bugzilla.redhat.com/show_bug.cgi?id=1860710

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/204/pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi/1286403200164827137

level=error msg="Cluster operator network Degraded is True with RolloutHung: DaemonSet \"openshift-ovn-kubernetes/ovnkube-master\" rollout is not making progress - pod ovnkube-master-slt8c is in CrashLoopBackOff State\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-master\" rollout is not making progress - pod ovnkube-master-llrx7 is in CrashLoopBackOff State\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-master\" rollout is not making progress - pod ovnkube-master-sgdhj is in CrashLoopBackOff State\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" rollout is not making progress - pod ovnkube-node-kwmln is in CrashLoopBackOff State\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" rollout is not making progress - pod ovnkube-node-jc78x is in CrashLoopBackOff State\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" rollout is not making progress - pod ovnkube-node-nm944 is in CrashLoopBackOff State\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" rollout is not making progress - last change 2020-07-23T21:31:57Z"

@trozet
Copy link
Contributor Author

trozet commented Jul 26, 2020

@stbenjam the merge happens automatically. You could propose making it required to pass for merge to happen. I didn't know metal-ipi used ipv6 (maybe we could add that to the job name?), but it would be nice to have that as a required job as long as it is stable.

@stbenjam
Copy link
Member

Well it’s a bit of a catch 22, how can we be stable when we’re constantly broken by changes? I don’t know how to break that cycle.

I think the only thing we can ask is that you look at optional jobs before approving/lgtm to make sure if it’s failing it wasn’t due to the changes here.

@trozet
Copy link
Contributor Author

trozet commented Jul 26, 2020 via email

@stbenjam
Copy link
Member

Do you think we could take #215 downstream before that happens?

@trozet
Copy link
Contributor Author

trozet commented Jul 26, 2020 via email

@stbenjam
Copy link
Member

Right, that PR was open since May so I didn't know when it would land. If it's in the next day that's fine but this is the last week before feature freeze. ovn-org/ovn-kubernetes#1539 was the upstream version of the OpenShift PR if 1343 is going to be delayed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.