Occasionally AF-PACKET interface gets into a failed state after microservice restart #1489

Open
milanlenco opened this issue Oct 1, 2019 · 4 comments

@milanlenco
Collaborator

Based on E2E tests, we observe that the AF-PACKET interface is not always in a functional state after it is recreated due to a microservice restart (TestBridgeDomainWithAfPackets fails).
This seems to be a VPP issue, since the agent executes the operations in the right order and without any errors.
The problem is very difficult to reproduce locally, whereas on Travis it happens quite often.

@milanlenco milanlenco self-assigned this Oct 1, 2019
@rewenset
Contributor

rewenset commented Oct 3, 2019

Since I could reproduce this failing state more often, I started to investigate it. The failure was at this line:

Expect(ctx.pingFromMs(ms2Name, veth1IP)).To(Succeed())

And the error was:

Expected success, but got an error:
    <*errors.errorString | 0xc0003b5090>: {
        s: "failed to ping '192.168.1.2': \n5 packets transmitted, 0 packets received, 100% packet loss",
    }
    failed to ping '192.168.1.2': 
    5 packets transmitted, 0 packets received, 100% packet loss

I put a for loop, for ctx.pingFromMs(ms2Name, veth1IP) != nil {, right before the failing line. After that the test never failed. Sometimes a single ping run was enough, and sometimes it took three. In the latter case, there was an SB Notification before the ping succeeded.

SB Notification:

+======================================================================================================================+
| #20 - SB Notification                                                                                                |
+======================================================================================================================+
  * transaction arguments:
      - seqNum: 20
      - type: SB Notification
      - values:
          - key: vpp/interface/vpp-afpacket1/link-state/DOWN
            val: <NIL>
          - key: vpp/interface/vpp-afpacket1/link-state/UP
            val: <EMPTY>

o----------------------------------------------------------------------------------------------------------------------o
  * executed operations (2019-10-02 13:54:01.524 +0000 UTC -> 2019-10-02 13:54:01.525 +0000 UTC, dur: 0s):
      1. DELETE [WAS-OBTAINED]:
          - key: vpp/interface/vpp-afpacket1/link-state/DOWN
          - value: <EMPTY>
      2. CREATE [OBTAINED]:
          - key: vpp/interface/vpp-afpacket1/link-state/UP
          - value: <EMPTY>
x----------------------------------------------------------------------------------------------------------------------x
| #20 - SB Notification                                                                                     took 300µs |
x----------------------------------------------------------------------------------------------------------------------x

So this "sometimes failing" test is explained by the af-packet interface taking a long time to be recreated: the first pings apparently run before its link state switches from DOWN to UP.

@ondrej-fabry
Member

@milanlenco @rewenset could #1499 resolve this completely?

@milanlenco
Collaborator Author

We should definitely re-test, but I would be surprised if it did fix this, because in this e2e test we do not do the "fast" microservice restart (we should add that too), but who knows...

@rewenset
Contributor

rewenset commented Oct 7, 2019

"Who knows" is a good enough reason for me to start re-testing. I'll post the results here.
Would it be OK if I run TestBridgeDomainWithAfPackets as it was before the waiting workaround was introduced?
EDIT:
My re-test results for TestBridgeDomainWithAfPackets: ✔️ ✔️ ✔️ ❌

Or maybe you were talking about something else?
